AMD Ryzen 9 9950X

R-T-B · Aug 15, 2024

Dr. Dro said:
It'd be especially egregious considering Windows only utilizes two of the four security rings in x86 CPUs, the innermost "supervisor" and outermost "user" mode rings (rings 0 and 3). As I understand it, though, even when operating under a root-level account, common applications will still be in user mode; there just wouldn't be anything to inhibit system-wide (including kernel, protected and reserved regions) access.

In the name of science, one can always try to run something with NT Authority permissions to see if that alone bypasses the problem...


Well, if you are using memory integrity on Windows 11 you'll get the hypervisor ring too, but yeah, most gamers don't do that.

It'd be worth asking if W1zzard has the memory integrity setting on in these tests, I suppose.

mkppo · Aug 15, 2024

rv8000 said:
From past experience with the 7900X3D I used to have, the difference was pretty moot. 7800/8000 c36/38 generally had worse latency, a 5-8% bandwidth increase, and was incredibly difficult to stabilize with the exception of using the Tachyon/Gene. Meanwhile 6400 c32 was much easier to run/stabilize (as well as getting other OCN’ers stabilized here), with better latency, and slightly less bandwidth at much more sane voltages.

Remains to be seen if DDR5 8000 is worth it, when something along the lines of 6400 with tight timings provides similar results. This is especially true when only very specific workloads or synthetics can take advantage of this; potentially useless for 90%+ of users.

Oh yeah, stabilizing it on the 2DPC X670 boards is a nightmare and there are very few 1DPC boards out there. But apparently that's what they are aiming for with X870, not sure how. I do hear there's a Tachyon being released and if it's not vaporware I'll try to snag one.

I was thinking more along the lines of 8000 c34 with tuned subtimings should be possible and then see how it fares vs tuned 6400.

sblantipodi · Aug 15, 2024

Dr. Dro said:
LOL, @W1zzard you might have some sleepless nights ahead of you mate

I can't even. Using the hidden sys admin account changes Zen 4 and 5 performance results, basically invalidating all reviews

I wonder if this affects Zen 3 and Intel too

be real, invalidating? it changes nearly nothing.

tfp · Aug 15, 2024

I thought the analysis of the behavior of AMD's new clustered decoders on Zen 5 and CCD-to-CCD latency over on Chips and Cheese was interesting. For whatever reason I had assumed that the latency between CCDs was not vastly slower going from Zen 4 to Zen 5, and that both decode clusters were not directly tied to an SMT thread, so on a lightly loaded system or one with SMT enabled the main thread would benefit.

AMD’s Ryzen 9950X: Zen 5 on Desktop

AMD’s desktop Zen 5 products, codenamed Granite Ridge, are the latest in the company’s line of high performance consumer offerings.

chipsandcheese.com

rv8000 · Aug 15, 2024

mkppo said:
Oh yeah, stabilizing it on the 2DPC X670 boards is a nightmare and there are very few 1DPC boards out there. But apparently that's what they are aiming for with X870, not sure how. I do hear there's a Tachyon being released and if it's not vaporware I'll try to snag one.

I was thinking more along the lines of 8000 c34 with tuned subtimings should be possible and then see how it fares vs tuned 6400.

8000 c34 will require 1.6+ vdimm and active cooling, c38 is more in the realm of 24/7 possible.

For some comparison, when I had that AM5 system up and running 6400 c32 tightened primary/secondary/tertiary required 1.46 vdd, 1.43 vddq, 1.35 vddio, 1.26 vsoc, and 1.1 vmisc.

Getting 7800 to boot on b650 required similar voltages at c38, 8000 was a nightmare and never fully stable at 1.48-1.5 and board or IMC was clearly holding clocks back.

Many 6400, 6600, 7800 and 8000+ results require in excess of 1.55-1.6+ at c30-32 and c36 respectively.

I do hope there have been physical memory design improvements for the 800 series. Consistency of getting 2:1 working at DDR5 8000 across many boards would be much nicer to see than a maximum frequency push.
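As a rough aid for comparing the kits discussed above, here is a minimal sketch that converts a DDR5 data rate and CAS setting into first-word CAS latency and theoretical peak bandwidth. It assumes a standard 128-bit (16-byte) dual-channel desktop bus; the function names are illustrative only. These are paper figures: they ignore subtimings and the memory controller ratio, so they will not reproduce the measured latency, which the post above reports was generally worse at 7800/8000.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """First-word CAS latency in nanoseconds.

    DDR transfers twice per clock, so the memory clock in MHz is
    data_rate / 2 and one clock cycle lasts 2000 / data_rate ns.
    """
    return cas_cycles * 2000.0 / data_rate_mts


def peak_bandwidth_gbs(data_rate_mts: float, bus_bytes: int = 16) -> float:
    """Theoretical peak bandwidth in GB/s (assumes a 16-byte-wide bus)."""
    return data_rate_mts * bus_bytes / 1000.0


# Data rate / CAS pairs taken from the posts above.
kits = [(6400, 32), (7800, 38), (8000, 38), (8000, 34)]
for rate, cl in kits:
    print(f"DDR5-{rate} c{cl}: {cas_latency_ns(rate, cl):.2f} ns CAS, "
          f"{peak_bandwidth_gbs(rate):.1f} GB/s peak")
```

On paper, 8000 c34 has lower CAS latency than 6400 c32, which is why the comparison only becomes meaningful once both configurations are tuned and measured.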

Event Horizon · Aug 15, 2024

So it looks like there are going to be some changes: confirmed core parking/scheduling issue (guessing to be fixed by AMD and/or Microsoft), confirmed "admin" issue to be fixed by Microsoft, rumoured tdp updates on some models.

I don't think the difference will be huge but it won't be negligible either. Also some of the above will help Zen 4 as well.

Dr. Dro · Aug 15, 2024

sblantipodi said:
be real, invalidating? it changes nearly nothing.

A consistent 5-6% with room for variance, affecting a previous generation product as well? I'd argue it's at least worth a quick retest. Lest you forget, AMD fanboys routinely claim the Supreme Victory Royale™ over "le ebil ngreedia" whenever the 7900 XTX is exactly 2% ahead of the RTX 4080 in raster games... please make up your minds.

KLMR · Aug 15, 2024

I'm starting to think that this whole 9 series Ryzen is made to kill the AM5 platform.
Who will jump from 7xxx to 9xxx? Nobody.
Then the new "10-series" appears, granting 20-30% "performance increase over last gen" and everybody gets hyped. Maybe the X870 chipset will help? How?

Is there any test where the 7950 fell behind the 5950? This is pure nonsense.

Daven · Aug 15, 2024

KLMR said:
I'm starting to think that this whole 9 series Ryzen is made to kill the AM5 platform.
Who will jump from 7xxx to 9xxx? Nobody.
Then the new "10-series" appears, granting 20-30% "performance increase over last gen" and everybody gets hyped. Maybe the X870 chipset will help? How?

Is there any test where the 7950 fell behind the 5950? This is pure nonsense.

The only problem here is that AMD advertises this as a whole generation jump when really it was a refresh. If they just said it would be a refresh then very few would be surprised by the results. Intel also makes this mistake when they claim to release a new generation every year when in reality they do not.

Assimilator · Aug 15, 2024

Can someone please remind me why AMD needs this "Xbox game bar" nonsense for their dual-CCD CPUs to work properly, when Intel doesn't?

tfp · Aug 15, 2024

Daven said:
The only problem here is that AMD advertises this as a whole generation jump when really it was a refresh. If they just said it would be a refresh then very few would be surprised by the results. Intel also makes this mistake when they claim to release a new generation every year when in reality they do not.

AMD did do a lot of work on the front end, not just a mild tweak, but it doesn't seem to bear a lot of fruit. I think this is more than a refresh, which I think of as throwing cache and clock speeds at a design, but it didn't hit the mark. Maybe the closest release I can compare this to from Intel was 10th vs 11th gen processors? Intel claimed an 18% IPC increase over Skylake, upped L1/L2, and redid some things around the front end, but the CPU fell flat.

Sunny Cove (microarchitecture) - Wikipedia

en.wikipedia.org

KLMR · Aug 15, 2024

Assimilator said:
Can someone please remind me why AMD needs this "Xbox game bar" nonsense for their dual-CCD CPUs to work properly, when Intel doesn't?

Yes, that's more nonsense. The software/driver department probably found it was the easiest way to detect whether a program is a game in order to manage the power profile. How Discord manages to know is a mystery.

Zach_01 · Aug 15, 2024

Assimilator said:
Can someone please remind me why AMD needs this "Xbox game bar" nonsense for their dual-CCD CPUs to work properly, when Intel doesn't?

They need to be treated like X3D parts because the new (server driven) architecture has over 2x the cross-CCD latency of Zen 4.
So core parking is a must, assigning game threads to one CCD.
While this does not affect productivity workloads much, it vastly affects gaming ones.

Gamers can wait for the X3D parts

AMD's new era of CPU segmentation.
We are just a little slow to wrap our heads around it, mostly because previous Ryzen gen to gen upgrades were very different.

AMD Ryzen 9 9950X

I want this, Since the Threadripper is way above my paygrade :D Can someone explain how many usable lanes a X870E has? Why is this core parking thing a problem? You can set your stuff which core to run in the basic Windows process manager.

www.techpowerup.com
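To illustrate the manual alternative to core parking raised in the linked post (pinning a process to one CCD yourself), here is a minimal sketch that builds an affinity bitmask for a single CCD. It assumes 8 cores per CCD, 2 threads per core, and that SMT siblings are numbered adjacently so CCD0 maps to logical CPUs 0-15; that numbering is an assumption and differs between operating systems, so check your own topology first. The helper names are hypothetical.

```python
def ccd_logical_cpus(ccd_index: int, cores_per_ccd: int = 8,
                     threads_per_core: int = 2) -> list[int]:
    """Logical CPU indices of one CCD, assuming adjacent SMT sibling numbering."""
    per_ccd = cores_per_ccd * threads_per_core
    start = ccd_index * per_ccd
    return list(range(start, start + per_ccd))


def affinity_mask(cpus: list[int]) -> int:
    """Bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


ccd0 = ccd_logical_cpus(0)
print(f"CCD0 logical CPUs: {ccd0[0]}-{ccd0[-1]}")
print(f"CCD0 affinity mask: {affinity_mask(ccd0):X}")
```

A hex mask like this is the kind of value Windows accepts in `start /affinity <mask> <program>`, or it can be applied by ticking the matching cores in Task Manager. Either way it is static per process, unlike driver-managed core parking.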

R0H1T · Aug 15, 2024

tfp said:
I thought the analysis of the behavior of AMD's new clustered decoders on Zen 5 and CCD-to-CCD latency over on Chips and Cheese was interesting. For whatever reason I had assumed that the latency between CCDs was not vastly slower going from Zen 4 to Zen 5, and that both decode clusters were not directly tied to an SMT thread, so on a lightly loaded system or one with SMT enabled the main thread would benefit.

AMD’s Ryzen 9950X: Zen 5 on Desktop

AMD’s desktop Zen 5 products, codenamed Granite Ridge, are the latest in the company’s line of high performance consumer offerings.

chipsandcheese.com

From the same review ~

As with Zen 2, each CCD connects to the IO die through an Infinity Fabric link. On desktop, this link is 32 bytes per cycle in the read direction and 16 bytes per cycle in the write direction. That differs from AMD’s mobile parts, where the Infinity Fabric link from a core cluster can do 32 bytes per cycle in both directions. Infinity Fabric runs at 2 GHz on both setups, just as it did on desktop Zen 4. That’s not a surprise, since AMD has re-used Zen 4’s IO die for Zen 5. At that clock speed, each cluster has 64 GB/s of read bandwidth and 32 GB/s of write bandwidth to the rest of the system.

This was an issue with zen4 as well, they need to address this with zen6 or they might as well not even pretend to care about MSDT anymore!
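The bandwidth figures in the quoted excerpt follow directly from link width times fabric clock. A minimal sketch of that arithmetic, using only the numbers quoted above (32 bytes per cycle read, 16 bytes per cycle write, 2 GHz Infinity Fabric clock); the function name is illustrative:

```python
def link_bandwidth_gbs(bytes_per_cycle: int, fclk_ghz: float) -> float:
    """Per-CCD link bandwidth in GB/s: bytes per fabric cycle times clock in GHz."""
    return bytes_per_cycle * fclk_ghz


FCLK_GHZ = 2.0  # Infinity Fabric clock quoted in the excerpt

read_gbs = link_bandwidth_gbs(32, FCLK_GHZ)   # desktop read direction
write_gbs = link_bandwidth_gbs(16, FCLK_GHZ)  # desktop write direction

print(f"read:  {read_gbs:.0f} GB/s per CCD")
print(f"write: {write_gbs:.0f} GB/s per CCD")
print(f"write is {write_gbs / read_gbs:.0%} of read")
```

This reproduces the 64 GB/s read and 32 GB/s write per CCD stated in the review excerpt, with the write direction at half the read width.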

tfp · Aug 15, 2024

R0H1T said:
From the same review ~

This was an issue with zen4 as well, they need to address this with zen6 or they might as well not even pretend to care about MSDT anymore!

I thought Strix Point was a slightly modified zen5 for laptop. Zen4 doesn't have the 2 decoder blocks.

R0H1T · Aug 16, 2024

This is just the memory subsystem or hierarchy.

You're probably talking about this.

Dyno · Aug 16, 2024

Hi again. Don't you think the 12th, 13th, and (primarily) 14th generation Intel processors would benchmark higher with higher-frequency memory than what you have been using? I know we've talked about this, but that was in regard to AMD CPUs. I think higher-frequency memory would give the 12th, 13th, and 14th generation Intel processors better results: maybe 6400 MT/s for 12th generation and something around 6800 MT/s for 13th, while the 14th generation can scale to even higher memory frequencies. I know you know this.

DemonicRyzen666 · Aug 16, 2024

R0H1T said:
From the same review ~

This was an issue with zen4 as well, they need to address this with zen6 or they might as well not even pretend to care about MSDT anymore!

Those diagrams contradict each other.
It's quite obvious now that the AGESA has it stuck at 16 bytes per cycle on desktop when it supports the full-fat 32 bytes per cycle, which it has to for the AVX-512 support??? Was it supposed to be able to change dynamically based on detected load or something? Because that's a 50% bottleneck for writes. Lastly, that does not explain the increase in CCD latency at all, as the bandwidth for caches was increased from L1 to L2 and L2 to L3; they even mention how latencies went down a cycle for some of them.

It's honestly more impressive that the AGESA microcode could be messed up that badly by someone lol

R0H1T · Aug 16, 2024

Contradict what or how?

A Computer Guy · Aug 16, 2024

Zach_01 said:
They need to be treated like X3D parts because the new (server driven) architecture has over 2x the cross-CCD latency of Zen 4.
So core parking is a must, assigning game threads to one CCD.
While this does not affect productivity workloads much, it vastly affects gaming ones.

Gamers can wait for the X3D parts

AMD's new era of CPU segmentation.
We are just a little slow to wrap our heads around it, mostly because previous Ryzen gen to gen upgrades were very different.

AMD Ryzen 9 9950X

I want this, Since the Threadripper is way above my paygrade :D Can someone explain how many usable lanes a X870E has? Why is this core parking thing a problem? You can set your stuff which core to run in the basic Windows process manager.

www.techpowerup.com

Here is something I don't get. If the new server-driven arch has so much more cross-CCD latency, wouldn't that be bad for virtualization platforms when a VM's virtual processing crosses CCDs too? I feel like this is a use case where AMD shot themselves in the foot with increased latency, and perhaps it explains why the virtualization benchmark took such a regression from last gen instead of showing some improvement.

for reference

Zach_01 · Aug 16, 2024

A Computer Guy said:
Here is something I don't get. If the new server-driven arch has so much more cross-CCD latency, wouldn't that be bad for virtualization platforms when a VM's virtual processing crosses CCDs too? I feel like this is a use case where AMD shot themselves in the foot with increased latency, and perhaps it explains why the virtualization benchmark took such a regression from last gen instead of showing a 20%+ improvement.

I'm not going to claim that I've seen or know everything, but I assume you are talking about virtualization in Windows, right?
How does this workload fare on non-Windows platforms, like Linux or whatever else servers may run? And how important is it really compared to other workloads on EPYC?

A Computer Guy · Aug 16, 2024

Zach_01 said:
I'm not going to claim that I've seen or know everything, but I assume you are talking about virtualization in Windows, right?
How does this workload fare on non-Windows platforms, like Linux or whatever else servers may run? And how important is it really compared to other workloads on EPYC?

Those are good questions. For the charts I posted, I had to go back and review the test setup page, but it's a bit sparse in describing the Oracle VirtualBox configuration. If I understand the test setup correctly, it was a Win11 host and Win11 guest. I would think it wouldn't matter whether the hypervisor was type 1 or type 2, or which OSes were involved. The potential for cross-CCD latency would still be there in all situations unless the hypervisor and/or host OS was smart enough to prevent it. Perhaps I'm making a fuss over nothing, but I wonder if, just like for games, the other tests indicated in red performed poorly for the same reasons? (edit) sorry for the word salad.

tfp · Aug 16, 2024

R0H1T said:
This is just the memory subsystem or hierarchy.

You're probably talking about this.

In part yes, but the Zen 4 front end looks like the below. You can see that the decoder was duplicated in Zen 5 but only works on the individual SMT thread and not together when one thread is idle or disabled like on some of Intel's chips. It generally feels like they made good progress on the front end but don't have a wide enough CPU core, can't retire instructions fast enough, are RAM/cache limited, or have some other bottleneck that makes these improvements more limited than expected. I thought the Chips and Cheese article made some reasonable assumptions around this.

AMD’s Zen 4 Part 1: Frontend and Execution Engine

AMD’s Zen 4 architecture has been hotly anticipated by many in the tech sphere; as a result many rumors were floating around about its performance gains prior to its release.

chipsandcheese.com

Zach_01 · Aug 16, 2024

A Computer Guy said:
Those are good questions. For the charts I posted, I had to go back and review the test setup page, but it's a bit sparse in describing the Oracle VirtualBox configuration. If I understand the test setup correctly, it was a Win11 host and Win11 guest. I would think it wouldn't matter whether the hypervisor was type 1 or type 2, or which OSes were involved. The potential for cross-CCD latency would still be there in all situations unless the hypervisor and/or host OS was smart enough to prevent it. Perhaps I'm making a fuss over nothing, but I wonder if, just like for games, the other tests indicated in red performed poorly for the same reasons? (edit) sorry for the word salad.

Fair enough... And something else on this cross-CCD latency:
We see this on desktop variants with the same, and probably obsolete, IOD that was carried over from desktop 7000.
We don't really know what AMD has cooked up for EPYC on the cross-chiplet communication matter.

Desktop variants are apparently not that important to them (I'm not saying not at all), so they kept cost low (no new IOD), while they will bring the extra cache and (maybe over-) compensate for the IOD bottleneck on some workloads. Games first.
In the meantime, apps and Windows may slowly improve them by a few % by updating for the new structure.

Time will tell.
I'm not willing to dismiss them before I see the whole picture just because the first batch of them has fallen below my expectations based on past experience.
Anyway, I care more about gaming performance than anything else, and those parts are yet to come.

DemonicRyzen666 · Aug 16, 2024

tfp said:
In part yes, but the Zen 4 front end looks like the below. You can see that the decoder was duplicated in Zen 5 but only works on the individual SMT thread and not together when one thread is idle or disabled like on some of Intel's chips. It generally feels like they made good progress on the front end but don't have a wide enough CPU core, can't retire instructions fast enough, are RAM/cache limited, or have some other bottleneck that makes these improvements more limited than expected. I thought the Chips and Cheese article made some reasonable assumptions around this.

AMD’s Zen 4 Part 1: Frontend and Execution Engine

AMD’s Zen 4 architecture has been hotly anticipated by many in the tech sphere; as a result many rumors were floating around about its performance gains prior to its release.

chipsandcheese.com


Except that contradicts what the Zen engineer said in the YouTube interview with Chips & Cheese?

Chips & Cheese is also contradicting his own interview in parts of his own review.

It is totally supposed to be able to use both pipelines, decoders, and predictors when running a single thread through the core. Otherwise all those parts (predictor, decoder, and pipeline) just sit idle doing nothing.

In his test, Chips & Cheese proved that it is not doing what the engineer claimed it was supposed to do, or could do, in the YouTube interview. All three parts are idle when SMT is disabled. It is only using one predictor, one pipeline, and one decoder.
