AMD's Ryzen Cache Analyzed - Improvements; Improveable; CCX Compromises

TheGuruStud · Mar 10, 2017

TheLaughingMan said:
I am going to wait to see Ryzen 5 in action. We have not concrete information about those chips and how they OC. Overclocking 8 cores is a different animal than overclocking 4 cores...historically at least. And I don't think the limit in OC is entirely the architecture, but we will find out.

I assume the arch is fine. It's the LPP process that wasn't intended for such clocks.

Fabiano · Mar 10, 2017

AIDA64 Build 5.80.4089 shows much better reults for Ryzen now.
https://forums.aida64.com/topic/3768-aida64-compatibility-with-amd-ryzen-processors/

nemesis.ie · Mar 10, 2017

Is this some beta d/l as my AIDA is not offering an update from 5.08.40 at the moment? Thanks!

And "2 x Octal core" doesn't seem right. When a 4790k says "quadcore" so maybe it needs some more fixing.

cadaveca · Mar 10, 2017

nemesis.ie said:
Is this some beta d/l as my AIDA is not offering an update from 5.08.40 at the moment? Thanks!

You need engineer edition, as I'm on 5.08.4093 and version above is 5.08.4089.

nemesis.ie · Mar 10, 2017

That's a bit cheeky to not put it out for Extreme!

Thanks Dave.

cadaveca · Mar 10, 2017

nemesis.ie said:
That's a bit cheeky to not put it out for Extreme!

Thanks Dave.

Meh, We get to beta-test the new versions, you get stable, what's the issue?

nemesis.ie · Mar 10, 2017

The issue is I'm perfectly up for beta testing (and do some in other areas too), maybe they should add an opt-in for that.

In other news, my Asrock X370 (pro gaming) ships today.

Enlightnd · Mar 17, 2017

Question, I've read here and in other places that part of the CCX bus congestion issue for games is that PCIe data is also shoved over the CCX bus.

Has anyone done any tests to see if the issue is greater for GPU's on the chipset PCIe lanes vs GPU's on the CPU embedded PCIe lanes?

(EDIT: Fix CPU lanes with PCIe lanes)

uuuaaaaaa · Mar 17, 2017

Enlightnd said:
Question, I've read here and in other places that part of the CCX bus congestion issue for games is that PCIe data is also shoved over the CCX bus.

Has anyone done any tests to see if the issue is greater for GPU's on the chipset PCIe lanes vs GPU's on the CPU embedded CPU lanes?

That would be a cool thing to test!

Shirley Marquez · Mar 17, 2017

airfathaaaaa said:
this is windows load balancing working like it id on nehalems and first gen skylakes

basicly windows treats ryzen as a massive 16 core cpu instead of 8c 16t

The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.

uuuaaaaaa · Mar 17, 2017

Shirley Marquez said:
The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.

I think NUMA would require a separate memory controller for each CCX, which is shared between ccx's on ryzen. But yeah, somewhat of an hybrid thing would be the real deal. For now lets hope that 4000MHz memory support gets there...

EasyListening · Mar 20, 2017

Enlightnd said:
Question, I've read here and in other places that part of the CCX bus congestion issue for games is that PCIe data is also shoved over the CCX bus.

Has anyone done any tests to see if the issue is greater for GPU's on the chipset PCIe lanes vs GPU's on the CPU embedded PCIe lanes?

(EDIT: Fix CPU lanes with PCIe lanes)

GPUs can only use the lanes on the Ryzen CPUs, they don't connect to the Southbridge. So 16x or 8x/8x, off the CPU.

Shirley Marquez said:
The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.

I'm wondering if the higher speed of copy operations on the L3 was specifically tweaked to speed up copies between the two L3s, allowing both CCXs to work from the same data after copying things over, if that would even help... but looks like the new version of AIDA makes this whole CCX intercommunication "bug" a non-issue.

Naples has a ton of PCIe lanes connecting two sockets together on dual socket configs. Somewhere at AMD there must have been people who worked on intercommunication between the 2 CCXs. I don't buy the theory that AMD simply dropped the ball and put out a chip with a glaring architectural flaw. If there are limitations of Ryzen I expect to find compromises that were made after intense discussion. Although they don't have a foundry, they do have the ability to do limited production in house for testing and research purposes. It really feels like people are way underestimating AMD and the quality of their product.

Enlightnd · Mar 20, 2017

I wonder if that is accurate (about the PCIe lanes). I'm in conversation on IRC with several people using pass-trough (for virtualization) and they are explicitly speaking about the issues they have between GPU's on the CPU based bus and ones on a chipset hosted PCIe slot. Seems some boards have crappy IOMMU groupings causing weirdness with GPUs.

EasyListening · Mar 20, 2017

Enlightnd said:
I wonder if that is accurate (about the PCIe lanes). I'm in conversation on IRC with several people using pass-trough (for virtualization) and they are explicitly speaking about the issues they have between GPU's on the CPU based bus and ones on a chipset hosted PCIe slot. Seems some boards have crappy IOMMU groupings causing weirdness with GPUs.

edit: Sorry, I didn't read your post carefully enough. I'll leave the pic up though, maybe someone will find it useful. But, yea, I have no idea what those guys on IRC are talking about. Aren't they mistaken in thinking that one of their GPUs is running off the chipset?

Taken from
https://rog.asus.com/articles/techn...platform-and-its-x370-b350-and-a320-chipsets/

Super XP · Mar 23, 2017

AMD will tighten up this L3 Latence. It will get better and better.

nemesis.ie · Mar 23, 2017

EasyListening said:
edit: Sorry, I didn't read your post carefully enough. I'll leave the pic up though, maybe someone will find it useful. But, yea, I have no idea what those guys on IRC are talking about. Aren't they mistaken in thinking that one of their GPUs is running off the chipset?

No, they are not mistaken, you could for example have 3 GPUs in there.

2 from the CPU and one from the chipset (with the associated latency).

In fact what a lot of the folks using VM want to do is have all 3 cards in separate I/O groups so you can e.g. have one card for your host O/S and the others each dedicated to a VM.
If the groups/UEFI are right, you could have a slower card off the chipset and have that as the host OSes' card (boot graphics) and then two powerful cards connected to the VMs or whatever.

EasyListening · Mar 29, 2017

Super XP said:
AMD will tighten up this L3 Latence. It will get better and better.

anddddddddd, it did.

Man, I am laughing all the way to the bank.

Nephilim666 · Mar 30, 2017

The day there is 4GHz ram, 4GHz chip and a nice high capacity (64GB sounds nice) I will be throwing cash at AMD.

Super XP · Mar 30, 2017

Nephilim666 said:
The day there is 4GHz ram, 4GHz chip and a nice high capacity (64GB sounds nice) I will be throwing cash at AMD.

Seeing how Ram Speed makes a huge performance difference in Ryzen, yes Agreed.

msroadkill612 · Mar 26, 2018

Shirley Marquez said:
The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.

What an interesting suggestion.

Your paradigm of splitting, for coding purposes, the 8 cores into discrete 4 core ccxS & 8MB L3 cache blocks. & then minimising interaction between them, could speed some apps considerably.

I am a newb~, but i mused similarly in the context of a poor mans vega pro ssg (a 16GB $5000+ Vega w/ an onboard 4x 960 pro raid array).

if you install an Affordable 8 lane vega and an 8 lane 2x nvme adapter, so both link to the same 16 lane ccx (as a 16 lane card does e.g.) , then the gpu and the 2x nvme raid array may be able to talk very directly, and ~share the same 8MB cpu L3 cache. It doesnt bypass the shared pcie bus like Vega SSG, but it could be minimal latency, and enhanced by specialised large block size formatting for; swapping, workspace, temp files and graphics.

Vega 56/64 of course, have a dedicated HBCC subsystem for such gpu cache extension using nvme arrays. Done right, it promises a pretty good illusion of ~unlimited gpu memory/address space. Cool indeed.

As you see, a belated post from me. We now have evidence in the perf figures of single ccx zen/vega apuS. Yes, inter ccx interconnects have dragged Ryzen ~IPC down.

TheGuruStud · Mar 26, 2018

Shirley Marquez said:
The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.

Devs probably won't have a choice. It's only a matter of time before intel announces their copy of Ryzen.

Processor	OCed 5800X3D
Motherboard	Asucks C6H
Cooling	Air
Memory	32GB
Video Card(s)	OCed 6800XT
Storage	NVMees
Display(s)	32" Dull curved 1440
Case	Freebie glass idk
Audio Device(s)	Sennheiser
Power Supply	Don't even remember

System Name	7950X
Processor	R9 7950X
Motherboard	Asus X670 Hero
Cooling	Noctua DH15 black edition
Memory	64GB (4 x 16GB) 6000 MT/s
Video Card(s)	Radeon RX7900XT
Storage	Firecuda 540
Display(s)	Samsung Q95T
Audio Device(s)	On board
Power Supply	Seasonic Prime 1000W
Mouse	Roccat Leadr
Keyboard	K95 RGB
Software	Windows 11 Pro x64, insider preview dev channel
Benchmark Scores	#1 worldwide on 3D Mark 99, back in the (P133) days. :)

System Name	7950X
Processor	R9 7950X
Motherboard	Asus X670 Hero
Cooling	Noctua DH15 black edition
Memory	64GB (4 x 16GB) 6000 MT/s
Video Card(s)	Radeon RX7900XT
Storage	Firecuda 540
Display(s)	Samsung Q95T
Audio Device(s)	On board
Power Supply	Seasonic Prime 1000W
Mouse	Roccat Leadr
Keyboard	K95 RGB
Software	Windows 11 Pro x64, insider preview dev channel
Benchmark Scores	#1 worldwide on 3D Mark 99, back in the (P133) days. :)

System Name	7950X
Processor	R9 7950X
Motherboard	Asus X670 Hero
Cooling	Noctua DH15 black edition
Memory	64GB (4 x 16GB) 6000 MT/s
Video Card(s)	Radeon RX7900XT
Storage	Firecuda 540
Display(s)	Samsung Q95T
Audio Device(s)	On board
Power Supply	Seasonic Prime 1000W
Mouse	Roccat Leadr
Keyboard	K95 RGB
Software	Windows 11 Pro x64, insider preview dev channel
Benchmark Scores	#1 worldwide on 3D Mark 99, back in the (P133) days. :)

System Name	No name / Purple Haze
Processor	Phenom II 1100T @ 3.8Ghz / Pentium 4 3.4 EE Gallatin @ 3.825Ghz
Motherboard	MSI 970 Gaming/ Abit IC7-MAX3
Cooling	CM Hyper 212X / Scythe Andy Samurai Master (CPU) - Modded Ati Silencer 5 rev. 2 (GPU)
Memory	8GB GEIL GB38GB2133C10ADC + 8GB G.Skill F3-14900CL9-4GBXL / 2x1GB Crucial Ballistix Tracer PC4000
Video Card(s)	Asus R9 Fury X Strix (4096 SP's/1050 Mhz)/ PowerColor X850XT PE @ (600/1230) AGP + (HD3850 AGP)
Storage	Samsung 250 GB / WD Caviar 160GB
Display(s)	Benq XL2411T
Audio Device(s)	motherboard / Creative Sound Blaster X-Fi XtremeGamer Fatal1ty Pro + Front panel
Power Supply	Tagan BZ 900W / Corsair HX620w
Mouse	Zowie AM
Keyboard	Qpad MK-50
Software	Windows 7 Pro 64Bit / Windows XP
Benchmark Scores	64CU Fury: http://www.3dmark.com/fs/11269229 / X850XT PE http://www.3dmark.com/3dm05/5532432

AMD's Ryzen Cache Analyzed - Improvements; Improveable; CCX Compromises

TheGuruStud

Fabiano

nemesis.ie

cadaveca

My name is Dave

nemesis.ie

cadaveca

My name is Dave

nemesis.ie

Enlightnd

New Member

uuuaaaaaa

Shirley Marquez

uuuaaaaaa

EasyListening

Enlightnd

New Member

EasyListening

Super XP

nemesis.ie

EasyListening

Nephilim666

Super XP

msroadkill612

TheGuruStud

System Name	RiseZEN Gaming PC
Processor	AMD Ryzen 7 5800X @ Auto
Motherboard	Asus ROG Strix X570-E Gaming ATX Motherboard
Cooling	Corsair H115i Elite Capellix AIO, 280mm Radiator, Dual RGB 140mm ML Series PWM Fans
Memory	G.Skill TridentZ 64GB (4 x 16GB) DDR4 3200
Video Card(s)	ASUS DUAL RX 6700 XT DUAL-RX6700XT-12G
Storage	Corsair MP510 480GB M.2 - 2 x WD_BLACK 1TB SN850X M.2 1TB - Lexar NQ790 M.2 2TB
Display(s)	ASUS ROG Strix 34” XG349C 144Hz 1440p + Asus ROG 27" MG278Q 144Hz WQHD 1440p
Case	Corsair Obsidian Series 450D Gaming Case
Audio Device(s)	SteelSeries 5Hv2 w/ Sound Blaster Z SE
Power Supply	Corsair RM750x Power Supply
Mouse	Razer Death-Adder + Viper 8K HZ Ambidextrous Gaming Mouse - Ergonomic Left Hand Edition
Keyboard	Logitech G910 Orion Spectrum RGB Gaming Keyboard
Software	Windows 10 Pro - 64-Bit Edition (Back to Win 10 because 11 is garbage)
Benchmark Scores	I'm the Doctor, Doctor Who. The Definition of Gaming is PC Gaming...