
AMD's Ryzen Cache Analyzed - Improvements; Improveable; CCX Compromises

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.24/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kingston A2000 1TB, Seagate Firewolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
AMD's Ryzen 7 has shown lower-than-expected performance in some applications, and this seems to stem from one particular problem: memory. Before AMD's Ryzen chips were even out, reports had AMD confirming that most of the tweaking and tuning for the new architecture had gone into maximizing core performance, at the expense of memory compatibility and performance. Apparently, until AMD's Ryzen line-up is completed with the upcoming Ryzen 5 and Ryzen 3 processors, the company will be hard at work on improving Ryzen's cache handling and memory latency.

Hardware.fr has done a pretty good job of exploring, through the use of AIDA64, the cache and memory subsystem deficiencies in what would otherwise be an exceptional processor design. Namely, there seems to be some problem with Ryzen's L3 cache and memory subsystem implementation. Paired with the same memory configuration and at the same 3 GHz clocks, for instance, Ryzen shows memory latency results that are up to 30 ns higher (at 90 ns) than those of Intel's i7 6900K or even AMD's FX 8350 (both at around 60 ns).
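To put those nanoseconds in perspective, the short sketch below converts the quoted latencies into core clock cycles at the 3 GHz test clock. The function name is made up for illustration, and the 90 ns and 60 ns inputs are the rounded figures from the paragraph above, not exact measurements.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to core clock cycles."""
    # 1 GHz means one cycle per nanosecond, so cycles = ns * GHz.
    return latency_ns * clock_ghz

CLOCK_GHZ = 3.0  # all chips were fixed at 3 GHz in the test

ryzen_cycles = ns_to_cycles(90, CLOCK_GHZ)  # rounded Ryzen memory latency
intel_cycles = ns_to_cycles(60, CLOCK_GHZ)  # rounded 6900K memory latency
extra_cycles = ryzen_cycles - intel_cycles

print(f"Ryzen: {ryzen_cycles:.0f} cycles, 6900K: {intel_cycles:.0f} cycles, "
      f"extra wait per miss: {extra_cycles:.0f} cycles")
```

In other words, under these rounded numbers every access that goes all the way to RAM stalls the core for roughly 90 additional cycles compared with the Intel platform.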





Update: The lack of information regarding the test system left some gray areas in the interpretation of the results. Hardware.fr's tests, and the results below, were obtained with the 8-core chips set to 3 GHz and with SMT and HT deactivated. Memory for the Ryzen and Intel platforms was DDR4-2400 with 15-15-15-35 timings, and memory for the AMD FX platform was DDR3-1600 operating at 9-9-9-24 timings. Both memory configurations were set at 4x 4 GB, totaling 16 GB of memory.

From some more testing results, we see that Intel's L1 cache is still leagues ahead of AMD's implementation; that AMD's L2 is overall faster than Intel's, though it does incur a roughly 2 ns latency penalty; and that AMD's L3 cache is very much behind Intel's in all metrics but L3 cache copies, with latency being almost 3x greater than on Intel's 6900K.



The problem is revealed through an increasing working-set size. In the case of the 6900K, which has a 32 KB L1 cache, performance is greatest up to that workload size. Larger workloads that don't fit in the L1 cache then "spill" into the 6900K's 256 KB L2 cache; workloads larger than 256 KB and smaller than 16 MB are then served by the 6900K's 20 MB L3 cache, while any workloads larger than 16 MB force the processor to access main system memory, with access latency climbing until it reaches the RAM's ~70 ns.
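Latency-versus-size curves like this are commonly produced with a pointer-chasing loop: a buffer is filled with a random chain of indices, and each load depends on the previous one so the prefetcher cannot hide the latency. The sketch below only illustrates that access pattern; it is not the AIDA64 beta benchmark Hardware.fr used, the function names are made up, and in Python the interpreter overhead swamps the cache effects, so a real measurement would need C or assembly.

```python
import random
import time

def build_chain(n_slots: int, seed: int = 0) -> list[int]:
    """Return a random single-cycle permutation (Sattolo's algorithm).

    chain[i] holds the index to visit after i, so following the chain
    touches every slot exactly once before returning to the start.
    """
    rng = random.Random(seed)
    chain = list(range(n_slots))
    for i in range(n_slots - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees one single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain

def chase(chain: list[int], steps: int) -> tuple[int, float]:
    """Follow the chain for `steps` dependent loads; return (end index, ns per step)."""
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        idx = chain[idx]  # each load depends on the previous one
    elapsed = time.perf_counter_ns() - start
    return idx, elapsed / steps

# Grow the working set and watch the per-step time (illustrative only).
for n_slots in (1 << 10, 1 << 14, 1 << 18):
    chain = build_chain(n_slots)
    _, ns_per_step = chase(chain, steps=200_000)
    print(f"{n_slots:>8} slots: {ns_per_step:.1f} ns/step")
```

The single-cycle property matters: an ordinary shuffle can produce short loops, which would keep the chase inside a small part of the buffer and understate the latency at large sizes.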



However, on AMD's Ryzen 1800X, latency is a wholly different beast. Everything is fine in the L1 and L2 caches (32 KB and 512 KB, respectively). However, when moving into the 1800X's 16 MB L3 cache, the behavior changes completely. Up to 4 MB of cache utilization, we see the expected increase in latency; beyond that, latency goes through the roof well before the chip's 16 MB of L3 cache is completely filled. This clearly derives from Ryzen's modularity, with each CCX (made up of 4 cores and 8 MB of L3 cache, besides all the other duplicated logic) being able to access only 8 MB of L3 cache at any point in time.
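A deliberately simplified capacity model makes the contrast concrete. The sketch below uses only the cache sizes quoted in the article and assumes, as the paragraph above argues, that a core on the 1800X can only use the 8 MB of L3 in its own CCX. It ignores exclusivity, associativity, and the early latency rise past 4 MB, so treat it as an illustration rather than a description of the real hardware.

```python
KB = 1024
MB = 1024 * KB

# Capacities quoted in the article; L3 is what a single core can reach locally.
HIERARCHY = {
    "i7 6900K": [("L1", 32 * KB), ("L2", 256 * KB), ("L3", 20 * MB)],
    "Ryzen 1800X": [("L1", 32 * KB), ("L2", 512 * KB), ("L3 (own CCX)", 8 * MB)],
}

def serving_level(chip: str, working_set: int) -> str:
    """Name the first level whose capacity covers the working set."""
    for name, capacity in HIERARCHY[chip]:
        if working_set <= capacity:
            return name
    return "beyond local L3"

for size in (16 * KB, 384 * KB, 4 * MB, 12 * MB, 32 * MB):
    row = ", ".join(f"{chip}: {serving_level(chip, size)}" for chip in HIERARCHY)
    print(f"{size // KB:>6} KB -> {row}")
```

Under this model a 12 MB working set still fits in the 6900K's L3 but has already overflowed what one CCX can hold locally, which is the region where the measured Ryzen latency climbs.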



The difference in access speeds between 4 MB and 8 MB workloads can be explained through AMD's own admission that Ryzen's core design incurs different access times depending on which parts of the L3 cache are accessed by the CCX. The fact that this cache is "mostly exclusive" - which means that other information may be stored on it that's not of immediate use to the task at hand - can be responsible for some memory accesses on its own. Since the L3 cache is essentially a victim cache, meaning that it is filled with the information that no longer fits in the chip's L1 or L2 cache levels, a workload that runs on the 4 cores of a single CCX can only make use of up to 8 MB of L3 cache. However, even if we were to distribute the workload between cores from both CCXs, so as to be able to access the entirety of the 1800X's 16 MB cache, we'd still be constrained by the inter-CCX bandwidth of AMD's Data Fabric interconnect: 22 GB/s, which is much lower than the L3 cache's 175 GB/s - and even lower than RAM bandwidth. That the Data Fabric interconnect also has to carry data from AMD's IO Hub PCIe lanes potentially interferes further with the (already meagre) available bandwidth.
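As a rough worked example of what that bandwidth gap means, the sketch below computes the ideal time to move one CCX's worth of L3 (8 MB) at each of the two quoted rates. It assumes the GB/s figures are decimal gigabytes and ignores latency, protocol overhead, and competing traffic, so the absolute numbers are only indicative.

```python
GB = 1e9          # assumption: bandwidth figures are decimal gigabytes
MB = 1024 * 1024  # cache sizes are binary megabytes

FABRIC_GBPS = 22.0  # inter-CCX Data Fabric figure from the article
L3_GBPS = 175.0     # L3 cache bandwidth figure from the article

def transfer_time_us(n_bytes: float, bandwidth_gbps: float) -> float:
    """Ideal time in microseconds to move n_bytes at the given bandwidth."""
    return n_bytes / (bandwidth_gbps * GB) * 1e6

payload = 8 * MB  # one CCX's worth of L3
via_fabric = transfer_time_us(payload, FABRIC_GBPS)
via_l3 = transfer_time_us(payload, L3_GBPS)

print(f"fabric: {via_fabric:.0f} us, local L3: {via_l3:.0f} us, "
      f"ratio: {via_fabric / via_l3:.1f}x")
```

The ratio is simply 175 / 22, so data that has to cross between CCXs moves at roughly one eighth of the local L3 rate in the best case.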

AMD's Zen architecture is surely an interesting beast, and these kinds of results really go to show the amount of give-and-take design work that AMD had to go through in order to achieve a cost-effective, scalable, and at the same time performant architecture through its CCX modules. However, this kind of behavior may even go so far as to give us some answers with regard to Ryzen's lower-than-expected gaming performance, since games are well known to be sensitive to a processor's cache performance profile.

 
Joined
Sep 26, 2012
Messages
871 (0.20/day)
Location
Australia
System Name ATHENA
Processor AMD 7950X
Motherboard ASUS Crosshair X670E Extreme
Cooling ASUS ROG Ryujin III 360, 13 x Lian Li P28
Memory 2x32GB Trident Z RGB 6000Mhz CL30
Video Card(s) ASUS 4090 STRIX
Storage 3 x Kingston Fury 4TB, 4 x Samsung 870 QVO
Display(s) Acer X38S, Wacom Cintiq Pro 15
Case Lian Li O11 Dynamic EVO
Audio Device(s) Topping DX9, Fluid FPX7 Fader Pro, Beyerdynamic T1 G2, Beyerdynamic MMX300
Power Supply Seasonic PRIME TX-1600
Mouse Xtrfy MZ1 - Zy' Rail, Logitech MX Vertical, Logitech MX Master 3
Keyboard Logitech G915 TKL
VR HMD Oculus Quest 2
Software Windows 11 + Universal Blue
One does wonder if the 4 core parts will suffer the same fate since it will be one straight core complex.
 
Joined
Jul 9, 2015
Messages
3,413 (1.00/day)
System Name M3401 notebook
Processor 5600H
Motherboard NA
Memory 16GB
Video Card(s) 3050
Storage 500GB SSD
Display(s) 14" OLED screen of the laptop
Software Windows 10
Benchmark Scores 3050 scores good 15-20% lower than average, despite ASUS's claims that it has uber cooling.
with latency being almost 3x greater than on Intel's 6900K.
Huh?
69.3 vs 98 is... 3 times?

PS
Are they testing the "Core from the left quad accessing L3 of the right quad" scenario? (CCX in the title hints at that, but nothing in the chaotic text of OP talks about it.)
 
Joined
Feb 3, 2017
Messages
3,747 (1.31/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
hasn't amd repeatedly said that aida64 does not know how to properly test ryzen cache?
 
Joined
Jan 8, 2017
Messages
568 (0.20/day)
System Name ACME Singularity Unit
Processor Coal-dual 9000
Motherboard Oak Plank
Cooling 4 Snow Yetis huffing and puffing in parallel
Memory Hasty Indian (I/O: 3 smoke signals per minute)
Video Card(s) Bob Ross AI module
Storage Stone Tablet 2.0
Display(s) Where are my glasses?
Case Hand sewn bull hide
Audio Device(s) On demand tribe singing
Power Supply Spin-o-Wheel-matic
Mouse Hamster original
Keyboard Chisel 1.9a (upgraded for Stone Tablet 2.0 compatibility)
Software It's all hard down here
Dumb question! What is this QC/DC next to the broadwell? :)
 
Joined
Apr 30, 2012
Messages
3,881 (0.85/day)
hasn't amd repeatedly said that aida64 does not know how to properly test ryzen cache?

AIDA64 tweeted
AIDA64 said:
AMD hadn't sent us a Ryzen before launch. As soon as we can get one, we will fix the L2+L3 benchmarks

Kind of hard to have a working AIDA64 for Ryzen when the company tweets that it can't fix it until they get a Ryzen chip, the same day that article is published.
 

the54thvoid

Super Intoxicated Moderator
Staff member
Joined
Dec 14, 2009
Messages
13,048 (2.39/day)
Location
Glasgow - home of formal profanity
Processor Ryzen 7800X3D
Motherboard MSI MAG Mortar B650 (wifi)
Cooling be quiet! Dark Rock Pro 4
Memory 32GB Kingston Fury
Video Card(s) Gainward RTX4070ti
Storage Seagate FireCuda 530 M.2 1TB / Samsung 960 Pro M.2 512Gb
Display(s) LG 32" 165Hz 1440p GSYNC
Case Asus Prime AP201
Audio Device(s) On Board
Power Supply be quiet! Pure Power M12 850w Gold (ATX3.0)
Software W10
So...... Is this AMD's equivalent to Nvidia not doing Async? And can software coding help address this?
 
Joined
Jan 8, 2017
Messages
568 (0.20/day)
System Name ACME Singularity Unit
Processor Coal-dual 9000
Motherboard Oak Plank
Cooling 4 Snow Yetis huffing and puffing in parallel
Memory Hasty Indian (I/O: 3 smoke signals per minute)
Video Card(s) Bob Ross AI module
Storage Stone Tablet 2.0
Display(s) Where are my glasses?
Case Hand sewn bull hide
Audio Device(s) On demand tribe singing
Power Supply Spin-o-Wheel-matic
Mouse Hamster original
Keyboard Chisel 1.9a (upgraded for Stone Tablet 2.0 compatibility)
Software It's all hard down here
Quad vs Dual channel, the first tests results are of memory or simply RAM.

O.K., so it was a dumb question. Can be smart like that, that's me. Thanks for replying :)
 
Joined
Sep 26, 2012
Messages
871 (0.20/day)
Location
Australia
System Name ATHENA
Processor AMD 7950X
Motherboard ASUS Crosshair X670E Extreme
Cooling ASUS ROG Ryujin III 360, 13 x Lian Li P28
Memory 2x32GB Trident Z RGB 6000Mhz CL30
Video Card(s) ASUS 4090 STRIX
Storage 3 x Kingston Fury 4TB, 4 x Samsung 870 QVO
Display(s) Acer X38S, Wacom Cintiq Pro 15
Case Lian Li O11 Dynamic EVO
Audio Device(s) Topping DX9, Fluid FPX7 Fader Pro, Beyerdynamic T1 G2, Beyerdynamic MMX300
Power Supply Seasonic PRIME TX-1600
Mouse Xtrfy MZ1 - Zy' Rail, Logitech MX Vertical, Logitech MX Master 3
Keyboard Logitech G915 TKL
VR HMD Oculus Quest 2
Software Windows 11 + Universal Blue
So...... Is this AMD's equivalent to Nvidia not doing Async? And can software coding help address this?

I think I would want to see some true benchmarks on this first before I drew conclusions. However, if I had to, a more aware scheduler could stop or at least reduce those painfully slow inter-fabric cache calls. But yes, much like Nvidia's async problem, ultimately I think it's an architectural limitation.
 

the54thvoid

Super Intoxicated Moderator
Staff member
Joined
Dec 14, 2009
Messages
13,048 (2.39/day)
Location
Glasgow - home of formal profanity
Processor Ryzen 7800X3D
Motherboard MSI MAG Mortar B650 (wifi)
Cooling be quiet! Dark Rock Pro 4
Memory 32GB Kingston Fury
Video Card(s) Gainward RTX4070ti
Storage Seagate FireCuda 530 M.2 1TB / Samsung 960 Pro M.2 512Gb
Display(s) LG 32" 165Hz 1440p GSYNC
Case Asus Prime AP201
Audio Device(s) On Board
Power Supply be quiet! Pure Power M12 850w Gold (ATX3.0)
Software W10
I think I would want to see some true benchmarks on this first before I drew conclusions. However, if I had to, a more aware scheduler could stop or at least reduce those painfully slow inter-fabric cache calls. But yes, much like Nvidia's async problem, ultimately I think it's an architectural limitation.

I thought so. It can be addressed, though. Nvidia do have asynchronous warp schedulers; their implementation is just more restrictive than GCN's. But where coded properly, it shouldn't cause too much detriment.
I think caching could surely be coded 'sympathetically' to the Ryzen architecture. Then again, I know nothing about coding and I am probably talking out my ass.
 
Joined
Apr 26, 2008
Messages
232 (0.04/day)
System Name 3950X Workstation
Processor AMD Ryzen 9 3950X
Motherboard ASUS Crosshair VIII Impact
Cooling Cryorig C1 with Noctua NF-A12x15
Memory G.Skill F4-3600C16D-32GTZNC
Video Card(s) ASUS GTX 1650 LP OC
Storage 2 x Corsair MP510 1920GB M.2 SSD
Case Realan E-i7
Power Supply G-Unique 400W
Software Win 10 Pro
Benchmark Scores https://smallformfactor.net/forum/threads/the-saga-of-the-little-gem-continues.12877/
All this makes the current Ryzen performance even more impressive. I mean, it's a chip with basically a handicapped cache/memory implementation but it still trades blows with Intel chips clock-to-clock. This actually makes me think that the real Ryzen IPC (how it handles the instructions) is significantly better than Intel's.

At the end, this is good news for AMD: they have a clear improvement path --> Lower those L3 and system memory latency figures!

It's clear that the CCX design relies on the interconnect bandwidth, so AMD has a few paths going forward: 1) find a way to increase that bandwidth for a truly scalable architecture, 2) go Intel's route and design a chip that uses a larger CCX (with 16 cores), or 3) do both.

It seems to me AMD should really do both if they want to also become a player in the server market again. 32-core (2 x CCX), 4-chip configurations with up to 128 cores/system is not too much to ask in the server business...

Or (totally fantasizing now, or am I?), they could truly innovate and ditch the multi-chip system designs but rather build up on the scalability idea to come up with 16-core CCX's that can do up to 8-way (on-chip) interconnects, yielding a full chip with 128 cores. Think about the implications for business clients: a single 128-core chip on a small board, meaning much-easier-to-deal-with systems with much lower power utilization (4 chips on a huge board means huge power overhead). Then, similar to what they do in GPUs, they can trim it down to create a product line-up. I have a feeling this is AMD's way (vision), but it's a goal that's a long way off at the moment...
 
Joined
Apr 12, 2013
Messages
7,527 (1.77/day)
Anyone with a Ryzen willing to test this out ~ change the affinity of AIDA64 to first four cores plus SMT (just select CPU affinity from 0 to 7) using process hacker or process explorer. Just a quick glance at these results might give us some answers.
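For anyone who would rather script this than click through Process Hacker, the selection of CPUs 0 to 7 corresponds to an affinity bitmask, which the sketch below computes. It assumes, as the post does, that logical CPUs 0-7 are the four cores of the first CCX plus their SMT siblings; that mapping is not verified here, and the helper name is made up.

```python
def affinity_mask(cpus) -> int:
    """Build an affinity bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

# Assumption: logical CPUs 0-7 = first CCX (4 cores + SMT siblings).
first_ccx = range(0, 8)
mask = affinity_mask(first_ccx)

print(f"decimal: {mask}, hex: {mask:X}")  # decimal: 255, hex: FF
```

On Windows, the hex value is what the cmd built-in takes, e.g. `start /affinity FF aida64.exe` (the executable name here is a guess; use whatever the installed binary is called).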
 
Joined
Aug 8, 2015
Messages
114 (0.03/day)
Location
Finland
System Name Gaming rig
Processor AMD Ryzen 7 5900X
Motherboard Asus X570-Plus TUF /w "passive" chipset mod
Cooling Noctua NH-D15S
Memory Crucial Ballistix Sport LT 2x16GB 3200C16 @3600C16
Video Card(s) MSI RTX 3060 TI Gaming X Trio
Storage Samsung 970 Pro 1TB, Crucial MX500 2TB, Samsung 860 QVO 4TB
Display(s) Samsung C32HG7x
Case Fractal Design Define R5
Audio Device(s) Asus Xonar Essence STX
Power Supply Corsair RM850i 850W
Mouse Logitech G502 Hero
Keyboard Logitech G710+
Software Windows 10 Pro
One does wonder if the 4 core parts will suffer the same fate since it will be one straight core complex.

With only one CCX unit 4 core cpus shouldn't have the same problem.
 

asH9

New Member
Joined
Mar 6, 2017
Messages
1 (0.00/day)
OK, Sooooo Why do HEDT professional programs/benchmarks (Blender...) that are 'Numa aware' (hint hint) run just as well on RyZen as they do on 6900, but gaming benchmarks between the 2 are different (cough HT proprietary cough) ???
 
Joined
Sep 22, 2012
Messages
1,010 (0.23/day)
Location
Belgrade, Serbia
System Name Intel® X99 Wellsburg
Processor Intel® Core™ i7-5820K - 4.5GHz
Motherboard ASUS Rampage V E10 (1801)
Cooling EK RGB Monoblock + EK XRES D5 Revo Glass PWM
Memory CMD16GX4M4A2666C15
Video Card(s) ASUS GTX1080Ti Poseidon
Storage Samsung 970 EVO PLUS 1TB /850 EVO 1TB / WD Black 2TB
Display(s) Samsung P2450H
Case Lian Li PC-O11 WXC
Audio Device(s) CREATIVE Sound Blaster ZxR
Power Supply EVGA 1200 P2 Platinum
Mouse Logitech G900 / SS QCK
Keyboard Deck 87 Francium Pro
Software Windows 10 Pro x64
If the Skylake-E and Kaby Lake-E samples are already finished, I don't know how much Intel can change to improve its tragic position, where its $1,700 CPU loses to a $500 AMD chip with 2 fewer cores and much lower power consumption, almost half.
Even if Intel catches up with AMD, it would be with 8- and 10-core processors at 150 W power consumption.
Because of that, upgrading to AMD is a good choice at the moment.
Especially if someone wants a small PC: mATX mobo, fanless 500W PSU and RX 580 + 1800X.

I don't want to comment at all on rumors about some strange lags and some hidden problems with AMD.
Their CPU shines on paper, the numbers are fantastic. If powerful Intel has fallen so low that it needs to justify its presence with the i7-7700K at
4.5GHz in games locked to 2 and 4 cores, and in that way distract customers from AMD, then there are really no words. No one will help you except the i7-7700K.
Everyone who sabotages the real picture of the AMD processor is an enemy of enthusiasts and of improvement, and is shooting himself in the foot.
Because AMD gives you a CPU capable of beating the i7-6950X on LN2 for $500; you can buy a world-record holder for $500, with 2 fewer cores and far lower power consumption.

In Windows 10 and DX12, people could get far better performance than on Intel Broadwell-E. But Intel didn't do anything to provide that. We constantly hear about walls and no room for improvement. No room for improvement, yet room to drain the same architecture for 5 years; everything they did with X79 and X99 could fit in a single socket, but there is room for new generations.
 

PiotrekDG

New Member
Joined
Feb 11, 2017
Messages
2 (0.00/day)
Hi, the memory latency is in "ns" (nano) =1/1000000000 second not "ms" 1/1000 second.

So much YES, that's a millionfold difference. See what difference 30 ns makes, now imagine a million times slower memory.
And it's not a typo, it appears 5 times in the text, while "ns" never appears.
 

C_Wiz

hardware.fr
Joined
Mar 6, 2017
Messages
7 (0.00/day)
Author of the article here. I know the language barrier doesn't make things easy, but there are a few inaccuracies here in this summary. Some quick points on what we found:

- Memory latency (not L3) is higher (and ns, not ms ;))
- L3 is split in half and communication between the two CCX is thru the same link that links the CCX to the memory controller, PCIe, etc, at a much lower speed.

Plus many other things regarding CCX etc. I don't know how good a job Google Translate does of our article but I'd suggest people interested give it a shot (page 22/23 maybe 24 [we found another issue with game performance that's linked to Windows 10] is what you're looking for).

To answer another question, yes, L3 readings are inaccurate in AIDA (that's why we show them in orange in the table). We do use another test (a beta benchmark from AIDA, too) to check latency at different block sizes; that one is the basis of our analysis.

G.
 
Joined
Dec 31, 2009
Messages
19,371 (3.56/day)
Benchmark Scores Faster than yours... I'd bet on it. :)
I wonder if aida64 was updated... we were told directly from FinalWire not to use it for data until they updated it... AMD didn't send them ryzen pre launch...
 
Joined
Sep 2, 2011
Messages
1,019 (0.21/day)
Location
Porto
System Name No name / Purple Haze
Processor Phenom II 1100T @ 3.8Ghz / Pentium 4 3.4 EE Gallatin @ 3.825Ghz
Motherboard MSI 970 Gaming/ Abit IC7-MAX3
Cooling CM Hyper 212X / Scythe Andy Samurai Master (CPU) - Modded Ati Silencer 5 rev. 2 (GPU)
Memory 8GB GEIL GB38GB2133C10ADC + 8GB G.Skill F3-14900CL9-4GBXL / 2x1GB Crucial Ballistix Tracer PC4000
Video Card(s) Asus R9 Fury X Strix (4096 SP's/1050 Mhz)/ PowerColor X850XT PE @ (600/1230) AGP + (HD3850 AGP)
Storage Samsung 250 GB / WD Caviar 160GB
Display(s) Benq XL2411T
Audio Device(s) motherboard / Creative Sound Blaster X-Fi XtremeGamer Fatal1ty Pro + Front panel
Power Supply Tagan BZ 900W / Corsair HX620w
Mouse Zowie AM
Keyboard Qpad MK-50
Software Windows 7 Pro 64Bit / Windows XP
Benchmark Scores 64CU Fury: http://www.3dmark.com/fs/11269229 / X850XT PE http://www.3dmark.com/3dm05/5532432
Author of the article here. I know the language barrier doesn't make things easy, but there are a few inaccuracies here in this summary. Some quick points on what we found:

- Memory latency (not L3) is higher (and ns, not ms ;))
- L3 is split in half and communication between the two CCX is thru the same link that links the CCX to the memory controller, PCIe, etc, at a much lower speed.

Plus many other things regarding CCX etc. I don't know how good a job Google Translate does of our article but I'd suggest people interested give it a shot (page 22/23 maybe 24 [we found another issue with game performance that's linked to Windows 10] is what you're looking for).

To answer another question, yes, L3 readings are inaccurate in AIDA (that's why we show them in orange in the table). We do use another test (a beta benchmark from AIDA, too) to check latency at different block sizes; that one is the basis of our analysis.

G.

Thank you for the clarifications!
 
Joined
Oct 2, 2004
Messages
13,791 (1.87/day)
Also be aware that Intel makes one of the best L caches. After all, they have the foundries and both teams working together. AMD doesn't have that luxury so slightly higher latency isn't something strange. And it's not even that horrible to be honest. If it was, then multi-threaded benchmarks would suffer horrendously once L3 gets thrashed by HT cache misses. But it doesn't.
 
Joined
Jul 5, 2013
Messages
27,725 (6.67/day)
AMD's Ryzen 7 lower than expected performance in some applications seems to stem from a particular problem: memory latency. Before AMD's Ryzen chips were even out, reports pegged AMD as having confirmed that most of the tweaks and programming for the new architecture had been done in order to improve core performance to its max - at the expense of memory compatibility and performance. Apparently, and until AMD's entire Ryzen line-up is completed with the upcoming Ryzen 5 and Ryzen 3 processors, the company will be hard at work on improving Ryzen's cache handling and memory latency.

Hardware.fr has done a pretty good job in exploring Ryzen's cache and memory subsystem deficiencies through the use of AIDA 64, in what would otherwise be an exceptional processor design. Namely, the fact that there seems to be some problem with Ryzen's L3 implementation, in that it produces latency results that are up to 30 ns higher than the average, at 90 ns, than the L3 latency found on Intel's i7 6900K or even AMD's FX 8350 (both with latency around 60 ns).





From some more testing results, we see that Intel's L1 cache is still leagues ahead from AMD's implementation; that AMD's L2 is overall faster than Intel's, though it does incur on average a roughly 2 ns latency penalty; and that AMD's L3 memory is very much behind Intel's offerings in all metrics but L3 cache copies, with latency being almost 50% greater than on Intel's 6900K.



The problem is revealed through an increasing work size. In the case of the 6900K, which has a 32 KB L1 cache, performance is greatest until that workload size; higher-sized workloads that don't fit on the L1 cache then "spill" towards the 6900K's 256 KB L2 cache; workloads higher than 256 KB and lower than 16 MB are then submitted to the 6900 K's 20 MB L3 cache, with any workloads higher than 16 MB in size then forcing the processor to access the main system memory, with increasing latency in access times until it reaches the RAM's ~70 ns access times.



However, on AMD's Ryzen 1800X, latency times are a wholly different beast. Everything is fine in the L1 and L2 caches (32 KB and 512 KB, respectively). However, when moving towards the 1800X's 16 MB L3 cache, the behavior is completely different. Up to 4 MB cache utilization, we see an expected increase in latency; however, latency goes through the roof way before the chip's 16 MB of L3 cache is completely filled. This clearly derives from AMD's Ryzen modularity, with each CCX (made up of 4 cores and 8 MB L3 cache, besides all the other duplicated logic) being able to access only 8 MB of L3 cache at any point in time.



The difference in access speeds between 4 MB and 8 MB workloads can be explained through AMD's own admission that Ryzen's core design incurs different access times depending on which parts of the L3 cache are accessed by the CCX. Since the L3 cache is essentially a victim cache, meaning that it is filled with the information that isn't able to fit onto the chips' L1 or L2 cache levels, this would mean that each CCX can only access up to 8 MB of L3 cache if any given workload uses no more than 4 cores from a given CCX. However, even if we were to distribute workload in-between two different cores from each CCX, so as to be able to access the entirety of the 1800X's 16 MB cache... we'd still be somewhat constrained by the inter-CCX bandwidth achieved by AMD's Data Fabric interconnect... 22 GB/s, which is much lower than the L3 cache's 175 GB/s - and even lower than RAM bandwidth.

AMD's Zen architecture is surely an interesting beast, and these kinds of results really go to show the amount of work, of give-and-take design that AMD had to go through in order to achieve a cost-effective, scalable, and at the same time performant architecture through its CCX modules. However, this kind of behavior may even go so far as to give us some answers with regards to Ryzen's lower than expected gaming performance, since games are well-known to be sensitive to a processor's cache performance profile.

Source: Hardware.fr
There were a few problems with this article. The use of "ms"(milliseconds) instead of "ns"(nanoseconds) was fairly glaring. CPU operating reaction speeds have not been measured in "ms" since the early 80's. There were also a few grammatical errors which have been fixed. You're welcome.
 
Joined
Mar 13, 2012
Messages
278 (0.06/day)
Hmmm, is this a permanent design flaw or is this fixable somehow?
 
Joined
Apr 10, 2013
Messages
302 (0.07/day)
Location
Michigan, USA
Processor AMD 1700X
Motherboard Crosshair VI Hero
Memory F4-3200C14D-16GFX
Video Card(s) GTX 1070
Storage 960 Pro
Display(s) PG279Q
Case HAF X
Power Supply Silencer MK III 850
Mouse Logitech G700s
Keyboard Logitech G105
Software Windows 10
I had wondered when someone would start expanding on the memory latency issues. The 90+ns latency on these is like an old Core 2 / P35 from 2007. In the AIDA64 memory latency list you have to scroll down to find the poor 1800x... just below a P4 from 2004. :confused:
 