Monday, March 6th 2017

AMD's Ryzen Cache Analyzed - Improvements; Improvable; CCX Compromises

Mar 6th, 2017 03:43

AMD's Ryzen 7's lower-than-expected performance in some applications seems to stem from a particular problem: memory. Even before AMD's Ryzen chips were out, reports claimed AMD had confirmed that most of the tweaking and programming for the new architecture had gone into pushing core performance to its maximum, at the expense of memory compatibility and performance. Apparently, until AMD's Ryzen line-up is completed with the upcoming Ryzen 5 and Ryzen 3 processors, the company will be hard at work improving Ryzen's cache handling and memory latency.

Hardware.fr has done a pretty good job of exploring, with AIDA64, the cache and memory subsystem deficiencies in what would otherwise be an exceptional processor design: there seems to be a problem with Ryzen's L3 cache and memory subsystem implementation. With the same memory configuration and at the same 3 GHz clock, for instance, Ryzen's memory latency results are up to 30 ns higher (at about 90 ns) than those of Intel's i7 6900K or even AMD's FX 8350 (both at around 60 ns).
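To put that gap in perspective at the fixed 3 GHz test clock, a latency in nanoseconds can be converted into core clock cycles by multiplying it by the clock in GHz. This is a minimal sketch using the approximate figures quoted above; it ignores the fact that an out-of-order core can overlap part of this wait with other work.

```python
# Rough conversion of a memory latency in nanoseconds into core clock cycles.
# Inputs are the approximate AIDA64 figures quoted above, at the 3 GHz test clock.

def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """A core running at f GHz completes f cycles every nanosecond."""
    return latency_ns * clock_ghz


CLOCK_GHZ = 3.0
ryzen = ns_to_cycles(90, CLOCK_GHZ)   # Ryzen 7 1800X, ~90 ns
others = ns_to_cycles(60, CLOCK_GHZ)  # i7 6900K / FX 8350, ~60 ns

print(f"Ryzen 7 1800X:      ~{ryzen:.0f} cycles per DRAM access")
print(f"i7 6900K / FX 8350: ~{others:.0f} cycles per DRAM access")
print(f"Difference:         ~{ryzen - others:.0f} cycles")
```

At 3 GHz this works out to roughly 270 versus 180 cycles, i.e. about 90 extra cycles every time a load has to go all the way out to DRAM.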

Update: The lack of information about the test system may have left some gray areas in the interpretation of the results. Hardware.fr's tests, and the results below, were obtained with the 8-core chips set to 3 GHz and with SMT and HT disabled. Memory for the Ryzen and Intel platforms was DDR4-2400 at 15-15-15-35 timings, and memory for the AMD FX platform was DDR3-1600 at 9-9-9-24 timings. Both memory configurations used 4x 4 GB modules, for a total of 16 GB.
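One way to sanity-check these memory setups is to convert their CAS latencies into nanoseconds. Because DDR memory transfers data twice per clock, one memory clock lasts 2000 / (data rate in MT/s) ns. The sketch below applies that to the timings listed above; it only covers the first-word CAS delay, not the full load-to-use latency AIDA64 reports, which also includes the cache hierarchy and the memory controller.

```python
def cas_latency_ns(data_rate_mts: int, cas_cycles: int) -> float:
    """First-word CAS latency in ns.

    DDR moves data on both clock edges, so the I/O clock runs at half the
    data rate and one clock period is 2000 / data_rate_mts nanoseconds.
    """
    return cas_cycles * 2000 / data_rate_mts


ddr4 = cas_latency_ns(2400, 15)  # Ryzen and Intel test systems: DDR4-2400 CL15
ddr3 = cas_latency_ns(1600, 9)   # AMD FX test system: DDR3-1600 CL9

print(f"DDR4-2400 CL15: {ddr4:.2f} ns")
print(f"DDR3-1600 CL9:  {ddr3:.2f} ns")
```

The Ryzen and Intel systems shared identical DDR4 settings, and the FX's DDR3 is only about 1.25 ns quicker on this metric, which suggests the DRAM timings alone do not account for a ~30 ns difference in measured latency.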

Further test results show that Intel's L1 cache is still leagues ahead of AMD's implementation; that AMD's L2 is faster than Intel's overall, though it does incur a roughly 2 ns latency penalty; and that AMD's L3 cache is well behind Intel's in every metric except L3 copy performance, with latency almost 3x higher than on Intel's 6900K.

The problem shows up as the working set grows. On the 6900K, which has a 32 KB L1 cache, latency is lowest as long as the working set fits in L1. Working sets that don't fit in L1 then "spill" into the 6900K's 256 KB L2 cache; those larger than 256 KB but smaller than 16 MB are served by its 20 MB L3 cache, and anything larger than 16 MB forces the processor to go to main system memory, with access latency climbing until it reaches the RAM's ~70 ns.
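Latency-versus-working-set curves like these are typically produced with a pointer-chasing test: a buffer of the given size is linked into one random cycle, so each load depends on the previous one and the hardware prefetchers can't guess the next address. The sketch below shows the structure of such a test, using Sattolo's algorithm to build the cycle. It is an illustration, not AIDA64's actual method: in Python each list slot is a pointer to an int object, and interpreter overhead dominates the timings, so only a native-code version would reproduce the nanosecond-level steps described above. The 64-byte slot size is an assumed typical x86 cache line.

```python
import random
import time

CACHE_LINE = 64  # bytes; one slot per assumed 64-byte cache line


def make_chase_cycle(n: int, rng: random.Random) -> list:
    """Sattolo's algorithm: a random permutation that is a single n-cycle.

    Following nxt[i] from any slot visits every slot exactly once before
    returning to the start, so the walk covers the whole working set.
    """
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i; Fisher-Yates would allow j == i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def ns_per_step(working_set_bytes: int, steps: int = 200_000, seed: int = 0) -> float:
    """Average time per dependent step while chasing through the working set."""
    n = max(2, working_set_bytes // CACHE_LINE)
    nxt = make_chase_cycle(n, random.Random(seed))
    i = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        i = nxt[i]  # each index depends on the result of the previous load
    return (time.perf_counter_ns() - start) / steps


if __name__ == "__main__":
    # Sweep working sets from 16 KB to 32 MB, doubling each time.
    size = 16 * 1024
    while size <= 32 * 1024 * 1024:
        print(f"{size // 1024:>6} KB: {ns_per_step(size):6.1f} ns/step")
        size *= 2
```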

On AMD's Ryzen 7 1800X, however, latency is a wholly different beast. Everything is fine in the L1 and L2 caches (32 KB and 512 KB, respectively), but the behavior changes completely once the working set moves into the 1800X's 16 MB L3 cache. Up to 4 MB, latency increases as expected; however, it goes through the roof well before the chip's 16 MB of L3 cache is full. This derives from Ryzen's modular design: each CCX (made up of 4 cores and 8 MB of L3 cache, besides all the other duplicated logic) can only access its own 8 MB of L3 cache at any point in time.
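The explanation above can be captured in a toy model: walk a chip's cache levels from fastest to slowest and return the first one large enough to hold the whole working set, with Ryzen's L3 capped at the 8 MB a single CCX can reach. This is a deliberate simplification under that assumption; it ignores associativity, replacement policy, and the mostly exclusive L3's interaction with L2, so it cannot reproduce the rise the measurements show between 4 MB and 8 MB.

```python
KB = 1024
MB = 1024 * KB

# (level, capacity visible to one thread), fastest first.
I7_6900K = [("L1", 32 * KB), ("L2", 256 * KB), ("L3", 20 * MB)]
# Assumption taken from the article: a thread only reaches its own CCX's 8 MB of L3.
R7_1800X = [("L1", 32 * KB), ("L2", 512 * KB), ("L3 (own CCX)", 8 * MB)]


def serving_level(working_set: int, hierarchy) -> str:
    """Return the first level whose capacity can hold the whole working set."""
    for level, capacity in hierarchy:
        if working_set <= capacity:
            return level
    return "DRAM"


for size in (16 * KB, 384 * KB, 6 * MB, 12 * MB, 32 * MB):
    print(f"{size // KB:>6} KB  6900K: {serving_level(size, I7_6900K):<13}"
          f"1800X: {serving_level(size, R7_1800X)}")
```

The 12 MB case is where the two chips part ways in this model: it still fits in the 6900K's L3, but not in the L3 of a single Ryzen CCX.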

The difference in access speed between 4 MB and 8 MB working sets can be explained by AMD's own admission that Ryzen's core design incurs different access times depending on which part of the L3 cache the CCX accesses. The fact that this cache is "mostly exclusive", meaning it may hold data that is of no immediate use to the task at hand, may account for some of the extra memory accesses on its own. Since the L3 is essentially a victim cache, filled with data that no longer fits in the L1 or L2 caches, a workload running on no more than the 4 cores of a single CCX can only use up to 8 MB of L3 cache. And even if we were to distribute a workload across cores in both CCXs, so as to use the 1800X's entire 16 MB of L3, we'd still be constrained by the inter-CCX bandwidth of AMD's Data Fabric interconnect: 22 GB/s, which is much lower than the L3 cache's 175 GB/s, and even lower than RAM bandwidth. The fact that the Data Fabric also has to carry traffic from the IO hub's PCIe lanes may further eat into the (already meager) available bandwidth.
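To see what the quoted bandwidth figures mean in practice, the sketch below computes how long it would take to move one CCX's worth of L3 data (8 MB) at each rate. It assumes sustained peak throughput and decimal GB/s (1 GB = 10^9 bytes), and ignores latency and contention from other traffic such as PCIe, so the results are best-case lower bounds.

```python
MIB = 1024 * 1024


def transfer_time_us(num_bytes: int, gb_per_s: float) -> float:
    """Time in microseconds to stream num_bytes at a sustained gb_per_s (1 GB = 1e9 bytes)."""
    return num_bytes / (gb_per_s * 1e9) * 1e6


payload = 8 * MIB  # one CCX's worth of L3
for name, bw in (("Data Fabric (inter-CCX)", 22.0), ("L3 cache", 175.0)):
    print(f"{name:<24} {bw:6.1f} GB/s -> {transfer_time_us(payload, bw):7.1f} us")
```

Moving 8 MB takes roughly 381 µs over the inter-CCX link versus about 48 µs from L3, a factor of 175 / 22, or about 8x.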

AMD's Zen architecture is surely an interesting beast, and these results really show how much work, and how much give-and-take in design, AMD had to go through to achieve a cost-effective, scalable, and at the same time performant architecture through its CCX modules. This behavior may even go some way toward explaining Ryzen's lower-than-expected gaming performance, since games are well known to be sensitive to a processor's cache performance profile.

Source: Hardware.fr


120 Comments on AMD's Ryzen Cache Analyzed - Improvements; Improvable; CCX Compromises

#76

mastrdrver

Wark0 said: No, it's not the case with the final BIOS:

forum.hardware.fr/hfr/Hardware/hfr/dossier-1800x-retour-sujet_1017196_20.htm#t10089095

When was the final BIOS?

Because here's what Stilt originally got, which is nowhere near what they got. Though this is from last Thursday, and I haven't been keeping up with how often BIOSes have been released.

Logical Processor to Cache Map:
*--------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
*--------------- Instruction Cache 0, Level 1, 64 KB, Assoc 4, LineSize 64
*--------------- Unified Cache 0, Level 2, 512 KB, Assoc 8, LineSize 64
*--------------- Unified Cache 1, Level 3, 16 MB, Assoc 16, LineSize 64
-*-------------- Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
-*-------------- Instruction Cache 1, Level 1, 64 KB, Assoc 4, LineSize 64
-*-------------- Unified Cache 2, Level 2, 512 KB, Assoc 8, LineSize 64
-*-------------- Unified Cache 3, Level 3, 16 MB, Assoc 16, LineSize 64
--*------------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
--*------------- Instruction Cache 2, Level 1, 64 KB, Assoc 4, LineSize 64
--*------------- Unified Cache 4, Level 2, 512 KB, Assoc 8, LineSize 64
--*------------- Unified Cache 5, Level 3, 16 MB, Assoc 16, LineSize 64
---*------------ Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
---*------------ Instruction Cache 3, Level 1, 64 KB, Assoc 4, LineSize 64
---*------------ Unified Cache 6, Level 2, 512 KB, Assoc 8, LineSize 64
---*------------ Unified Cache 7, Level 3, 16 MB, Assoc 16, LineSize 64
----*----------- Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
----*----------- Instruction Cache 4, Level 1, 64 KB, Assoc 4, LineSize 64
----*----------- Unified Cache 8, Level 2, 512 KB, Assoc 8, LineSize 64
----*----------- Unified Cache 9, Level 3, 16 MB, Assoc 16, LineSize 64
-----*---------- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
-----*---------- Instruction Cache 5, Level 1, 64 KB, Assoc 4, LineSize 64
-----*---------- Unified Cache 10, Level 2, 512 KB, Assoc 8, LineSize 64
-----*---------- Unified Cache 11, Level 3, 16 MB, Assoc 16, LineSize 64
------*--------- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
------*--------- Instruction Cache 6, Level 1, 64 KB, Assoc 4, LineSize 64
------*--------- Unified Cache 12, Level 2, 512 KB, Assoc 8, LineSize 64
------*--------- Unified Cache 13, Level 3, 16 MB, Assoc 16, LineSize 64
-------*-------- Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
-------*-------- Instruction Cache 7, Level 1, 64 KB, Assoc 4, LineSize 64
-------*-------- Unified Cache 14, Level 2, 512 KB, Assoc 8, LineSize 64
-------*-------- Unified Cache 15, Level 3, 16 MB, Assoc 16, LineSize 64
--------*------- Data Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
--------*------- Instruction Cache 8, Level 1, 64 KB, Assoc 4, LineSize 64
--------*------- Unified Cache 16, Level 2, 512 KB, Assoc 8, LineSize 64
--------*------- Unified Cache 17, Level 3, 16 MB, Assoc 16, LineSize 64
---------*------ Data Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
---------*------ Instruction Cache 9, Level 1, 64 KB, Assoc 4, LineSize 64
---------*------ Unified Cache 18, Level 2, 512 KB, Assoc 8, LineSize 64
---------*------ Unified Cache 19, Level 3, 16 MB, Assoc 16, LineSize 64
----------*----- Data Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
----------*----- Instruction Cache 10, Level 1, 64 KB, Assoc 4, LineSize 64
----------*----- Unified Cache 20, Level 2, 512 KB, Assoc 8, LineSize 64
----------*----- Unified Cache 21, Level 3, 16 MB, Assoc 16, LineSize 64
-----------*---- Data Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
-----------*---- Instruction Cache 11, Level 1, 64 KB, Assoc 4, LineSize 64
-----------*---- Unified Cache 22, Level 2, 512 KB, Assoc 8, LineSize 64
-----------*---- Unified Cache 23, Level 3, 16 MB, Assoc 16, LineSize 64
------------*--- Data Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------*--- Instruction Cache 12, Level 1, 64 KB, Assoc 4, LineSize 64
------------*--- Unified Cache 24, Level 2, 512 KB, Assoc 8, LineSize 64
------------*--- Unified Cache 25, Level 3, 16 MB, Assoc 16, LineSize 64
-------------*-- Data Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
-------------*-- Instruction Cache 13, Level 1, 64 KB, Assoc 4, LineSize 64
-------------*-- Unified Cache 26, Level 2, 512 KB, Assoc 8, LineSize 64
-------------*-- Unified Cache 27, Level 3, 16 MB, Assoc 16, LineSize 64
--------------*- Data Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*- Instruction Cache 14, Level 1, 64 KB, Assoc 4, LineSize 64
--------------*- Unified Cache 28, Level 2, 512 KB, Assoc 8, LineSize 64
--------------*- Unified Cache 29, Level 3, 16 MB, Assoc 16, LineSize 64
---------------* Data Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
---------------* Instruction Cache 15, Level 1, 64 KB, Assoc 4, LineSize 64
---------------* Unified Cache 30, Level 2, 512 KB, Assoc 8, LineSize 64
---------------* Unified Cache 31, Level 3, 16 MB, Assoc 16, LineSize 64
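The listing above appears to be Sysinternals Coreinfo output, where each '*' in the mask marks a logical processor that shares the cache on that line. A short parser makes the problem easier to see: counting distinct L3 masks in this early-BIOS map yields 16 separate 16 MB L3 caches, one per logical processor, rather than the two 8 MB L3s (one per four-core CCX, each shared by 8 logical processors with SMT on) that the article's description of the 1800X would lead one to expect. The regular expression is an assumption fitted to the lines shown here, not a documented Coreinfo format.

```python
import re

# Matches lines such as:
# "*--------------- Unified Cache 1, Level 3, 16 MB, Assoc 16, LineSize 64"
CACHE_LINE_RE = re.compile(
    r"^(?P<mask>[*-]+)\s+(?P<kind>Data|Instruction|Unified) Cache\s+\d+,\s+"
    r"Level (?P<level>\d),\s+(?P<size>\d+) (?P<unit>KB|MB)"
)


def cache_instances(text: str, level: int) -> dict:
    """Map each distinct (sharing mask, cache kind) at one level to its size in KB."""
    instances = {}
    for line in text.splitlines():
        m = CACHE_LINE_RE.match(line.strip())
        if m and int(m["level"]) == level:
            size_kb = int(m["size"]) * (1024 if m["unit"] == "MB" else 1)
            instances[(m["mask"], m["kind"])] = size_kb
    return instances


def summarize_l3(text: str) -> str:
    """Count L3 instances, their total size, and how many logical CPUs share each."""
    l3 = cache_instances(text, level=3)
    total_mb = sum(l3.values()) // 1024
    sharing = sorted({mask.count("*") for mask, _kind in l3})
    return (f"{len(l3)} L3 instance(s), {total_mb} MB in total, "
            f"shared by {sharing} logical processor(s) each")
```

Run on the full map above, summarize_l3 reports 16 L3 instances totalling 256 MB, each seen by a single logical processor.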

#77

C_Wiz

hardware.fr

mastrdrver said: When was the final BIOS?

The Asus BIOS (5704) is dated 23/02 (it obviously wasn't publicly available then, but the same BIOS is available on Asus's website now; we compared checksums to confirm it's the same). This is the BIOS that includes AMD's "final" (pre-launch) microcode update. The cache is shown correctly there, as wark0 posted earlier in the thread (here: forum.hardware.fr/hfr/Hardware/hfr/dossier-1800x-retour-sujet_1017196_20.htm#t10089095 )

To be clear, 5704 was not the BIOS that AMD shipped on the reviewers' motherboards (I think that was 5702?); you had to flash it yourself, but that's pretty common with launches, and AMD gave plenty of heads-ups about it.

#78

nemesis.ie

Just putting this out there as another data point for the RAM:

www.legitreviews.com/ddr4-memory-scaling-amd-am4-platform-best-memory-kit-amd-ryzen-cpus_192259

#79

Imsochobo

Grings said: I'm seeing too many badly theorycrafted reasons for the bad gaming performance (which disabling SMT fixes)

I can confirm that Windows 10 is a big issue and could cost around 10 FPS in most games.
SMT is also an issue, but Windows 10 performs worse than Windows 7 and Linux, and most games have been benched on Windows 10.
I can't reproduce this with a Xeon 2680 V2: there, Windows 10 only costs me 1 FPS instead of 10 (and 20 in CS:GO).

I can confirm something is iffy.

#80

C_Wiz

hardware.fr

Imsochobo said: I can confirm that Windows 10 is a big issue and could cost around 10 FPS in most games.
SMT is also an issue, but Windows 10 performs worse than Windows 7 and Linux, and most games have been benched on Windows 10.
I can't reproduce this with a Xeon 2680 V2: there, Windows 10 only costs me 1 FPS instead of 10 (and 20 in CS:GO).

I can confirm something is iffy.

Again, we confirmed that the Windows 10 scheduler isn't configured the same way for Ryzen and Intel CPUs, which explains the discrepancies between SMT off and on in games; check my link above.
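To illustrate the kind of topology information a scheduler has to get right, here is a hypothetical mapping for an 8-core/16-thread Ryzen 7. It assumes logical processors 2k and 2k+1 are the SMT siblings of core k, and that cores 0-3 form CCX 0 and cores 4-7 form CCX 1. The placement order it produces (one thread per physical core, filling one CCX before the other, SMT siblings last) is an illustrative policy, not a description of what the Windows 10 scheduler actually does.

```python
THREADS_PER_CORE = 2  # assumption: SMT siblings are adjacent logical CPUs
CORES_PER_CCX = 4     # Ryzen 7: two CCXs of four cores each


def locate(logical_cpu: int) -> tuple:
    """Return (ccx, physical core, SMT sibling index) for a logical CPU."""
    core = logical_cpu // THREADS_PER_CORE
    return core // CORES_PER_CCX, core, logical_cpu % THREADS_PER_CORE


def placement_order(num_logical: int = 16) -> list:
    """Hypothetical SMT- and CCX-aware order for handing out logical CPUs:
    first sibling of every core (CCX 0, then CCX 1), then the second siblings."""
    def key(cpu):
        ccx, core, sibling = locate(cpu)
        return (sibling, ccx, core)
    return sorted(range(num_logical), key=key)


for cpu in placement_order():
    ccx, core, sibling = locate(cpu)
    print(f"logical {cpu:2d} -> CCX {ccx}, core {core}, SMT thread {sibling}")
```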

#81

airfathaaaaa

the54thvoid said: So... is this AMD's equivalent of Nvidia not doing Async? And can software coding help address this?

This is Windows load balancing working like it did on Nehalems and first-gen Skylakes.

Basically, Windows treats Ryzen as a massive 16-core CPU instead of 8 cores/16 threads,
and that basically creates all of the other problems this CPU has. Normally Windows throws all of the heavy workloads onto the physical cores and leaves the rest to the logical ones, but here Windows throws everything at everything, so the CPU has to rely on "stealing" from system RAM because Windows thinks it has a massive 138 MB L3.

And due to the nature of SMT, sometimes when Windows keeps a thread on the CPU (remember, AMD says a CCX is a CPU, not a core), the data in the L3 gets "lost", and Windows reissues a new load to that thread even though the data is already in L3. This results in the core parking bug, because the CPU needs to pause the new workload to flush the identical one that is already in L3.

#82

Kanan

Tech Enthusiast & Gamer

airfathaaaaa said: This is Windows load balancing working like it did on Nehalems and first-gen Skylakes.

Basically, Windows treats Ryzen as a massive 16-core CPU instead of 8 cores/16 threads,
and that basically creates all of the other problems this CPU has. Normally Windows throws all of the heavy workloads onto the physical cores and leaves the rest to the logical ones, but here Windows throws everything at everything, so the CPU has to rely on "stealing" from system RAM because Windows thinks it has a massive 138 MB L3.

And due to the nature of SMT, sometimes when Windows keeps a thread on the CPU (remember, AMD says a CCX is a CPU, not a core), the data in the L3 gets "lost", and Windows reissues a new load to that thread even though the data is already in L3. This results in the core parking bug, because the CPU needs to pause the new workload to flush the identical one that is already in L3.

How did you come to this conclusion?

#83

Gasaraki

Camm said: One does wonder whether the 4-core parts will suffer the same fate, since they will be a single core complex.

The quad cores might be a beast!

#84

airfathaaaaa

Kanan said: How did you come to this conclusion?

We already have the full picture of the problems.

And we've already had similar (identical, to be honest) problems in the past. It's not really hard to connect the dots, especially when we know that SMT taps into all three caches.

Also, a really good video to watch.

#85

geon2k2 said: If this is the case, why on earth didn't AMD just send an email to Microsoft before launch to modify the scheduler the way they wanted, or even better, why didn't they release a driver? In the old days, for the Athlon X2, there was a driver called Dual Core Optimizer.

Good question. Also, even if all of that makes a lot of sense, it might just be a load of BS, who knows. But like I've said too many times already, for the love of God, somebody please disable SMT and one of the CCXs, bench games, and compare. If the scores suck just the same, it means thread/cache shuffling has nothing to do with it.
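Short of disabling hardware in the BIOS, one rough way to approximate that experiment on Linux is to pin the benchmark to the first SMT thread of each core in a single CCX. This sketch assumes logical CPUs 2k and 2k+1 are the SMT siblings of core k and that each CCX has four cores; it uses Python's os.sched_setaffinity, which only exists on some Unix platforms such as Linux (on Windows, Task Manager's "Set affinity" option serves a similar purpose). Pinning is not equivalent to disabling hardware: the other CCX stays active and available to the rest of the system.

```python
import os


def one_ccx_without_smt(ccx: int = 0, cores_per_ccx: int = 4,
                        threads_per_core: int = 2) -> set:
    """Logical CPUs for the first SMT thread of each core in one CCX.

    Assumes logical CPUs 2k and 2k+1 are the SMT siblings of core k.
    """
    first_core = ccx * cores_per_ccx
    return {core * threads_per_core
            for core in range(first_core, first_core + cores_per_ccx)}


def pin_current_process(cpus: set) -> set:
    """Restrict the calling thread (and threads/children it starts later) to cpus."""
    if not hasattr(os, "sched_setaffinity"):
        return set()  # e.g. Windows or macOS: not provided by the os module
    usable = cpus & os.sched_getaffinity(0)  # drop CPUs this machine doesn't offer
    if usable:
        os.sched_setaffinity(0, usable)
    return usable


if __name__ == "__main__":
    wanted = one_ccx_without_smt(ccx=0)
    print("requested:", sorted(wanted), "pinned to:", sorted(pin_current_process(wanted)))
    # A game or benchmark launched from here (e.g. via subprocess) inherits this mask.
```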

#86

akumod77 said: I got it from the Thai (I think) tech website Zolkorn. Here's the link: translate.google.com/translate?hl=en&sl=auto&tl=en&u=http://www.zolkorn.com/reviews/amd-ryzen-7-1800x-vs-intel-core-i7-7700k-mhz-by-mhz-core-by-core/4/

In their review, they either found the answer to Ryzen's poor gaming performance or they are doing something very wrong with the 7700K, lol, because in their test Ryzen more or less matches the 7700K. And that is not bad, considering Ryzen will dominate everything else.

#87

mastrdrver

C_Wiz said: The Asus BIOS (5704) is dated 23/02 (it obviously wasn't publicly available then, but the same BIOS is available on Asus's website now; we compared checksums to confirm it's the same). This is the BIOS that includes AMD's "final" (pre-launch) microcode update. The cache is shown correctly there, as wark0 posted earlier in the thread (here: forum.hardware.fr/hfr/Hardware/hfr/dossier-1800x-retour-sujet_1017196_20.htm#t10089095 )

To be clear, 5704 was not the BIOS that AMD shipped on the reviewers' motherboards (I think that was 5702?); you had to flash it yourself, but that's pretty common with launches, and AMD gave plenty of heads-ups about it.

OK, I understand you now, thanks.

How much performance is left in Ryzen once things get tweaked out? 10%?

#88

jahramika

Imsochobo said: I can confirm that Windows 10 is a big issue and could cost around 10 FPS in most games.
SMT is also an issue, but Windows 10 performs worse than Windows 7 and Linux, and most games have been benched on Windows 10.
I can't reproduce this with a Xeon 2680 V2: there, Windows 10 only costs me 1 FPS instead of 10 (and 20 in CS:GO).

I can confirm something is iffy.

My Windows 10 keeps switching to the Power Saving mode setting.

#89

nemesis.ie

I highly recommend everyone take a look at this:

Some very salient points in there, and it also shows that Bulldozer seems to have overtaken the 2500K in gaming over time...

#90

EarthDog

Again with this vid... I think it's the third thread it's been posted in already. :)

#91

TRWOV

It all depends on how you see things, really. If you want absolutely the best gaming performance, then stick with Intel. If you need a CPU with lots of threads that can also game pretty well, then go for the R7s.

My main gaming rig still has a 3770K; I haven't found a reason to upgrade yet. Same with my Steambox build, running a 4590. I'll replace all my crunchers with 1700s, that's a given.

#92

Shihab

Getting GTX 970 feels, for some reason. Also, Bulldozer "modularity."
Has anyone tried disabling 4 cores (or 7, to be safe) in the BIOS to see how it fares?

TRWOV said: It all depends on how you see things, really. If you want absolutely the best gaming performance, then stick with Intel. If you need a CPU with lots of threads that can also game pretty well but doesn't cost an arm and a leg, then go for the R7s.

Fixed that for you.
Admittedly, the first time I saw the Zen reviews, I thought it'd kill Intel in anything outside games. Then I remembered that even industry-standard productivity software can be, and often is, embarrassingly single-threaded, so a combination of both (core count and per-core IPC) is often needed, and here Intel still reigns, as long as price isn't a factor.

nemesis.ie said: I highly recommend everyone take a look at this:

Some very salient points in there, and it also shows that Bulldozer seems to have overtaken the 2500K in gaming over time...

IMO, saying that that channel is merely an AMD apologist would be an understatement.
(And I have a nagging feeling that I've already said that somewhere around here....)

#93

mastrdrver

nemesis.ie said: I highly recommend everyone take a look at this:

Some very salient points in there, and it also shows that Bulldozer seems to have overtaken the 2500K in gaming over time...

I think this is a better video, along with this post comparing Nehalem to Yorkfield:

forum.beyond3d.com/threads/why-nehalem-for-games-is-not-better-than-yorkfield.44382/

#94

mcraygsx

Vlada011 said: If the Skylake-E and Kaby Lake-E samples are finished, I don't know how much Intel could change to improve its tragic position, where its $1,700 CPU lost to a $500 AMD one with 2 fewer cores and much lower power consumption, almost half.
Even if Intel catches AMD, that would be with 8- and 10-core processors and 150 W power consumption.
Because of that, upgrading to AMD is a good choice at the moment.
Especially if someone wants a small PC: mATX mobo, fanless 500 W PSU, and RX 580 + 1800X.

I don't want to comment at all on the rumors about some strange lags and some hidden problems with AMD.
Their CPU shines on paper; the numbers are fantastic. If powerful Intel has fallen so low that it needs to justify its presence with the i7-7700K at 4.5 GHz in games locked to 2 and 4 cores, and in that way distract customers from AMD, then really, no words. No one will help you except the i7-7700K.
Everyone who sabotages the real picture of AMD's processor is an enemy of enthusiasts and improvements, and is shooting themselves in the foot.
Because AMD gives you a CPU capable of beating the i7-6950X on LN2 for $500; you can buy a world-record holder for $500, with 2 fewer cores and far lower power consumption.

In Windows 10 and DX12, people could get far better performance than with Intel Broadwell-E. But Intel didn't do anything to provide that. We nonstop hear about some walls and no room for improvement. There was no room, yet they drained the same architecture for 5 years; everything they did with X79 and X99 could have fit in a single socket, but there is room for new generations.

Well said. Enthusiasts should be thankful we have real competition in high-end CPUs. This was my first AMD CPU, and I am clearly impressed by the 1800X, and amazed at how much money I have thrown at Intel for better IPC. And AMD just gave me a 6900K/6850K equivalent for $499.

#95

FordGT90Concept

"I go fast!1!11!1!"

When one core accesses another core's memory, it behaves like an L4 instead of an L3. The extra latency at L4 is still much better than system RAM, so I really don't see the cause for the fuss.

In fact, it does look like the 1800X's dedicated L3 (~15 ns) is faster than the i7-6900K's (~17 ns).

#96

TheLaughingMan

FordGT90Concept said: When one core accesses another core's memory, it behaves like an L4 instead of an L3. The extra latency at L4 is still much better than system RAM, so I really don't see the cause for the fuss.

In fact, it does look like the 1800X's dedicated L3 (~15 ns) is faster than the i7-6900K's (~17 ns).

I think people were bored and haven't needed to talk about AMD for like 5 years, and it all came out at once. As the workstation CPU that it is, it is great, and it handles some gaming on the side pretty damn well. If we get 4-core/8-thread CPUs from AMD that still can't clock higher than 4.0 GHz and fall well behind 3000-series i5 CPUs, everyone can panic. Until then, it's bug-squishing and industry adjustment time!

#97

TheGuruStud

TheLaughingMan said: I think people were bored and haven't needed to talk about AMD for like 5 years, and it all came out at once. As the workstation CPU that it is, it is great, and it handles some gaming on the side pretty damn well. If we get 4-core/8-thread CPUs from AMD that still can't clock higher than 4.0 GHz and fall well behind 3000-series i5 CPUs, everyone can panic. Until then, it's bug-squishing and industry adjustment time!

You'll have to wait till the refresh for (hopefully) better clocks.

#98

TheLaughingMan

TheGuruStud said: You'll have to wait till the refresh for (hopefully) better clocks.

I am going to wait to see Ryzen 5 in action. We have no concrete information about those chips and how they OC. Overclocking 8 cores is a different animal than overclocking 4 cores... historically, at least. And I don't think the OC limit is entirely down to the architecture, but we will find out.

#99

FordGT90Concept

"I go fast!1!11!1!"

TheLaughingMan said: I think people were bored and haven't needed to talk about AMD for like 5 years, and it all came out at once. As the workstation CPU that it is, it is great, and it handles some gaming on the side pretty damn well. If we get 4-core/8-thread CPUs from AMD that still can't clock higher than 4.0 GHz and fall well behind 3000-series i5 CPUs, everyone can panic. Until then, it's bug-squishing and industry adjustment time!

CPU matters less and less with the rise of Vulkan and D3D12. I am utterly unconcerned about it.

Proof: www.techspot.com/review/1348-amd-ryzen-gaming-performance/

In games that are well multithreaded, Ryzen does fine. In games that aren't, it does well enough. The higher the resolution and detail, the more Ryzen closes the gap with Intel.
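The resolution argument can be illustrated with a simplified bottleneck model in which the CPU and GPU work on frames in parallel, so frame time is roughly set by the slower of the two. The per-frame times below are made-up illustrative numbers, not benchmark results, and real engines don't overlap CPU and GPU work perfectly.

```python
def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Simplified pipelined model: the slower stage sets the frame rate."""
    return 1000.0 / max(cpu_ms, gpu_ms)


# Hypothetical per-frame costs, for illustration only.
CPU_MS = {"faster CPU": 5.0, "slower CPU": 6.0}
GPU_MS = {"1080p": 4.0, "4K": 14.0}

for res, gpu_ms in GPU_MS.items():
    results = ", ".join(f"{name}: {fps(cpu_ms, gpu_ms):.0f} fps"
                        for name, cpu_ms in CPU_MS.items())
    print(f"{res:>5}  {results}")
```

At the lower resolution the CPU is the bottleneck, and the 1 ms difference shows up as about 200 versus 167 fps; at 4K the GPU dominates and both CPUs land at about 71 fps.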

#100

TheGuruStud

TheLaughingMan said: I am going to wait to see Ryzen 5 in action. We have no concrete information about those chips and how they OC. Overclocking 8 cores is a different animal than overclocking 4 cores... historically, at least. And I don't think the OC limit is entirely down to the architecture, but we will find out.

I assume the arch is fine. It's the LPP process that wasn't intended for such clocks.
