# AMD's Ryzen Cache Analyzed - Improvements; Improveable; CCX Compromises



## Raevenlord (Mar 6, 2017)

The lower-than-expected performance of AMD's Ryzen 7 in some applications seems to stem from a particular problem: memory. Before AMD's Ryzen chips were even out, reports had AMD confirming that most of the tweaks and programming for the new architecture had been done to maximize core performance, at the expense of memory compatibility and performance. Apparently, until AMD's Ryzen line-up is completed with the upcoming Ryzen 5 and Ryzen 3 processors, the company will be hard at work on improving Ryzen's cache handling and memory latency.

Hardware.fr has done a pretty good job of exploring, through the use of AIDA64, the cache and memory subsystem deficiencies in what would otherwise be an exceptional processor design: there seems to be some problem with Ryzen's L3 cache and memory subsystem implementation. Paired with the same memory configuration and at the same 3 GHz clocks, for instance, Ryzen's memory tests show latency results that are up to 30 ns higher (at 90 ns) than the average latency found on Intel's i7 6900K or even AMD's FX 8350 (both at around 60 ns).



 



Update: The lack of information regarding the test system may have left some gray areas in the interpretation of the results. Hardware.fr's tests, and the results below, were obtained with the 8-core chips set at 3 GHz, with SMT and HT deactivated. Memory for the Ryzen and Intel platforms was DDR4-2400 with 15-15-15-35 timings, and memory for the AMD FX platform was DDR3-1600 operating at 9-9-9-24 timings. Both memory configurations were set at 4x 4 GB, totaling 16 GB of memory.

From some more testing results, we see that Intel's L1 cache is still leagues ahead of AMD's implementation; that AMD's L2 is overall faster than Intel's, though it does incur a roughly 2 ns latency penalty; and that AMD's L3 cache is very much behind Intel's in all metrics but L3 cache copies, with latency being almost 3x greater than on Intel's 6900K.



 

 

 

The problem is revealed through an increasing work size. In the case of the 6900K, which has a 32 KB L1 cache, performance is greatest up to that workload size. Larger workloads that don't fit in the L1 cache then "spill" into the 6900K's 256 KB L2 cache; workloads larger than 256 KB and smaller than 16 MB are then served by the 6900K's 20 MB L3 cache, with any workloads larger than 16 MB forcing the processor to access main system memory, with latency increasing until it reaches the RAM's ~70 ns access times.



 

However, on AMD's Ryzen 1800X, latency is a wholly different beast. Everything is fine in the L1 and L2 caches (32 KB and 512 KB, respectively). When moving towards the 1800X's 16 MB L3 cache, however, the behavior is completely different. Up to 4 MB of cache utilization, we see an expected increase in latency; however, latency goes through the roof well before the chip's 16 MB of L3 cache is completely filled. This clearly derives from Ryzen's modularity, with each CCX (made up of 4 cores and 8 MB of L3 cache, besides all the other duplicated logic) being able to access only 8 MB of L3 cache at any point in time.
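To make the "spill" behavior concrete, here is a rough Python sketch of which level serves a working set of a given size. It is only a toy model built on the capacities quoted in this post; the names `HIERARCHIES` and `serving_level` are made up for illustration, and it ignores the victim-cache behavior and the fact that the measured latency already climbs around 4 MB.

```python
# Toy model: which level of the hierarchy serves a working set of a given size.
# Capacities are the ones quoted in the post. For the 1800X, the L3 entry is
# the 8 MB a single CCX can reach, not the 16 MB chip total (an assumption
# taken from the post's explanation, not a measured value).
KB = 1024
MB = 1024 * KB

HIERARCHIES = {
    "i7-6900K": {
        "levels": [("L1", 32 * KB), ("L2", 256 * KB), ("L3", 20 * MB)],
        "fallback": "RAM",
    },
    "Ryzen 7 1800X": {
        "levels": [("L1", 32 * KB), ("L2", 512 * KB), ("L3 (local CCX)", 8 * MB)],
        "fallback": "other CCX or RAM",
    },
}

def serving_level(cpu, working_set_bytes):
    """Return the first level whose capacity can hold the whole working set."""
    for name, capacity in HIERARCHIES[cpu]["levels"]:
        if working_set_bytes <= capacity:
            return name
    return HIERARCHIES[cpu]["fallback"]

for size in (16 * KB, 384 * KB, 4 * MB, 12 * MB, 32 * MB):
    row = {cpu: serving_level(cpu, size) for cpu in HIERARCHIES}
    print(f"{size // KB} KB ->", row)
```

The interesting row is 12 MB: it still fits in the 6900K's L3, while on the 1800X it already exceeds what one CCX can hold locally.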



 

The difference in access speeds between 4 MB and 8 MB workloads can be explained by AMD's own admission that Ryzen's core design incurs different access times depending on which parts of the L3 cache are accessed by the CCX. The fact that this cache is "mostly exclusive" - which means that other information may be stored on it that's not of immediate use to the task at hand - can be responsible for some memory accesses on its own. Since the L3 cache is essentially a victim cache, meaning that it is filled with the information that doesn't fit in the chip's L1 or L2 cache levels, each CCX can only access up to 8 MB of L3 cache if a given workload uses no more than the 4 cores of a single CCX. However, even if we were to distribute the workload across cores from both CCXs, so as to be able to access the entirety of the 1800X's 16 MB cache, we'd still be constrained by the inter-CCX bandwidth of AMD's Data Fabric interconnect: 22 GB/s, which is much lower than the L3 cache's 175 GB/s - and even lower than RAM bandwidth. That the Data Fabric interconnect also has to carry data from AMD's IO Hub PCIe lanes potentially interferes further with the (already meagre) available bandwidth.
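For a sense of scale, the two bandwidth figures quoted above can be put side by side with some quick arithmetic. This is an idealized back-of-the-envelope sketch in Python: it assumes sustained bandwidth with no latency or contention, treats GB as decimal gigabytes, and the helper name `transfer_time_us` is made up for illustration.

```python
# Bandwidth figures quoted in the post, in GB/s (decimal gigabytes assumed).
L3_BANDWIDTH_GBPS = 175.0
FABRIC_BANDWIDTH_GBPS = 22.0

def transfer_time_us(num_bytes, bandwidth_gbps):
    """Idealized time in microseconds to move num_bytes at a sustained bandwidth."""
    return num_bytes / (bandwidth_gbps * 1e9) * 1e6

payload = 8 * 1024 * 1024  # one CCX's worth of L3 (8 MB)

ratio = L3_BANDWIDTH_GBPS / FABRIC_BANDWIDTH_GBPS
print(f"L3 is about {ratio:.1f}x faster than the inter-CCX link")
print(f"8 MB over L3:     {transfer_time_us(payload, L3_BANDWIDTH_GBPS):.0f} us")
print(f"8 MB over fabric: {transfer_time_us(payload, FABRIC_BANDWIDTH_GBPS):.0f} us")
```

Real transfers move cache lines on demand rather than 8 MB in one go, so the absolute times mean little; the ratio between the two paths is the point.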

AMD's Zen architecture is surely an interesting beast, and these kinds of results really go to show the amount of work, of give-and-take design that AMD had to go through in order to achieve a cost-effective, scalable, and at the same time performant architecture through its CCX modules. However, this kind of behavior may even go so far as to give us some answers with regards to Ryzen's lower than expected gaming performance, since games are well-known to be sensitive to a processor's cache performance profile.



----------



## Camm (Mar 6, 2017)

One does wonder if the 4 core parts will suffer the same fate since it will be one straight core complex.


----------



## medi01 (Mar 6, 2017)

Raevenlord said:


> with latency being almost 3x greater than on Intel's 6900K.


Huh?
69.3 vs 98 is... 3 times?

PS
Are they testing "Core from the left quad accessing L3 of the right quad" scenario? (CCX in the title hints at that, but nothing in the chaotic text of OP talks about it.)


----------



## londiste (Mar 6, 2017)

hasn't amd repeatedly said that aida64 does not know how to properly test ryzen cache?


----------



## Aenra (Mar 6, 2017)

Dumb question! What is this QC/DC next to the broadwell?


----------



## R0H1T (Mar 6, 2017)

Aenra said:


> Dumb question! What is this QC/DC next to the broadwell?


Quad vs Dual channel, the first tests results are of memory or simply RAM.


----------



## Xzibit (Mar 6, 2017)

londiste said:


> hasn't amd repeatedly said that aida64 does not know how to properly test ryzen cache?



*AIDA64 tweeted*


			
AIDA64 said:

> AMD hadn't sent us a Ryzen before launch. As soon as we can get one, *we will fix the L2+L3 benchmarks*



Kind of hard to have a working AIDA64 for Ryzen when the company tweets it can't fix it until they get a Ryzen chip the same day that article is published.


----------



## the54thvoid (Mar 6, 2017)

So...... Is this AMD's equivalent to Nvidia not doing Async? And can software coding help address this?


----------



## Aenra (Mar 6, 2017)

R0H1T said:


> Quad vs Dual channel, the first tests results are of memory or simply RAM.



O.K., so it was a dumb question. Can be smart like that, that's me. Thanks for replying


----------



## Camm (Mar 6, 2017)

the54thvoid said:


> So...... Is this AMD's equivalent to Nvidia not doing Async? And can software coding help address this?



I think I would want to see some true benchmarks on this first before I drew conclusions. However if I had to, a more aware scheduler could stop or at least reduce those painfully slow interfabric cache calls. But yes, much like Nvidia's async problem, ultimately I think it's an architectural limitation.


----------



## the54thvoid (Mar 6, 2017)

Camm said:


> I think I would want to see some true benchmarks on this first before I drew conclusions. However if I had to, a more aware scheduler could stop or at least reduce those painfully slow interfabric cache calls. But yes, much like Nvidia's async problem, ultimately I think it's an architectural limitation.



I thought so. It can be addressed, though. Nvidia has asynchronous warp schedulers; its implementation is just more restrictive than GCN's. But where coded properly, it shouldn't cause too much detriment.
I think caching could surely be coded 'sympathetically' to the Ryzen architecture. Then again, I know nothing about coding and I am probably talking out my ass.


----------



## theGryphon (Mar 6, 2017)

All this makes the current Ryzen performance even more impressive. I mean, it's a chip with basically a handicapped cache/memory implementation but it still trades blows with Intel chips clock-to-clock. This actually makes me think that the real Ryzen IPC (how it handles the instructions) is significantly better than Intel's.

At the end, this is good news for AMD: they have a clear improvement path --> Lower those L3 and system memory latency figures!

It's clear that the CCX design relies on the interconnect bandwidth, so AMD has a few paths going forward: 1) find a way to increase that bandwidth for a truly scalable architecture, 2) go Intel's route and design a chip that uses a larger CCX (with 16 cores), or 3) do both.

It seems to me AMD should really do both if they want to also become a player in the server market again. 32-core (2 x CCX),  4-chip configurations with up to 128 cores/system is not too much to ask in the server business... 

Or (totally fantasizing now, or am I?), they could truly innovate and ditch the multi-chip system designs but rather build up on the scalability idea to come up with 16-core CCX's that can do up to 8-way (on-chip) interconnects, yielding a full chip with 128 cores. Think about the implications for business clients: a single 128-core chip on a small board, meaning much-easier-to-deal-with systems with much lower power utilization (4 chips on a huge board means huge power overhead). Then, similar to what they do in GPUs, they can trim it down to create a product line-up. I have a feeling this is AMD's way (vision), but it's a goal that's a long way off at the moment...


----------



## R0H1T (Mar 6, 2017)

Anyone with a Ryzen willing to test this out ~ change the *affinity* of *AIDA64* to the first four cores plus SMT (just select CPU affinity from 0 to 7) using Process Hacker or Process Explorer. Just a quick glance at these results might give us some answers.
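For anyone who prefers to script this rather than tick boxes, the selection "CPU 0 to 7" corresponds to a simple bitmask. The Python sketch below only computes that mask; the helper name `affinity_mask` is made up, and it assumes (as the suggestion above does) that logical CPUs 0-7 are the four cores plus SMT siblings of a single CCX, which should be verified on the actual system.

```python
def affinity_mask(logical_cpus):
    """Build a bitmask with one bit set per allowed logical CPU."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask

# Assumption: logical CPUs 0-7 = 4 cores + SMT siblings of the first CCX.
first_ccx = range(0, 8)
print(hex(affinity_mask(first_ccx)))
```

The resulting hex value is the same set of CPUs you would select by hand in the affinity dialog of Process Hacker or Process Explorer.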


----------



## Deeveo (Mar 6, 2017)

Camm said:


> One does wonder if the 4 core parts will suffer the same fate since it will be one straight core complex.



With only one CCX, 4-core CPUs shouldn't have the same problem.


----------



## asH9 (Mar 6, 2017)

OK, Sooooo Why do HEDT professional programs/benchmarks (Blender...) that are _*'Numa aware'*_ *(hint hint)* run just as well on RyZen as they do on 6900, but gaming benchmarks between the 2 are different (*cough* *HT* *proprietary cough)* ???


----------



## niboar (Mar 6, 2017)

Hi, the memory latency is in "ns" (nano) =1/1000000000 second not "ms" 1/1000 second.


----------



## Vlada011 (Mar 6, 2017)

If the Skylake-E and Kaby Lake-E samples are already finished, I don't know how much Intel can change to improve its tragic position, where its $1700 CPU loses to a $500 AMD chip with 2 fewer cores and much lower power consumption, almost half.
Even if Intel catches AMD, it will be with 8- and 10-core processors at 150 W power consumption.
Because of that, upgrading to AMD is a good choice at the moment,
especially if someone wants a small PC: mATX mobo, fanless 500 W PSU, and RX 580 + 1800X.

I don't want to comment at all on rumors about some strange lags and hidden problems with AMD.
Their CPU shines on paper; the numbers are fantastic. If powerful Intel has fallen so low that it needs to justify its presence with the i7-7700K at
4.5 GHz in games locked to 2 and 4 cores, and distract customers from AMD that way, then I really have no words. No one will help you except i7-7700K.
Everyone who sabotages the real picture of AMD's processor is an enemy of enthusiasts and of progress, and is shooting themselves in the foot.
Because AMD gives you a CPU capable of beating the i7-6950X on LN2 for $500; you can buy a world-record holder for $500, with 2 fewer cores and far lower power consumption.

In Windows 10 and DX12, people could get far better performance than Intel Broadwell-E. But Intel didn't do anything to provide that. We constantly hear about some walls and no room for improvement. There is no room after draining the same architecture for 5 years; everything they did with X79 and X99 could have fit in a single socket, but there is room for new generations.


----------



## PiotrekDG (Mar 6, 2017)

niboar said:


> Hi, the memory latency is in "ns" (nano) =1/1000000000 second not "ms" 1/1000 second.



So much YES, that's a millionfold difference. See what difference 30 ns makes, now imagine a million times slower memory.
And it's not a typo, it appears 5 times in the text, while "ns" never appears.


----------



## C_Wiz (Mar 6, 2017)

Author of the article here, I know the language barrier doesn't make things easy but there are a few inaccuracies here in this summary. Some quick points on what we found:

- Memory latency (not L3) is higher (and ns, not ms)
- L3 is split in half and communication between the two CCX is thru the same link that links the CCX to the memory controller, PCIe, etc, at a much lower speed.

Plus many other things regarding CCX etc. I don't know how good a job Google Translate does of our article but I'd suggest people interested give it a shot (page 22/23 maybe 24  [we found another issue with game performance that's linked to Windows 10] is what you're looking for).

To answer another question, yes, L3 readings are inaccurate in Aida (that's why we show them in orange in the table). We do use another test (a beta benchmark from Aida, too) to check latency at different block sizes, that one is the basis of our analysis.

G.


----------



## EarthDog (Mar 6, 2017)

I wonder if aida64 was updated... we were told directly from FinalWire not to use it for data until they updated it... AMD didn't send them ryzen pre launch...


----------



## uuuaaaaaa (Mar 6, 2017)

C_Wiz said:


> Author of the article here, I know the language barrier doesn't make things easy but there are a few inaccuracies here in this summary. Some quick points on what we found:
> 
> - Memory latency (not L3) is higher (and ns, not ms)
> - L3 is split in half and communication between the two CCX is thru the same link that links the CCX to the memory controller, PCIe, etc, at a much lower speed.
> ...



Thank you for the clarifications!


----------



## RejZoR (Mar 6, 2017)

Also be aware that Intel makes one of the best L caches. After all, they have the foundries and both teams working together. AMD doesn't have that luxury so slightly higher latency isn't something strange. And it's not even that horrible to be honest. If it was, then multi-threaded benchmarks would suffer horrendously once L3 gets thrashed by HT cache misses. But it doesn't.


----------



## lexluthermiester (Mar 6, 2017)

Raevenlord said:


> AMD's Ryzen 7 lower than expected performance in some applications seems to stem from a particular problem: memory latency. Before AMD's Ryzen chips were even out, reports pegged AMD as having confirmed that most of the tweaks and programming for the new architecture had been done in order to improve core performance to its max - at the expense of memory compatibility and performance. Apparently, and until AMD's entire Ryzen line-up is completed with the upcoming Ryzen 5 and Ryzen 3 processors, the company will be hard at work on improving Ryzen's cache handling and memory latency.
> 
> Hardware.fr has done a pretty good job in exploring Ryzen's cache and memory subsystem deficiencies through the use of AIDA 64, in what would otherwise be an exceptional processor design. Namely, the fact that there seems to be some problem with Ryzen's L3 implementation, in that it produces latency results that are up to 30 ns higher than the average, at 90 ns, than the L3 latency found on Intel's i7 6900K or even AMD's FX 8350 (both with latency around 60 ns).
> 
> ...


There were a few problems with this article. The use of "ms"(milliseconds) instead of "ns"(nanoseconds) was fairly glaring. CPU operating reaction speeds have not been measured in "ms" since the early 80's. There were also a few grammatical errors which have been fixed. You're welcome.


----------



## fynxer (Mar 6, 2017)

Hmmm, is this a permanent design flaw or is this fixable somehow?


----------



## ssdpro (Mar 6, 2017)

I had wondered when someone would start expanding on the memory latency issues.  The 90+ns latency on these is like an old Core 2 / P35 from 2007.  In the AIDA64 memory latency list you have to scroll down to find the poor 1800x... just below a P4 from 2004.


----------



## medi01 (Mar 6, 2017)

fynxer said:


> Hmmm, is this a permanent design flaw or is this fixable somehow?



Could you specify what "this" actually is?


----------



## ssdpro (Mar 6, 2017)

EarthDog said:


> I wonder if aida64 was updated... we were told directly from FinalWire not to use it for data until they updated it... AMD didn't send them ryzen pre launch...



See: https://forums.aida64.com/topic/3768-aida64-compatibility-with-amd-ryzen-processors/



> 3) L1 cache bandwidth and latency scores, as well as memory bandwidth and latency scores are already accurately measured.



1800x sits right between a Celeron J1900 (2013) and an Opteron 2378 (2008).


----------



## XiGMAKiD (Mar 6, 2017)

medi01 said:


> Huh?
> 69.3 vs 98 is... 3 times?
> 
> PS
> Are they testing "Core from the left quad accessing L3 of the right quad" scenario? (CCX in the title hints at that, but nothing in the chaotic text of OP talks about it.)



You're looking at the wrong table, that's system memory latency. What OP means is L3 cache latency, 17.3 vs 46.6
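The two ratios being argued about are easy to check. A quick Python sketch, using only the numbers quoted in this thread (69.3 vs 98 ns for system memory, 17.3 vs 46.6 ns for L3); the dictionary layout and the name `ryzen_to_intel_ratio` are just for illustration:

```python
# Latency figures in ns, as quoted in this thread from the Hardware.fr tables.
ram_latency = {"i7-6900K": 69.3, "Ryzen 7 1800X": 98.0}
l3_latency = {"i7-6900K": 17.3, "Ryzen 7 1800X": 46.6}

def ryzen_to_intel_ratio(table):
    """How many times higher the 1800X figure is than the 6900K figure."""
    return table["Ryzen 7 1800X"] / table["i7-6900K"]

print(f"RAM latency ratio: {ryzen_to_intel_ratio(ram_latency):.2f}x")
print(f"L3 latency ratio:  {ryzen_to_intel_ratio(l3_latency):.2f}x")
```

So "almost 3x" only makes sense for the L3 pair. Keep in mind that the Hardware.fr author states elsewhere in this thread that AIDA64's L3 readings for Ryzen are not reliable.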


----------



## Assimilator (Mar 6, 2017)

C_Wiz said:


> L3 is split in half and communication between the two CCX is thru the same link that links the CCX to the memory controller, PCIe, etc, at a much lower speed.



Interesting - this should mean the 4c/8t Ryzen parts won't suffer from this penalty, so their performance should be correspondingly better.


----------



## lexluthermiester (Mar 6, 2017)

ssdpro said:


> 1800x sits right between a Celeron J1900 (2013) and an Opteron 2378 (2008).



Citation please.


----------



## ssdpro (Mar 6, 2017)

lexluthermiester said:


> Citation please.


Download AIDA64's most recent beta and run the latency benchmark for yourself.  If you do not have a 1800X or an AIDA64 license you can take the 98ns figure cited in this article, download the trial of AIDA64 (beta), and view the data yourself.


----------



## C_Wiz (Mar 6, 2017)

ssdpro said:


> Download AIDA64's most recent beta and run the latency benchmark for yourself.  If you do not have a 1800X or an AIDA64 license you can take the 98ns figure cited in this article, download the trial of AIDA64 (beta), and view the data yourself.


Nope, I'm sorry but this summary of our article is missing so many facts and has so many inaccuracies that it's very misleading at this point. I hope it will be fixed soon.

Again :
- You can't compare L3 values (especially L3 latency), they are wrong (in orange, for a reason)
- FYI, the table that they took from our article of RAM latency is done at 3 GHz with SMT and HT off. Real RAM latency @ stock is around 89.6 with DDR4-2400. That's still much higher than other CPUs with same RAM, but you can't compare a 3 GHz value to other CPUs @ stock. 

Hopefully this news will get fixed, please check the original article with Google Translate if you want more details.


----------



## ssdpro (Mar 6, 2017)

C_Wiz said:


> Nope, I'm sorry but this summary of our article is missing so many facts and has so many inaccuracies that it's very misleading at this point. I hope it will be fixed soon.
> 
> Again :
> - You can't compare L3 values (especially L3 latency), they are wrong (in orange, for a reason)
> ...


I found the summary to be consistent with actual tests of the CPU with ram at 2666.  If you think Techpowerup's summary of your article made some manipulation of the data, I guess that is between you and them.  You can simply run AIDA64 tests and find similar results.  I actually found 92ns for memory latency.

Whether we are splitting hairs at the 98ns in this article, the 92ns, or this recent 89.6ns you reference, what we have is some pretty bad latency compared to AMD's other offerings or Intel products.  As a result of these findings, coupled with gaming performance, we have a stock that continues its slide.


----------



## r9 (Mar 6, 2017)

Camm said:


> One does wonder if the 4 core parts will suffer the same fate since it will be one straight core complex.



More or less what I wanted to know yesterday.



r9 said:


> Can somebody disable 4 cores and SMT and do some game benchmarks. Just to get a glimpse from what to expect from the Ryzen 3 cpus.
> Also that would take out Windows scheduler optimization from the equation.
> The issue with scheduler not distinguishing between actual and SMT cores, assigning threads to SMT that are four time slower than actual cores.
> Moving threads between CCX and causing bottlenecking from split L3 cache and slow inter cache link.
> ...


----------



## Captain_Tom (Mar 6, 2017)

This greatly explains the gaming performance.   In other words Zen shouldn't perform worse (In IPC) than intel if either 1) the game only uses 4 threads, or 2) the game uses 8 or more threads.

Most modern games only really use 6 threads (While jumping to 8 when necessary) depending on the workload, and thus AMD loses in most games.


Makes me once again say that AMD should try to make a 4.5 - 5.0 GHz 4c/8t  Ryzen 7 chip for $275.  They need a version made specifically for high-FPS gamers.


----------



## C_Wiz (Mar 6, 2017)

ssdpro said:


> I found the summary to be consistent with actual tests of the CPU with ram at 2666.  If you think Techpowerup's summary of your article made some manipulation of the data, I guess that is between you and them.  You can simply run AIDA64 tests and find similar results.  I actually found 92ns for memory latency.


I'm saying there are many errors in the summary, such as quoting latency in milliseconds instead of nanoseconds, and a lot of context missing by quoting our tables for example without giving the actual configuration of said test. A lot can be put down to the language barrier and mistranslation by Google Translate. I'm simply trying to give readers here some more accurate information.

We alerted tpu this morning of the discrepancies, I have 0 doubt they will fix the summary


----------



## Captain_Tom (Mar 6, 2017)

theGryphon said:


> All this makes the current Ryzen performance even more impressive. I mean, it's a chip with basically a handicapped cache/memory implementation but it still trades blows with Intel chips clock-to-clock. This actually makes me think that the real Ryzen IPC (how it handles the instructions) is significantly better than Intel's.
> 
> At the end, this is good news for AMD: they have a clear improvement path --> Lower those L3 and system memory latency figures!
> 
> ...




If I had to guess, AMD will go the improved interconnect route.  It is just cheaper (and infinitely scalable) to make a system by essentially taping multiple clusters together.

In fact I am pretty sure they plan to build up their Navi GPUs in the same way (interconnected clusters) so that they can make some monster 400 W single-GPU chips.


----------



## ssdpro (Mar 6, 2017)

C_Wiz said:


> I'm saying there are many errors in the summary, such as quoting latency in milliseconds instead of nanoseconds


Now that I did see.  I don't think TPU was doing that with malicious intent... I think that is more in the "brain fart" category on their part. 

I have visited your site and understand it would be more appropriate for TPU to outline the precise configuration to better represent the data.  I believe the conclusion remains the same - latency is higher than we would like. 

Just to make sure no one confuses anything (check my previous posts if necessary), I think this product is impressive and a remarkable value.  It fell a little below AMD's hype and our expectations but is a remarkable achievement for a company previously on the verge. Even as is, it has provided some competition for Intel and with some tuning may do some decent disruption.


----------



## TRWOV (Mar 6, 2017)

L3 performance has been AMD's achilles heel for quite some time, kind of surprised that they haven't corrected this yet. I suppose that a Windows patch to make it "Ryzen aware" will have to be developed (just as it was the case with P4's HT, Athlon 64, Core Duo, Bulldozer, etc., etc) in order to minimize the impact on real world performance.

Considering all the constraints that AMD has stacked against them (budget, marketshare, less workforce, etc., etc.) it's amazing what they managed to do. I for sure will replace all my crunchers with 1700s, that's a given.

I'll keep my 4590 and 3770K for gaming though. Maybe I'll replace them with 4 core R5s down the line but they still do their work just fine.


----------



## Steevo (Mar 6, 2017)

TRWOV said:


> L3 performance has been AMD's achilles heel for quite some time, kind of surprised that they haven't corrected this yet. I suppose that a Windows patch to make it "Ryzen aware" will have to be developed (just as it was the case with P4's HT, Athlon 64, Core Duo, Bulldozer, etc., etc) in order to minimize the impact on real world performance.
> 
> Considering all the constraints that AMD has stacked against them (budget, marketshare, less workforce, etc., etc.) it's amazing what they managed to do. I for sure will replace all my crunchers with 1700s, that's a given.




This. I called the memory issues when we kept seeing AMD test systems with only 8 or 16 GB of slower RAM. The cache issues are a continuation of the plague that affected prior designs and held them back. They seem to have overcome, or at least masked, the issues with over-engineering in other parts of the chip, but the gaming results, and other very out-of-order operations, will continue to show the cache weakness.


The only thing I am unsure about, reading other reports, is how much better thread handling will improve the efficiency of the chip. It appears that the Windows task scheduler is doing a poor job, as it's unaware of the nuances of the hardware and may send threads to other CCXs, and the huge increase in cache latency is what hurts the most. So keeping threads in the same CCX, and/or treating some threads as affinity-bound, should help performance. The implied AI in this situation (I haven't seen any definitive tests showing that program performance increases over runs) may be able to work as intended, or perhaps we are already seeing its effects in the already good but not great performance.


----------



## Dimi (Mar 6, 2017)

I am just going to wait for Skylake-X, and if it's not affordable enough I'll go for a 6850K and OC it once they go back to under $500. I'm thinking of using a few NVMe drives, so Ryzen with its 24 PCIe lanes does not offer what I'm looking for right now.

I've seen some benchmarks of the 1800X performing WORSE than a 7700K while streaming a game while doing other tasks.

They tried, I had hopes, but I'm gonna give this one a pass.


----------



## r9 (Mar 6, 2017)

Captain_Tom said:


> This greatly explains the gaming performance.   In other words Zen shouldn't perform worse (In IPC) than intel if either 1) the game only uses 4 threads, or 2) the game uses 8 or more threads.
> 
> Most modern games only really use 6 threads (While jumping to 8 when necessary) depending on the workload, and thus AMD loses in most games.
> 
> ...



Missing the point there. It can be 2 threads and still bottleneck if the software tries to move a thread from CCX0 to CCX1,
which is something that games and the OS do quite often to balance load among cores.
Doing that means moving the data from CCX0's L3 cache to CCX1's L3 cache, which causes the bottleneck because of the ultra slow L3 interconnect.
The solution should be in sight: they just need to make the Windows scheduler aware of the design and only move a thread within the CCX where it originated.
That way it eliminates moving data between the L3 caches of the two modules.

This can hopefully be confirmed by benching a game that doesn't use more than 4 threads, with SMT and one of the CCXs disabled on the Ryzen 7.
That eliminates all the above scenarios.
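The scheduling idea described here can be sketched in a few lines of Python. To be clear, this is a toy illustration and not how the Windows scheduler actually works; `CPUS_PER_CCX`, `ccx_of` and `pick_cpu` are made-up names, and the contiguous numbering of logical CPUs per CCX is an assumption.

```python
# Assumption: 4 cores x 2 SMT threads per CCX, numbered contiguously
# (logical CPUs 0-7 on CCX0, 8-15 on CCX1).
CPUS_PER_CCX = 8

def ccx_of(cpu):
    """Index of the CCX a logical CPU belongs to under the assumed numbering."""
    return cpu // CPUS_PER_CCX

def pick_cpu(origin_cpu, idle_cpus):
    """Prefer an idle CPU in the thread's own CCX; cross CCXs only as a last resort."""
    same_ccx = [c for c in idle_cpus if ccx_of(c) == ccx_of(origin_cpu)]
    if same_ccx:
        return min(same_ccx)
    # No idle CPU shares the L3: either cross over or stay put if nothing is idle.
    return min(idle_cpus) if idle_cpus else origin_cpu

print(pick_cpu(2, [9, 5, 12]))  # an idle CPU exists on the same CCX
print(pick_cpu(2, [9, 12]))     # only the other CCX has idle CPUs
```

A thread that stays on its CCX keeps its data in the same 8 MB of L3, which is exactly the cross-CCX traffic the post above wants to avoid.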


----------



## Joss (Mar 6, 2017)

r9 said:


> The solution should be in sight: they just need to make the Windows scheduler aware of the design and only move a thread within the CCX where it originated.
> That way it eliminates moving data between L3 caches for both modules


yeap, that makes sense.
It would make the solution software only, exciting.


----------



## Raevenlord (Mar 6, 2017)

niboar said:


> Hi, the memory latency is in "ns" (nano) =1/1000000000 second not "ms" 1/1000 second.





PiotrekDG said:


> And it's not a typo, it appears 5 times in the text, while "ns" never appears.



It isn't a typo; I filed that under the recently created "laughable brain farts" category of my own posting analysis. Thank you for calling my attention to that =)



C_Wiz said:


> Author of the article here, I know the language barrier doesn't make things easy but there are a few inaccuracies here in this summary. Some quick points on what we found:
> 
> - Memory latency (not L3) is higher (and ns, not ms)
> - L3 is split in half and communication between the two CCX is thru the same link that links the CCX to the memory controller, PCIe, etc, at a much lower speed.
> ...



Hello =) Thank you for taking the time to comment and try and improve understanding on some of these issues. The language barrier is certainly part of the problem. And congrats on such an in-depth look at what makes Ryzen tick!

I'll take the time to read and pore through your comments and some of the questions posed to see if I can shed some light on some other things.



C_Wiz said:


> - You can't compare L3 values (especially L3 latency), they are wrong (in orange, for a reason)



I can compare them between your own results, which were all done with the same configuration between the 6900K and the 1800X, right? That's what I compare in the article.




C_Wiz said:


> I'm saying there are many errors in the summary, such as quoting latency in milliseconds instead of nanoseconds, and a lot of context missing by quoting our tables for example without giving the actual configuration of said test. A lot can be put down to the language barrier and mistranslation by Google Translate. I'm simply trying to give readers here some more accurate information.
> 
> We alerted tpu this morning of the discrepancies, I have 0 doubt they will fix the summary



Latency in milliseconds or nanoseconds doesn't really change anything: the discrepancy remains the same, and the units of measurement remained constant. It's a "brain-farted" technicality, which doesn't affect the overall picture. Unfortunate, yes, but doesn't change anything in the grand scheme of things.

Regarding the absent configuration, a stark neglect on my part, which I will update accordingly, so thanks for bringing that to my attention =) Time isn't as we would like, hence why only now I'm here and improving the article.



ssdpro said:


> Now that I did see.  I don't think TPU was doing that with malicious intent... I think that is more in the "brain fart" category on their part.
> 
> I have visited your site and understand it would be more appropriate for TPU to outline the precise configuration to better represent the data.  I believe the conclusion remains the same - latency is higher than we would like.



^

This. I will, however, edit the piece including the noted configuration.



lexluthermiester said:


> There were a few problems with this article. The use of "ms"(milliseconds) instead of "ns"(nanoseconds) was fairly glaring. CPU operating reaction speeds have not been measured in "ms" since the early 80's. There were also a few grammatical errors which have been fixed. You're welcome.



I will ignore the delivery of your criticism and focus on the content. Thank you for it.




Xzibit said:


> *AIDA64 tweeted*
> 
> 
> Kind of hard to have a working AIDA64 for Ryzen when the company tweets that it can't fix it until they get a Ryzen chip, on the same day that article is published.



For me, that was the whole point of the post. AIDA 64 is a benchmarking utility, but until it has been "fixed", as in, properly optimized for Ryzen, I think it presents itself as a great opportunity to see Ryzen's behavior on non-optimized workloads (ie, what all games currently are).


----------



## geon2k2 (Mar 6, 2017)

r9 said:


> More or less what I wanted to know yesterday.




I would also want to know if the 4 core 8 thread part will be affected.
Anyway, that is the most interesting part of this launch; the 16 core, while nice and powerful, is too much for current software.


----------



## trparky (Mar 6, 2017)

Can these issues be fixed in software, or is it a design flaw that simply can't be fixed until the next version of Ryzen? As a person who hoped and prayed that AMD would be able to give Intel a much deserved kick to their balls, all of this news about Ryzen's performance (or lack thereof) is a major letdown to me.


----------



## akumod77 (Mar 6, 2017)

Why not compare any Ryzen against the i7 7700K at the same clock speed, memory timings, and core/thread count?

For example, because Ryzen won't overclock much, clock them both at 3.9 GHz, 4c/8t. I know we are gimping the i7 7700K, but I'm just curious to know what the result of an "almost the same" setup would be. Gaming and productivity benches needed.


----------



## r9 (Mar 6, 2017)

akumod77 said:


> Why not compare any Ryzen against the i7 7700K at the same clock speed, memory timings, and core/thread count?
> 
> For example, because Ryzen won't overclock much, clock them both at 3.9 GHz, 4c/8t. I know we are gimping the i7 7700K, but I'm just curious to know what the result of an "almost the same" setup would be. Gaming and productivity benches needed.



Where did you get these graphs from?


----------



## eidairaman1 (Mar 6, 2017)

2400 DDR4 is slower than my 2133 DDR3 at 2400 with my timings below.


----------



## lexluthermiester (Mar 6, 2017)

Raevenlord said:


> I will ignore the delivery of your criticism and focus on the content. Thank you for it.


My delivery was intended as constructive, helpful criticism. Don't let it bruise your ego.



Raevenlord said:


> For me, that was the whole point of the post. AIDA 64 is a benchmarking utility, but until it has been "fixed", as in, properly optimized for Ryzen, I think it presents itself as a great opportunity to see Ryzen's behavior on non-optimized workloads (ie, what all games currently are).


If AIDA 64 and game engines worked in similar ways, that logic would be flawless. But they don't, so that logic fails. What is needed is a utility that works the hardware it's testing properly to give accurate results and information.


----------



## akumod77 (Mar 6, 2017)

r9 said:


> Where did you get these graphs from?



I got it from the Thai (I think) tech website Zolkorn. Here's the link http://translate.google.com/transla...ntel-core-i7-7700k-mhz-by-mhz-core-by-core/4/


----------



## Grings (Mar 6, 2017)

I'm seeing too many badly theorycrafted reasons for that bad gaming performance (that disabling SMT fixes)


----------



## Joss (Mar 6, 2017)

akumod77 said:


> I got it from the Thai (I think) tech website Zolkorn. Here's the link http://translate.google.com/translate?hl=en&sl=auto&tl=en&u=http://www.zolkorn.com/reviews/amd-ryzen-7-1800x-vs-intel-core-i7-7700k-mhz-by-mhz-core-by-core/4/


That is a very interesting test, thanks.


----------



## geon2k2 (Mar 6, 2017)

r9 said:


> Missing the point there. It can be 2 threads and still bottleneck if the software tries to move the thread from CCX0 to CCX1.
> Which is something that games and the OS do quite often to balance load among cores.
> Doing that means moving the data from the CCX0 L3 cache to the CCX1 L3 cache, which causes the bottleneck because of the ultra-slow L3 interconnect.
> The solution should be in sight: they just need to make the Windows scheduler aware of the design and move threads only within the CCX where the thread originates.
> ...



If this is the case, why on earth didn't AMD just send an email to Microsoft to modify the scheduler the way they wanted just before the launch or, even better, why didn't they release a driver? In the old days there was a driver for the Athlon X2 called Dual Core Optimizer.


----------



## Kanan (Mar 6, 2017)

geon2k2 said:


> If this is the case, why on earth didn't AMD just send an email to Microsoft to modify the scheduler the way they wanted just before the launch or, even better, why didn't they release a driver? In the old days there was a driver for the Athlon X2 called Dual Core Optimizer.


Too busy optimizing Ryzen.


----------



## mouacyk (Mar 6, 2017)

Kanan said:


> Too busy optimizing Ryzen.


And not enough time beta-testing their new CPU with production software/games to see how they perform in the real world?  Engineers and their clean rooms... puhhh!


----------



## TheLaughingMan (Mar 6, 2017)

geon2k2 said:


> If this is the case, why on earth didn't AMD just send an email to Microsoft to modify the scheduler the way they wanted just before the launch or, even better, why didn't they release a driver? In the old days there was a driver for the Athlon X2 called Dual Core Optimizer.



Well, that was different. That was created to correct an issue with older game titles that were programmed to run on 1 thread only, but left the scheduling up to Windows. *My memory is a little fuzzy*, but I recall that without the optimizer the two cores would occasionally get into a race condition, which would stall the game and sometimes crash it. The optimizer fixed that issue and allowed games and programs that didn't care whether there was more than 1 core to run smoothly.

As I understand it:

Now the issue seems to be that tasks migrating from core to core lose access to their cached data. If a task was on core 1 on CCX 1 and got picked up by core 7 on CCX 2, all of the data related to that task in the L3 cache is gone. At that point the chip can either copy the data over from the L3 cache on CCX 1 through the data fabric (22 GB/s, shared with the memory controller) or let it be rebuilt naturally as the task starts from scratch, with code paths filling the L1 and L2 and finally the L3 when needed. Now multiply that a thousandfold, because a game doesn't care which core is handling the math. This is potentially what is causing the drop in performance in some games.

What they have to do is either limit which logical cores can pick up a task or rework how the CPU handles these kinds of data shifts. Fixing the data fabric bandwidth, unifying the L3 cache into a true single 16 MB cache, or fixing the memory latency issue would have to wait for future generations of the chip.
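
To illustrate the "limit which logical cores can pick up a task" option from the software side, here is a minimal sketch in Python. It assumes logical CPUs are numbered consecutively, core by core and CCX by CCX, which is not guaranteed and should be checked on the actual system. The helper names `ccx_cpu_set` and `pin_to_ccx` are made up for this example, and `os.sched_setaffinity` only exists on some platforms (Linux, for instance), so the sketch simply does nothing elsewhere.

```python
import os


def ccx_cpu_set(ccx_index, cores_per_ccx=4, threads_per_core=2):
    """Return the logical CPU ids assumed to belong to one CCX.

    Assumption: ids are consecutive per core and per CCX, so CCX 0 owns
    ids 0..7 on an 8-core/16-thread part. Real numbering depends on the
    OS and firmware.
    """
    per_ccx = cores_per_ccx * threads_per_core
    start = ccx_index * per_ccx
    return set(range(start, start + per_ccx))


def pin_to_ccx(ccx_index, pid=0):
    """Restrict a process (0 = the caller) to one CCX, where supported."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # affinity API not available on this platform
    usable = ccx_cpu_set(ccx_index) & os.sched_getaffinity(pid)
    if not usable:
        return None  # none of the assumed CPU ids exist or are allowed here
    os.sched_setaffinity(pid, usable)
    return usable


if __name__ == "__main__":
    print("CCX 0:", sorted(ccx_cpu_set(0)))
    print("CCX 1:", sorted(ccx_cpu_set(1)))
```

A game or launcher that called something like `pin_to_ccx(0)` would keep its threads from hopping to the other CCX, at the price of using only half the cores, so this is a trade-off rather than a fix.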


----------



## Ramin Rostami (Mar 6, 2017)

What is written above is wrong and fake.

The architecture and design are completely different between Ryzen and Intel CPUs.

There are many reasons, such as:

The L3 cache in the 6900 is 1 unit, but the L3 cache in Ryzen is 16 units (1 for each block).


----------



## TheGuruStud (Mar 6, 2017)

This stuff is cute. I already saw a cache benchmark where ryzen annihilated Intel's speeds (almost 2x sometimes). There was some latency issue, but it looked nothing like this goofball crap.


----------



## Steevo (Mar 6, 2017)

Ramin Rostami said:


> What is written above is wrong and fake
> 
> The architecture and design are completely different between Ryzen and Intel CPUs
> 
> ...




Please don't repost. Also, making your post bold will do nothing. Source of your information would be good, otherwise it will be ignored and continued reposting may anger the ban stick gods. 


Lastly, why would AMD make their cache non-contiguous when that would create extra overhead and slow down the read and write speed as it would have to wait for data lines to be cleared (grounded or terminated to make sure no residual voltage created a false bit) between each segment being read or written to? Cache works on the same physics as all other memory systems.


----------



## nem.. (Mar 6, 2017)

Article translated

There is nothing better than getting up on a Sunday and continuing to talk about Ryzen. We already knew that the platform is plagued with RAM problems, because manufacturers released their motherboards with beta BIOSes; we now know that these problems greatly affect CPU performance.

These problems, present across all board makers such as MSI, Gigabyte, ASRock and Asus, concern RAM that cannot run at its maximum speed: depending on the motherboard, it may only work at 1866 MHz or 2400 MHz (as in our case) instead of the 3400 MHz the memory can reach. Memory speed turns out to matter a great deal for final performance: in the single-core test of the Geekbench benchmarking software, moving from 2133 MHz to 3466 MHz memory yields no less than 10 percent extra performance.




 


In games, the same thing happens. In The Witcher 3, with a GeForce GTX 1080 graphics card, Full HD resolution and maximum graphics quality (with Hairworks off), the game reaches 92.5 FPS with DDR4 RAM at 2133 MHz, while with memory at 3200 MHz we gain 14.9 FPS and reach 107.4 FPS.

As a result, the first Ryzen reviews all score the same when using the same boards or memory modules, so we will not see the CPUs' actual performance until the motherboard manufacturers solve these problems. Unlike on Intel, memory speed makes a large difference in performance here.





thnx google translate xd
Source:
Http://www.eteknix.com/memory-speed-large-impact-ryzen-performance/
Https://elchapuzasinformatico.com/2017/03/la-velocidad-la-ram-fuerte-impacto-rendimiento-amd-ryzen/ ..


----------



## Grings (Mar 6, 2017)

Ramin Rostami said:


> What is written above (high latency on the L3 cache of Ryzen) is 100% wrong and fake
> 
> The architecture and design are completely different between Ryzen and Intel CPUs
> 
> ...



This is a forum, not a scrolling live chat, we can see your first (and second) post already


----------



## Steevo (Mar 6, 2017)

Ramin Rostami said:


> Sorry, I am new here
> 
> In Ryzen the L3 cache is not shared as 1 unit (like Intel)
> In Ryzen the L3 cache is 16 units, so reading from 16 different units takes longer than from 1
> ...



I already explained it: regardless of memory technology, what matters is

1) Data lines (physical traces or wires allowing data to send or receive) and how many there are, and we have no idea as the architecture is not detailed to the public.
2) Clock rate (How fast the data is accessed) again we don't know the specifics, but through extrapolating data and using exploratory data sets we can gain some insight.
3) Latency (how many clock cycles between a request to read and the actual data showing up on the data lines) and we can figure this out if we know other things about the processor or by allowing a program to run and copy differing data sizes into a core or cache and then asking it to be read back and counting the number of clock cycles between. 


What we don't have access to yet are a few key components and some software control that may allow us to force data into one cache and copy between caches to determine the extra latency introduced by the transfer.

1 or 16 does NOT matter.
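
The measurement idea in point 3 (touch working sets of different sizes and time how long each dependent read takes) is usually done with a pointer chase. Below is a minimal sketch in Python, not the tool anyone in this thread used. The names `make_cycle` and `chase` are made up for the example, and in Python the interpreter overhead dominates the numbers, so treat it as an illustration of the method only; a real test would be written in C or assembly and would count cycles rather than wall-clock time.

```python
import random
import time


def make_cycle(n, seed=0):
    """Build a random single-cycle permutation (Sattolo's algorithm).

    Following next_index from any start visits all n slots before
    returning, so the access pattern is hard for a prefetcher to guess.
    """
    rng = random.Random(seed)
    next_index = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i, which guarantees a single cycle
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index


def chase(next_index, steps):
    """Do `steps` dependent loads; return (final position, ns per step)."""
    pos = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        pos = next_index[pos]  # each load depends on the previous one
    elapsed = time.perf_counter_ns() - start
    return pos, elapsed / steps


if __name__ == "__main__":
    # Growing working sets; on real hardware the time per access steps
    # up each time the set outgrows a cache level.
    for n in (1 << 10, 1 << 14, 1 << 18):
        table = make_cycle(n)
        _, ns = chase(table, 200_000)
        print(f"{n:>8} entries: {ns:6.1f} ns per access")
```

Because every load depends on the previous one, the CPU cannot overlap them, which is why this pattern exposes latency rather than bandwidth.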


----------



## ender79 (Mar 6, 2017)

OK, in fact the Ryzen R7 is an Intel Core 2 Quad as envisioned by AMD.
So you ask AMD for an octa-core processor and what we now have is a dual quad-core.
A design with 2 CCXs interconnected through an internal bus is a crime for gaming; maybe 10 years ago it would have been a clear innovation.

Anyway, this design can't be fixed by a BIOS update. Even with more accurate values from the AIDA64 beta and lower latency, gaming on the 8-core Ryzen will suffer. Games can be patched to run the main thread on one CCX and AI and other work on the second CCX, but that won't fix the gaming experience 100%.

The 4-core Ryzen probably will not suffer from the same issue. If I read the graph correctly, the problem also exists beyond 4 MB of L3, but even so the 4-core Ryzen will be the best option for budget gamers.


----------



## geon2k2 (Mar 6, 2017)

My understanding from hardware.fr is that the CCX complex runs at the same frequency as the memory, and somehow the bandwidth is shared between inter module communication and memory access.

This is the reason for which higher memory frequency will provide much better results as the bandwidth for inter-module communication increases with frequency.  From 2133 to 3200 the bandwidth for internal communication increases from 34GB/s to 51GB/s, and that's why the witcher 3 benchmark posted above scales so well, not necessarily due to faster memory, which by itself has little impact as we saw numerous times, but because the communication between modules increases drastically with better memory frequency.
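
As a rough check on those two figures, here is a small Python sketch of the arithmetic. It assumes the fabric clock is half the DDR transfer rate (the actual memory clock) and that the link moves 32 bytes per clock. The 32-byte width is not stated anywhere above; it is simply the value that reproduces 34 GB/s and 51 GB/s, so treat it as an inference. The function names are made up for this example.

```python
def fabric_clock_mhz(ddr_rate_mts):
    """Fabric clock in MHz, assuming it equals the memory clock (DDR rate / 2)."""
    return ddr_rate_mts / 2


def fabric_bandwidth_gbs(ddr_rate_mts, bytes_per_clock=32):
    """Peak bandwidth in GB/s; 32 bytes per clock is inferred, not documented."""
    return fabric_clock_mhz(ddr_rate_mts) * 1e6 * bytes_per_clock / 1e9


if __name__ == "__main__":
    for rate in (2133, 2400, 3200):
        clock = fabric_clock_mhz(rate)
        bandwidth = fabric_bandwidth_gbs(rate)
        print(f"DDR4-{rate}: {clock:.1f} MHz fabric, {bandwidth:.1f} GB/s peak")
```

Under these assumptions the bandwidth scales linearly with memory speed, so going from 2133 to 3200 is a 50% increase on the internal link as well.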

I'm waiting with more interest for the R5 1400X. The 1600X (6 core) will probably be plagued by the same issue.


----------



## qubit (Mar 6, 2017)

Raevenlord said:


> Namely, the fact that there seems to be some problem with Ryzen's L3 cache and memory subsystem implementation.


This seems to be the core of the problem and I'm not sure if a microcode update can fix this. If not, then a Ryzen version 2 silicon will be needed, which is a shame since it gives Intel the chance to stay one step ahead of them again. The best competition will happen if AMD can leapfrog Intel and that hasn't happened since 2005 with the Athlon64.

I hope AMD give customers a generous trade-in program if such a revision is released. This will significantly boost confidence in the brand and make customers feel well looked after.


----------



## Steevo (Mar 6, 2017)

Without further testing (what's the penalty matrix for thread core changes?*) we are all guessing; educated guesses for the most part, but guessing.

Thread handling is primarily the job of the OS unless the executables are updated to handle their own, and most just look at how many threads to spawn based on core count and go no further. But, as with Bulldozer, if the OS assigns threads based on the CCX architecture there will be little or no penalty, as a thread won't move unless it can benefit from the move.

* CCX0 C1 --> CCX4 C2 move costs 46 cycles of wait time for example, but CCX0 C0 to CCX1 C1 only costs 22 cycles. How many cycles are wasted between every CCX and every core in every CCX? Does the penalty increase or remain the same depending on other CCX/cores being busy?


More information is needed before we can make broad statements, a better and more efficient thread handling/dispatching algorithm may increase the performance a lot.
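
For anyone wondering what such a penalty matrix could look like once it is measured, here is a minimal Python sketch. It uses a simplified two-level model (one cost inside a CCX, one cost across CCXs) on an assumed 8-core, 2-CCX layout, and it reuses the 22 and 46 cycle figures from the example above purely as placeholders; they are not measurements. The function names are made up for this sketch.

```python
def migration_cost_matrix(n_cores=8, cores_per_ccx=4, same_ccx=22, cross_ccx=46):
    """Return cost[src][dst] in cycles for moving a thread between cores.

    The default costs are the placeholder values from the post, not
    measurements, and the two-level model is a simplification.
    """
    matrix = []
    for src in range(n_cores):
        row = []
        for dst in range(n_cores):
            if src == dst:
                row.append(0)  # staying put costs nothing
            elif src // cores_per_ccx == dst // cores_per_ccx:
                row.append(same_ccx)  # move within one CCX
            else:
                row.append(cross_ccx)  # move across the CCX boundary
        matrix.append(row)
    return matrix


def total_cost(matrix, core_sequence):
    """Sum the cost of a thread hopping through core_sequence in order."""
    return sum(matrix[a][b] for a, b in zip(core_sequence, core_sequence[1:]))


if __name__ == "__main__":
    costs = migration_cost_matrix()
    for row in costs:
        print(" ".join(f"{c:3d}" for c in row))
    print("hops 0 -> 1 -> 5 -> 4:", total_cost(costs, [0, 1, 5, 4]))
```

A scheduler with access to real numbers in this form could compare the cost of a move against the expected benefit before migrating a thread.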


----------



## TheLaughingMan (Mar 7, 2017)

These are not the chips for pure gamers either way. We need to wait and see what the Ryzen 5 1400 and 1500 have to offer as the 4 core / 8 thread chips. Will they clock higher? Will they scale better in performance? They will also have the benefit of being released on a platform that has had some time to mature and work out some of these kinks.


----------



## mastrdrver (Mar 7, 2017)

qubit said:


> This seems to be the core of the problem and I'm not sure if a microcode update can fix this. If not, then a Ryzen version 2 silicon will be needed, which is a shame since it gives Intel the chance to stay one step ahead of them again. The best competition will happen if AMD can leapfrog Intel and that hasn't happened since 2005 with the Athlon64.
> 
> I hope AMD give customers a generous trade-in program if such a revision is released. This will significantly boost confidence in the brand and make customers feel well looked after.



And Windows apparently has problems allocating system resources correctly for Ryzen:

https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-2#post-38770630


----------



## Renald (Mar 7, 2017)

To add some info on this test: Hardware.fr says in the article that they know perfectly well that AIDA64 is not suitable for Ryzen, but they ran the test anyway to have a baseline to work from and to show the real problem, which is demonstrated in the graph later in the article.

After confirming that there was a problem with the L3 cache, they ran a custom-made program intended to use incrementally more cache and test latency. They clearly stated that the creator of the program did not check it before it was used on Ryzen, just to be clear.

The results in ns shown in green are the results of this custom-made test.
To sum up: as soon as a thread is moved from one core to another, its cache contents are immediately pushed into this high-latency cache, creating a huge drop in performance. The CCX architecture is not helping, and neither is the bandwidth between the CCXs.

Since then, they have been trying to isolate this behavior to prove Ryzen's sensitivity to core parking (which is not yet proven, or not "that simple"). Not an easy task, it seems, because of the Windows 10 scheduler. And many manufacturers are still not ready and muddy things further with unsuitable drivers and patches... Kinda annoying :/

Sorry for my poor English, and HFR team, just whip me if I'm saying some crappy things. I'm still (and you too as I understood), not quite sure what is the real underlying problem in the end


----------



## Wark0 (Mar 7, 2017)

nem.. said:


> Article translated
> 
> There is nothing better than getting up on a Sunday and continuing to talk about Ryzen. We already knew that the platform is plagued with RAM problems, because manufacturers released their motherboards with beta BIOSes; we now know that these problems greatly affect CPU performance.
> 
> ...



Oh no... not this thing again ... these figures are from MSI, which reused the same benchmark from an older MSI slide with benchmarks on ... Z270 ... :lol:


----------



## Wark0 (Mar 7, 2017)

mastrdrver said:


> And Windows apparently has problems allocating system resources correctly for Ryzen:
> 
> https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-2#post-38770630


No, it's not the case with final bios :

http://forum.hardware.fr/hfr/Hardware/hfr/dossier-1800x-retour-sujet_1017196_20.htm#t10089095


----------



## Renald (Mar 7, 2017)

mastrdrver said:


> And Windows apparently has problems allocating system resources correctly for Ryzen:
> 
> https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-2#post-38770630


(Sorry for the double post)
They looked into it at Hardware.fr and said that what the guy claims is not correct, and that Core0/1/2/3 share 8 MB and Core4/5/6/7 share another 8 MB.
http://www.hardware.fr/articles/956-1/amd-ryzen-7-1800x-test-retour-amd.html ==> Page 80 if you speak French

This line : "****---- Unified Cache 1, Level 3, 8 MB, Assoc 16" Says : Core 0/1/2/3 share 8MB of L3
Or here "**------ Unified Cache 0, Level 2, 512 KB, Assoc 8" Says : Core 0/1 share 512KB of L2
and so on.

So the analysis over there is flawed, and the Hardware.fr people are pretty technical guys, so they wouldn't call it crap unless it really is.
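
For anyone who wants to read these cache map lines without counting stars by hand, here is a small Python sketch. It assumes the line layout shown in the two examples above (a mask of `*` and `-`, then cache type, number, level and size); the regular expression and the name `parse_cache_line` are my own and are not part of any tool.

```python
import re

# Mask of '*' and '-', then "<type> Cache <n>, Level <l>, <size> KB|MB".
LINE_RE = re.compile(
    r"^(?P<mask>[*-]+)\s+(?P<kind>Data|Instruction|Unified) Cache\s+(?P<id>\d+),"
    r"\s+Level\s+(?P<level>\d+),\s+(?P<size>\d+)\s+(?P<unit>KB|MB)"
)


def parse_cache_line(line):
    """Parse one cache map line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    size_kb = int(m.group("size")) * (1024 if m.group("unit") == "MB" else 1)
    return {
        # Position of each '*' is the index of a logical processor
        # that shares this cache.
        "cpus": [i for i, ch in enumerate(m.group("mask")) if ch == "*"],
        "kind": m.group("kind"),
        "level": int(m.group("level")),
        "size_kb": size_kb,
    }


if __name__ == "__main__":
    samples = [
        "****---- Unified Cache 1, Level 3, 8 MB, Assoc 16",
        "**------ Unified Cache 0, Level 2, 512 KB, Assoc 8",
    ]
    for text in samples:
        print(parse_cache_line(text))
```

A line whose mask has four stars therefore reports one cache shared by four logical processors, while a dump where every L3 line has a single star would report one private L3 per logical processor.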


----------



## Wark0 (Mar 7, 2017)

Renald said:


> Since then, they have been trying to isolate this behavior to prove Ryzen's sensitivity to core parking (which is not yet proven, or not "that simple"). Not an easy task, it seems, because of the Windows 10 scheduler. And many manufacturers are still not ready and muddy things further with unsuitable drivers and patches... Kinda annoying :/



Ryzen with SMT is no more sensitive to core parking than Intel with SMT. Neither likes a scheduler with core parking.

The fact is that on Windows 10, core parking is OFF in balanced mode with an Intel CPU (BDW-E, SKL for example), and ON with Ryzen. So you have to switch it off manually in order to compare apples to apples.


----------



## C_Wiz (Mar 7, 2017)

Raevenlord said:


> I can compare the values within your own results, which were all obtained with the same configuration on the 6900K and the 1800X, right? That's what I compare in the article.


Yes our values are comparable, you got that right don't worry. 

I was talking to someone else at the time who was comparing that value (98 ns) to AIDA64 stock values, which was not OK because the clocks are different.




Raevenlord said:


> Time isn't as we would like, hence why only now I'm here and improving the article.


 Yeah we've been working non-stop on Ryzen post launch, trying to figure out the issues we were seeing, time is way too short and sleep levels way too low 




Raevenlord said:


> For me, that was the whole point of the post. AIDA 64 is a benchmarking utility, but until it has been "fixed", as in, properly optimized for Ryzen, I think it presents itself as a great opportunity to see Ryzen's behavior on non-optimized workloads (ie, what all games currently are).


Yeah again, to be 100% clear, we double-checked with FinalWire what was accurate and what wasn't according to them. Our deep dive is exactly about trying to work out why some of the reported values aren't accurate, and going from there. It was by starting to track this down that we got to the CCX interconnect bandwidth limitation, the way split caches are handled in single thread, etc.

Thanks for fixing your summary !

G.


----------



## C_Wiz (Mar 7, 2017)

Grings said:


> I'm seeing too many badly theorycrafted reasons for that bad gaming performance (that disabling SMT fixes)


We actually covered that in an update on Sunday if you are interested: http://www.hardware.fr/articles/956-8/retour-smt-mode-high-performance.html

Long story short, some Windows 10 Anniversary Update scheduler settings aren't set the same way for Ryzen and Intel CPUs. We tested that and updated our article accordingly.



geon2k2 said:


> My understanding from hardware.fr is that the CCX complex runs at the same frequency as the memory, and somehow the bandwidth is shared between inter module communication and memory access.
> 
> This is the reason for which higher memory frequency will provide much better results as the bandwidth for inter-module communication increases with frequency.  From 2133 to 3200 the bandwidth for internal communication increases from 34GB/s to 51GB/s, and that's why the witcher 3 benchmark posted above scales so well, not necessarily due to faster memory, which by itself has little impact as we saw numerous times, but because the communication between modules increases drastically with better memory frequency.


Actually the Witcher 3 "bench" is from an MSI/Intel advert (if I remember correctly).

But your overall point is exactly correct: the data fabric clock is set with memory (so DDR4-2400 = 1200 MHz clock for that bus), so if you are limited there, you'll see a compounding effect by pushing memory higher.


----------



## mastrdrver (Mar 7, 2017)

Wark0 said:


> No, it's not the case with final bios :
> 
> http://forum.hardware.fr/hfr/Hardware/hfr/dossier-1800x-retour-sujet_1017196_20.htm#t10089095



When was final BIOS? 

Because here's what Stilt originally got, which is nowhere near what they got. Though this is from last Thursday, and I've not been keeping up with how often BIOSes have been released.


```
Logical Processor to Cache Map:
*---------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
*---------------  Instruction Cache   0, Level 1,   64 KB, Assoc   4, LineSize  64
*---------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
*---------------  Unified Cache       1, Level 3,   16 MB, Assoc  16, LineSize  64
-*--------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
-*--------------  Instruction Cache   1, Level 1,   64 KB, Assoc   4, LineSize  64
-*--------------  Unified Cache       2, Level 2,  512 KB, Assoc   8, LineSize  64
-*--------------  Unified Cache       3, Level 3,   16 MB, Assoc  16, LineSize  64
--*-------------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
--*-------------  Instruction Cache   2, Level 1,   64 KB, Assoc   4, LineSize  64
--*-------------  Unified Cache       4, Level 2,  512 KB, Assoc   8, LineSize  64
--*-------------  Unified Cache       5, Level 3,   16 MB, Assoc  16, LineSize  64
---*------------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
---*------------  Instruction Cache   3, Level 1,   64 KB, Assoc   4, LineSize  64
---*------------  Unified Cache       6, Level 2,  512 KB, Assoc   8, LineSize  64
---*------------  Unified Cache       7, Level 3,   16 MB, Assoc  16, LineSize  64
----*-----------  Data Cache          4, Level 1,   32 KB, Assoc   8, LineSize  64
----*-----------  Instruction Cache   4, Level 1,   64 KB, Assoc   4, LineSize  64
----*-----------  Unified Cache       8, Level 2,  512 KB, Assoc   8, LineSize  64
----*-----------  Unified Cache       9, Level 3,   16 MB, Assoc  16, LineSize  64
-----*----------  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
-----*----------  Instruction Cache   5, Level 1,   64 KB, Assoc   4, LineSize  64
-----*----------  Unified Cache      10, Level 2,  512 KB, Assoc   8, LineSize  64
-----*----------  Unified Cache      11, Level 3,   16 MB, Assoc  16, LineSize  64
------*---------  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
------*---------  Instruction Cache   6, Level 1,   64 KB, Assoc   4, LineSize  64
------*---------  Unified Cache      12, Level 2,  512 KB, Assoc   8, LineSize  64
------*---------  Unified Cache      13, Level 3,   16 MB, Assoc  16, LineSize  64
-------*--------  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
-------*--------  Instruction Cache   7, Level 1,   64 KB, Assoc   4, LineSize  64
-------*--------  Unified Cache      14, Level 2,  512 KB, Assoc   8, LineSize  64
-------*--------  Unified Cache      15, Level 3,   16 MB, Assoc  16, LineSize  64
--------*-------  Data Cache          8, Level 1,   32 KB, Assoc   8, LineSize  64
--------*-------  Instruction Cache   8, Level 1,   64 KB, Assoc   4, LineSize  64
--------*-------  Unified Cache      16, Level 2,  512 KB, Assoc   8, LineSize  64
--------*-------  Unified Cache      17, Level 3,   16 MB, Assoc  16, LineSize  64
---------*------  Data Cache          9, Level 1,   32 KB, Assoc   8, LineSize  64
---------*------  Instruction Cache   9, Level 1,   64 KB, Assoc   4, LineSize  64
---------*------  Unified Cache      18, Level 2,  512 KB, Assoc   8, LineSize  64
---------*------  Unified Cache      19, Level 3,   16 MB, Assoc  16, LineSize  64
----------*-----  Data Cache         10, Level 1,   32 KB, Assoc   8, LineSize  64
----------*-----  Instruction Cache  10, Level 1,   64 KB, Assoc   4, LineSize  64
----------*-----  Unified Cache      20, Level 2,  512 KB, Assoc   8, LineSize  64
----------*-----  Unified Cache      21, Level 3,   16 MB, Assoc  16, LineSize  64
-----------*----  Data Cache         11, Level 1,   32 KB, Assoc   8, LineSize  64
-----------*----  Instruction Cache  11, Level 1,   64 KB, Assoc   4, LineSize  64
-----------*----  Unified Cache      22, Level 2,  512 KB, Assoc   8, LineSize  64
-----------*----  Unified Cache      23, Level 3,   16 MB, Assoc  16, LineSize  64
------------*---  Data Cache         12, Level 1,   32 KB, Assoc   8, LineSize  64
------------*---  Instruction Cache  12, Level 1,   64 KB, Assoc   4, LineSize  64
------------*---  Unified Cache      24, Level 2,  512 KB, Assoc   8, LineSize  64
------------*---  Unified Cache      25, Level 3,   16 MB, Assoc  16, LineSize  64
-------------*--  Data Cache         13, Level 1,   32 KB, Assoc   8, LineSize  64
-------------*--  Instruction Cache  13, Level 1,   64 KB, Assoc   4, LineSize  64
-------------*--  Unified Cache      26, Level 2,  512 KB, Assoc   8, LineSize  64
-------------*--  Unified Cache      27, Level 3,   16 MB, Assoc  16, LineSize  64
--------------*-  Data Cache         14, Level 1,   32 KB, Assoc   8, LineSize  64
--------------*-  Instruction Cache  14, Level 1,   64 KB, Assoc   4, LineSize  64
--------------*-  Unified Cache      28, Level 2,  512 KB, Assoc   8, LineSize  64
--------------*-  Unified Cache      29, Level 3,   16 MB, Assoc  16, LineSize  64
---------------*  Data Cache         15, Level 1,   32 KB, Assoc   8, LineSize  64
---------------*  Instruction Cache  15, Level 1,   64 KB, Assoc   4, LineSize  64
---------------*  Unified Cache      30, Level 2,  512 KB, Assoc   8, LineSize  64
---------------*  Unified Cache      31, Level 3,   16 MB, Assoc  16, LineSize  64
```


----------



## C_Wiz (Mar 7, 2017)

mastrdrver said:


> When was final BIOS?


The Asus BIOS (5704) is dated 23/02 (it wasn't available publicly then obviously, but same bios is available on Asus's website now, we checked checksums to confirm it's the same). This is the BIOS that includes the "final" (before launch) microcode update from AMD. Cache is shown correctly there as wark0 posted earlier in the thread (here : http://forum.hardware.fr/hfr/Hardware/hfr/dossier-1800x-retour-sujet_1017196_20.htm#t10089095 )

To be clear, 5704 was not the BIOS shipped to reviewers on the motherboards by AMD (I think that was 5702?); you had to flash it yourself, but that's pretty common with launches, and AMD gave many heads-ups on that.


----------



## nemesis.ie (Mar 7, 2017)

Just putting this out there as another data point for the RAM:

http://www.legitreviews.com/ddr4-me...latform-best-memory-kit-amd-ryzen-cpus_192259


----------



## Imsochobo (Mar 7, 2017)

Grings said:


> I'm seeing too many badly theorycrafted reasons for that bad gaming performance (that disabling SMT fixes)



I can confirm that Windows 10 is a big issue and could cost 10 FPS in most games.
SMT is also an issue, but Windows 10 is worse than Windows 7 and Linux in terms of performance, and most games have been benched on Windows 10.
I cannot reproduce this with a Xeon 2680 V2: it only drops 1 FPS in Windows 10, instead of 10 (and 20 in CS:GO for me).

I can confirm something is iffy.


----------



## C_Wiz (Mar 7, 2017)

Imsochobo said:


> I can confirm that Windows 10 is a big issue and could cost 10 FPS in most games.
> SMT is also an issue, but Windows 10 is worse than Windows 7 and Linux in terms of performance, and most games have been benched on Windows 10.
> I cannot reproduce this with a Xeon 2680 V2: it only drops 1 FPS in Windows 10, instead of 10 (and 20 in CS:GO for me).
> 
> I can confirm something is iffy.


Again, we confirmed that the scheduler isn't configured the same way for Ryzen and Intel CPUs in Windows 10, which explains the discrepancies between SMT OFF and ON in games; check my link above.


----------



## airfathaaaaa (Mar 7, 2017)

the54thvoid said:


> So...... Is this AMD's equivalent to Nvidia not doing Async? And can software coding help address this?



this is Windows load balancing working like it did on Nehalems and first-gen Skylakes

basically Windows treats Ryzen as a massive 16-core CPU instead of 8c/16t, and that basically creates all of the other problems this CPU has. Normally Windows throws all of the heavy workloads onto the physical cores and leaves the rest to the logical ones, but here Windows throws everything at everything, so the CPU has to rely on "stealing" from system RAM because Windows thinks it has a massive 138 MB L3

and due to the nature of SMT, sometimes when Windows keeps a thread on the CPU (remember, AMD says that a CCX is a CPU, not a core) the data in the L3 gets "lost", so Windows reissues a new load to that thread. But the data is already in the L3, which results in the core parking bug because the CPU needs to pause the new workload to flush the identical one that is already in the L3


----------



## Kanan (Mar 7, 2017)

airfathaaaaa said:


> this is windows load balancing working like it did on nehalems and first gen skylakes
> 
> basically windows treats ryzen as a massive 16 core cpu instead of 8c 16t
> and that basically creates all of the other problems this cpu has. normally windows throws all of the heavy workloads onto the physical cores and leaves the rest to the logical ones, but here windows throws everything at everything, so the cpu has to rely on "stealing" from system ram because windows thinks it has a massive 138mb l3
> ...


How did you come to this conclusion?


----------



## Gasaraki (Mar 7, 2017)

Camm said:


> One does wonder if the 4 core parts will suffer the same fate since it will be one straight core complex.



The quad cores might be a beast!


----------



## airfathaaaaa (Mar 7, 2017)

Kanan said:


> How did you come to this conclusion?


we already have the full picture of the problems

and we have already had similar problems in the past (identical, to be honest). it's not really hard to connect the dots, especially when we know that smt taps into all three caches

also a really good video to watch


----------



## r9 (Mar 7, 2017)

geon2k2 said:


> If this is the case, why on earth didn't AMD just send an email to Microsoft to modify the scheduler in the way they wanted, just before the launch or even better, why they didn't release a driver. In the old days for Athlon X2 there was a driver called dual core optimizer.


Good question. Also, even if all of that makes a lot of sense, it might just be a load of BS, who knows. But like I've said too many times already, for the love of God, please somebody disable SMT and one of the CCXs, then bench games and compare. If the scores suck the same, it means the thread/cache shuffle has nothing to do with it.


----------



## r9 (Mar 7, 2017)

akumod77 said:


> I got it from Thailand (i think) tech website Zolkorn. Here's the link http://translate.google.com/translate?hl=en&sl=auto&tl=en&u=http://www.zolkorn.com/reviews/amd-ryzen-7-1800x-vs-intel-core-i7-7700k-mhz-by-mhz-core-by-core/4/


In their review they either found the answer to the poor gaming performance of Ryzen or they are doing something very wrong with the 7700k lol. Because in their test Ryzen is matching the 7700k more or less. And that is not bad, taking into consideration that Ryzen will dominate everything else.


----------



## mastrdrver (Mar 8, 2017)

C_Wiz said:


> The Asus BIOS (5704) is dated 23/02 (it wasn't available publicly then obviously, but same bios is available on Asus's website now, we checked checksums to confirm it's the same). This is the BIOS that includes the "final" (before launch) microcode update from AMD. Cache is shown correctly there as wark0 posted earlier in the thread (here : http://forum.hardware.fr/hfr/Hardware/hfr/dossier-1800x-retour-sujet_1017196_20.htm#t10089095 )
> 
> To be clear, 5704 was not the BIOS given to reviewers (I think 5702?) on the motherboards by AMD; you had to flash it yourself, but that's pretty common with launches and AMD gave many heads-ups on that.



Ok I understand you now, thanks.

How much performance is left in Ryzen once things get tweaked out? 10%?


----------



## jahramika (Mar 8, 2017)

Imsochobo said:


> I can confirm that Windows 10 is a big issue and could cause 10FPS throughout most games.
> SMT is also an issue, but Windows 10 is worse than Windows 7 and Linux in terms of performance, and most games have been benched in Windows 10.
> I cannot reproduce this with a Xeon 2680 V2: it only drops 1 fps in Windows 10 instead of 10 (20 in CS:GO for me).
> 
> I can confirm something is iffy.



My windows 10 constantly keeps switching to Power Saving Mode setting.


----------



## nemesis.ie (Mar 8, 2017)

I highly recommend everyone to take a look at this:










Some very salient points in there, and it also shows that Bulldozer seems to have overtaken the 2500k in gaming over time ...


----------



## EarthDog (Mar 8, 2017)

Again with this vid... I think it's the third thread it was posted in already.


----------



## TRWOV (Mar 8, 2017)

It all depends on how you see things really. If you want absolutely the best gaming performance then stick with Intel. If you need a CPU with lots of threads that can also game pretty well then go for the R7s.

My main gaming rig still has a 3770k, haven't found a reason to upgrade yet. Same with my Steambox build, running a 4590. I'll replace all my crunchers with 1700s, that's a given.


----------



## Shihab (Mar 8, 2017)

Getting GTX970 feels for some reason. Also, Bulldozer "modularity."
Anyone tried to disable 4 cores (or 7, to be safe) in BIOS and see how it fares?




TRWOV said:


> It all depends on how you see things really. If you want absolutely the best gaming performance then stick with Intel. If you need a CPU with lots of threads that can also game pretty well _*but doesn't cost an arm and a leg,*_ then go for the R7s.



Fixed that for you.
Admittedly, first time I saw the Zen reviews I thought that it'd kill intel in anything outside games. Then I remembered that even industry standard productivity software can be -and often is- embarrassingly single threaded, so a combination of both (core count and per-core IPC) is often needed, and here Intel still reigns, as long as price isn't a factor.




nemesis.ie said:


> I highly recommend everyone to take a look at this:
> 
> 
> 
> ...



IMO, saying that that channel is merely an AMD apologist would be an understatement.
(And I have a nagging feeling that I've already said that somewhere around here....)


----------



## mastrdrver (Mar 9, 2017)

nemesis.ie said:


> I highly recommend everyone to take a look at this:
> 
> 
> 
> ...



I think this is a better video with this post about Nehalem compared to Yorkfield:

https://forum.beyond3d.com/threads/why-nehalem-for-games-is-not-better-than-yorkfield.44382/


----------



## mcraygsx (Mar 9, 2017)

Vlada011 said:


> If Skylake-E and Kaby Lake-E samples are finished, I don't know how much Intel could change to improve its tragic position, where its $1700 CPU loses to a $500 AMD chip with 2 fewer cores and much lower power consumption, almost half.
> Even if Intel catches AMD, that would be with 8 and 10 core processors and 150W power consumption.
> Because of that, upgrading to AMD is a good choice at the moment.
> Especially if someone wants a small PC: mATX mobo, fanless 500W PSU and RX 580 + 1800X.
> ...



Well said. Enthusiasts should be thankful we have real competition in high-end CPUs. This was my first AMD CPU and I am clearly impressed by the 1800X, and amazed by how much money I have thrown at Intel for better IPC. And AMD just gave me a 6900K/6850K equivalent for $499.


----------



## FordGT90Concept (Mar 9, 2017)

When one core accesses another core's memory, it behaves like an L4 instead of an L3.  The extra latency at L4 is still much better than the system RAM so, I really don't see the cause for the fuss.

In fact, it does look like 1800X's dedicated L3 (~15 ns) is faster than i7-6900K (~17ns).
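
To put a rough number on the "L4-like" idea, here is a minimal Python sketch of a weighted-average latency. It is only an illustration: the function name and parameters are made up for this post, and the only figure taken from the thread is the ~15 ns local L3 latency. The remote-CCX latency and the share of accesses that cross the CCX boundary are inputs you would have to measure yourself.

```python
def effective_l3_latency(local_ns, remote_ns, remote_fraction):
    """Average L3 latency when some accesses hit the other CCX's L3.

    local_ns        -- latency of a hit in the core's own L3 slice
    remote_ns       -- latency of a hit served from the other CCX (must be measured)
    remote_fraction -- share of L3 accesses that cross the CCX boundary (0.0 to 1.0)
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    # Simple weighted average of the two hit latencies.
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    # 15 ns is the local figure quoted above; remote_ns here is a placeholder,
    # not a measured value.
    for fraction in (0.0, 0.1, 0.5):
        print(fraction, effective_l3_latency(15.0, remote_ns=100.0, remote_fraction=fraction))
```

The point of the sketch is only that the average stays close to the local figure as long as threads rarely touch data held by the other CCX.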


----------



## TheLaughingMan (Mar 9, 2017)

FordGT90Concept said:


> When one core accesses another core's memory, it behaves like an L4 instead of an L3.  The extra latency at L4 is still much better than the system RAM so, I really don't see the cause for the fuss.
> 
> In fact, it does look like 1800X's dedicated L3 (~15 ns) is faster than i7-6900K (~17ns).



I think people were bored and haven't needed to talk about AMD for like 5 years, and it all came out at once. As the workstation CPU that it is, it is great, and it handles some gaming on the side pretty damn well. If we get 4 core / 8 thread CPUs from AMD that still can't clock higher than 4.0 GHz and fall well behind 3000 series i5 CPUs, everyone can panic. Until then it's bug squishing and industry adjustment time!


----------



## TheGuruStud (Mar 9, 2017)

TheLaughingMan said:


> I think people were bored and haven't needed to talk about AMD for like 5 years, and it all came out at once. As the workstation CPU that it is, it is great, and it handles some gaming on the side pretty damn well. If we get 4 core / 8 thread CPUs from AMD that still can't clock higher than 4.0 GHz and fall well behind 3000 series i5 CPUs, everyone can panic. Until then it's bug squishing and industry adjustment time!



You'll have to wait till the refresh for (hopefully) better clocks.


----------



## TheLaughingMan (Mar 9, 2017)

TheGuruStud said:


> You'll have to wait till the refresh for (hopefully) better clocks.



I am going to wait to see Ryzen 5 in action. We have no concrete information about those chips and how they OC. Overclocking 8 cores is a different animal than overclocking 4 cores... historically at least. And I don't think the limit in OC is entirely the architecture, but we will find out.


----------



## FordGT90Concept (Mar 10, 2017)

TheLaughingMan said:


> I think people were bored and haven't needed to talk about AMD for like 5 years, and it all came out at once. As the workstation CPU that it is, it is great, and it handles some gaming on the side pretty damn well. If we get 4 core / 8 thread CPUs from AMD that still can't clock higher than 4.0 GHz and fall well behind 3000 series i5 CPUs, everyone can panic. Until then it's bug squishing and industry adjustment time!


CPU matters less and less with the rise of Vulkan and D3D12.  I am utterly unconcerned about it.

Proof: http://www.techspot.com/review/1348-amd-ryzen-gaming-performance/

In games that are well multithreaded, Ryzen does fine.  In games that aren't, it does well enough.  The higher the resolution and detail, the more Ryzen closes the gap with Intel.


----------



## TheGuruStud (Mar 10, 2017)

TheLaughingMan said:


> I am going to wait to see Ryzen 5 in action. We have no concrete information about those chips and how they OC. Overclocking 8 cores is a different animal than overclocking 4 cores... historically at least. And I don't think the limit in OC is entirely the architecture, but we will find out.



I assume the arch is fine. It's the LPP process that wasn't intended for such clocks.


----------



## Fabiano (Mar 10, 2017)

AIDA64 Build 5.80.4089 shows much better results for Ryzen now.
https://forums.aida64.com/topic/3768-aida64-compatibility-with-amd-ryzen-processors/


----------



## nemesis.ie (Mar 10, 2017)

Is this some beta d/l as my AIDA is not offering an update from 5.08.40 at the moment? Thanks!

And "2 x Octal core" doesn't seem right when a 4790k says "quadcore", so maybe it needs some more fixing.


----------



## cadaveca (Mar 10, 2017)

nemesis.ie said:


> Is this some beta d/l as my AIDA is not offering an update from 5.08.40 at the moment? Thanks!


You need the Engineer edition; I'm on 5.80.4093, and the version above is 5.80.4089.


----------



## nemesis.ie (Mar 10, 2017)

That's a bit cheeky to not put it out for Extreme!

Thanks Dave.


----------



## cadaveca (Mar 10, 2017)

nemesis.ie said:


> That's a bit cheeky to not put it out for Extreme!
> 
> Thanks Dave.


Meh, we get to beta-test the new versions, you get stable, what's the issue?


----------



## nemesis.ie (Mar 10, 2017)

The issue is I'm perfectly up for beta testing (and do some in other areas too), maybe they should add an opt-in for that.

In other news, my Asrock X370 (pro gaming) ships today.


----------



## Enlightnd (Mar 17, 2017)

Question, I've read here and in other places that part of the CCX bus congestion issue for games is that PCIe data is also shoved over the CCX bus.

Has anyone done any tests to see if the issue is greater for GPU's on the chipset PCIe lanes vs GPU's on the CPU embedded PCIe lanes?

(EDIT: Fix CPU lanes with PCIe lanes)


----------



## uuuaaaaaa (Mar 17, 2017)

Enlightnd said:


> Question, I've read here and in other places that part of the CCX bus congestion issue for games is that PCIe data is also shoved over the CCX bus.
> 
> Has anyone done any tests to see if the issue is greater for GPU's on the chipset PCIe lanes vs GPU's on the CPU embedded CPU lanes?



That would be a cool thing to test!


----------



## Shirley Marquez (Mar 17, 2017)

airfathaaaaa said:


> this is windows load balancing working like it did on nehalems and first gen skylakes
> 
> basically windows treats ryzen as a massive 16 core cpu instead of 8c 16t



The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.
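
A rough way to try this from user space is to pin a process to the logical CPUs of one CCX. The Python sketch below is only an illustration under stated assumptions: `ccx_cpu_sets` and `pin_to_ccx` are hypothetical helper names, and the code assumes logical CPUs are numbered so that each CCX owns one contiguous block (0-7 and 8-15 on an 8c/16t part), which is not guaranteed and should be checked against the real topology. `os.sched_setaffinity` is a real call but exists only on Linux.

```python
import os


def ccx_cpu_sets(n_cores=8, cores_per_ccx=4, smt=2):
    """Group logical CPU ids by CCX.

    Assumption: each CCX owns one contiguous block of logical CPU ids,
    including the SMT siblings of its cores.
    """
    threads_per_ccx = cores_per_ccx * smt
    total_threads = n_cores * smt
    return [
        set(range(start, start + threads_per_ccx))
        for start in range(0, total_threads, threads_per_ccx)
    ]


def pin_to_ccx(ccx_index, pid=0):
    """Restrict a process (0 = the calling process) to one CCX's logical CPUs.

    Returns False on platforms without os.sched_setaffinity.
    """
    cpus = ccx_cpu_sets()[ccx_index]
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(pid, cpus)
    return True


if __name__ == "__main__":
    print(ccx_cpu_sets())
```

Pinning a game to one set and comparing frame rates against an unpinned run would be one way to see how much the cross-CCX traffic actually costs.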


----------



## uuuaaaaaa (Mar 17, 2017)

Shirley Marquez said:


> The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.



I think NUMA would require a separate memory controller for each CCX, whereas on Ryzen the controller is shared between the CCXs. But yeah, something of a hybrid would be the real deal. For now let's hope that 4000MHz memory support gets there...
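
For anyone who wants to check how the OS actually sees the split, Linux exposes which logical CPUs share a cache through sysfs. The sketch below is a minimal Python illustration, not a definitive tool: the function names are made up, and it assumes that `index3` under each CPU's `cache` directory is the L3, which is the usual layout but should be confirmed by reading the `level` file next to it.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_domains(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct groups of logical CPUs that share an L3 cache.

    Assumption: cache/index3 is the L3 on this machine.
    """
    domains = set()
    pattern = "cpu[0-9]*/cache/index3/shared_cpu_list"
    for list_file in Path(sysfs_root).glob(pattern):
        domains.add(tuple(parse_cpu_list(list_file.read_text())))
    return sorted(domains)


if __name__ == "__main__":
    for group in l3_domains():
        print(group)
```

If the kernel reports two separate groups on an 8-core part, a scheduler or application at least has the information it needs to keep related threads together, even without a full NUMA layout.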


----------



## EasyListening (Mar 20, 2017)

Enlightnd said:


> Question, I've read here and in other places that part of the CCX bus congestion issue for games is that PCIe data is also shoved over the CCX bus.
> 
> Has anyone done any tests to see if the issue is greater for GPU's on the chipset PCIe lanes vs GPU's on the CPU embedded PCIe lanes?
> 
> (EDIT: Fix CPU lanes with PCIe lanes)



GPUs can only use the lanes on the Ryzen CPUs, they don't connect to the Southbridge. So 16x or 8x/8x, off the CPU.



Shirley Marquez said:


> The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.



I'm wondering if the higher speed of copy operations on the L3 was specifically tweaked to speed up copies between the two L3s, allowing both CCXs to work from the same data after copying things over, if that would even help... but looks like the new version of AIDA makes this whole CCX intercommunication "bug" a non-issue.

Naples has a ton of PCIe lanes connecting two sockets together on dual socket configs. Somewhere at AMD there must have been people who worked on intercommunication between the 2 CCXs. I don't buy the theory that AMD simply dropped the ball and put out a chip with a glaring architectural flaw. If there are limitations of Ryzen I expect to find compromises that were made after intense discussion. Although they don't have a foundry, they do have the ability to do limited production in house for testing and research purposes. It really feels like people are way underestimating AMD and the quality of their product.


----------



## Enlightnd (Mar 20, 2017)

I wonder if that is accurate (about the PCIe lanes). I'm in conversation on IRC with several people using passthrough (for virtualization) and they are explicitly speaking about the issues they have between GPUs on the CPU-based bus and ones in a chipset-hosted PCIe slot. Seems some boards have crappy IOMMU groupings causing weirdness with GPUs.


----------



## EasyListening (Mar 20, 2017)

Enlightnd said:


> I wonder if that is accurate (about the PCIe lanes). I'm in conversation on IRC with several people using passthrough (for virtualization) and they are explicitly speaking about the issues they have between GPUs on the CPU-based bus and ones in a chipset-hosted PCIe slot. Seems some boards have crappy IOMMU groupings causing weirdness with GPUs.



edit: Sorry, I didn't read your post carefully enough. I'll leave the pic up though, maybe someone will find it useful. But, yea, I have no idea what those guys on IRC are talking about. Aren't they mistaken in thinking that one of their GPUs is running off the chipset?






Taken from
https://rog.asus.com/articles/techn...platform-and-its-x370-b350-and-a320-chipsets/


----------



## Super XP (Mar 23, 2017)

AMD will tighten up this L3 latency. It will get better and better.


----------



## nemesis.ie (Mar 23, 2017)

EasyListening said:


> edit: Sorry, I didn't read your post carefully enough. I'll leave the pic up though, maybe someone will find it useful. But, yea, I have no idea what those guys on IRC are talking about. Aren't they mistaken in thinking that one of their GPUs is running off the chipset?



No, they are not mistaken, you could for example have 3 GPUs in there.

2 from the CPU and one from the chipset (with the associated latency).

In fact what a lot of the folks using VM want to do is have all 3 cards in separate I/O groups so you can e.g. have one card for your host O/S and the others each dedicated to a VM.
If the groups/UEFI are right, you could have a slower card off the chipset and have that as the host OS's card (boot graphics) and then two powerful cards connected to the VMs or whatever.
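
To see how a given board groups its devices, Linux lists the IOMMU groups under `/sys/kernel/iommu_groups`, with one numbered directory per group and the member PCI addresses inside its `devices` subdirectory. Here is a minimal Python sketch that reads that layout; the function names are made up for illustration, and the PCI addresses in the usage comment are placeholders rather than output from a real board.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    Returns an empty dict if the directory is missing (IOMMU off or not Linux).
    """
    base = Path(root)
    groups = {}
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    return dict(sorted(groups.items()))


def group_of(groups, pci_address):
    """Return (group number, member list) for a device, or None if it is not listed."""
    for number, devices in groups.items():
        if pci_address in devices:
            return number, devices
    return None


if __name__ == "__main__":
    for number, devices in iommu_groups().items():
        print(number, devices)
    # Example lookup with a placeholder address:
    # print(group_of(iommu_groups(), "0000:01:00.0"))
```

A card is a reasonable passthrough candidate when its group holds only that card's own functions; if it shares a group with unrelated chipset devices, that is the "crappy grouping" people complain about.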


----------



## EasyListening (Mar 29, 2017)

Super XP said:


> AMD will tighten up this L3 latency. It will get better and better.



anddddddddd, it did.










Man, I am laughing all the way to the bank.


----------



## Nephilim666 (Mar 30, 2017)

The day there is 4GHz ram, 4GHz chip and a nice high capacity (64GB sounds nice) I will be throwing cash at AMD.


----------



## Super XP (Mar 30, 2017)

Nephilim666 said:


> The day there is 4GHz ram, 4GHz chip and a nice high capacity (64GB sounds nice) I will be throwing cash at AMD.


Seeing how RAM speed makes a huge performance difference on Ryzen, yes, agreed.


----------



## msroadkill612 (Mar 26, 2018)

Shirley Marquez said:


> The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.



What an interesting suggestion.

Your paradigm of splitting the 8 cores, for coding purposes, into discrete 4-core CCXs with 8 MB L3 cache blocks, and then minimising interaction between them, could speed up some apps considerably.

I am a newb, but I mused similarly in the context of a poor man's Vega Pro SSG (a 16 GB, $5000+ Vega with an onboard 4x 960 Pro RAID array).

If you install an affordable 8-lane Vega and an 8-lane 2x NVMe adapter, so both link to the same 16-lane CCX (as a 16-lane card does, for example), then the GPU and the 2x NVMe RAID array may be able to talk very directly and roughly share the same 8 MB CPU L3 cache. It doesn't bypass the shared PCIe bus like the Vega SSG does, but latency could be minimal, and it could be enhanced by specialised large-block-size formatting for swapping, workspace, temp files and graphics.

Vega 56/64, of course, have a dedicated HBCC subsystem for such GPU cache extension using NVMe arrays. Done right, it promises a pretty good illusion of near-unlimited GPU memory/address space. Cool indeed.

As you see, a belated post from me. We now have evidence in the perf figures of single-CCX Zen/Vega APUs. Yes, inter-CCX interconnects have dragged Ryzen's IPC down.


----------



## TheGuruStud (Mar 26, 2018)

Shirley Marquez said:


> The cache design of Ryzen 7 suggests that an even better way to handle it would be to schedule it as a two socket system, each of which is a 4c 8t CPU. The L3 cache is divided into two parts, and performance is much worse if a core on side A needs data from side B or vice versa.



Devs probably won't have a choice. It's only a matter of time before Intel announces their copy of Ryzen.


----------

