# x299 vs x399 vs x470 - A memory conundrum



## tpullo (May 6, 2018)

*tl;dr version*:  Is quad-channel pointless in the current market?

Hey so I'm designing a system to replace an i5-2500k (yes it has solidly lasted till now!), and am finding out that
AMD has really changed the playing field since I was last paying attention (awesome).

My intention was to head to an enthusiast chipset on my next upgrade, specifically for the benefits of quad-channel memory.

*BUT*

None of the benchmarks I'm seeing in the x299 and x399 space are beating LGA 1151 or X470.  That's not wholly surprising when
it comes to single core applications.  What's throwing me off is that there isn't a giant difference in benchmarks on multi-process,
and more modern games.

Are reviewers just not properly utilizing the enthusiast systems?  The general wisdom, in favor of quad-channel memory, back in the day
was that all things being equal:  4 x 1GB will always outperform 2 x 2GB.  More memory loaded onto a single channel would increase access time,
require higher voltage (to achieve equal performance), and potentially create a bottleneck.

All of the reviews I see instead focus on the RAM cap (64GB vs 128GB) and extra PCI-E lanes as the benefit of enthusiast builds, which is not helpful when
they're neither increasing the RAM in the system comparison, nor switching to multi-GPU (2 or 3 16x PCI-E SLI or Crossfire, where non-enthusiast
is stuck with 1x16 and 1x8 for multi-GPU).
But back to the RAM...
Is there just too much RAM loaded onto channels now to even see this benefit of smaller loads? (Who in the world would put 128GB into a gaming rig?
32GB is P L E N T Y.  Games haven't come far enough to make 16GB standard, even. Though I think for 3 - 5 year future proofing, they definitely will).

To make matters a little more confusing, I'm reading that the Ryzen chip design relies heavily on RAM clock speeds to pull off
advertised numbers. The reviews are throwing the same slower RAM they use on dual-channel systems onto quad-channel systems when comparing the two.
(Which makes some sense for a comparison, but totally hides the designed performance
gain of HEDT chips?)


----------



## Vya Domus (May 6, 2018)

I am not sure what your issue with X299 and X399 is in terms of performance. Both are built using server architectures and will perform accordingly. They are designed with many memory channels and PCIe lanes for entirely different purposes as opposed to normal use cases for desktop PCs.



tpullo said:


> What's throwing me off is that there *isn't a giant difference in benchmarks on multi-process,*
> and more modern games.



Not sure what you mean by that. As in outside of games? If so, then that's not true at all.

It seems to me you are actually fixated on gaming and as a result these platforms are definitely not for you in my opinion.


----------



## tpullo (May 6, 2018)

Thanks for your feedback - my use case for general purpose computing includes multiple VMs; I'm a software dev.
But yes, this question is looking through the glass of high performance PC Gaming, and more to the point, as a min-maxer.
And,  at the current price point AMD has driven multi-core to with competition, it makes more sense to compare enthusiast to
non-enthusiast based on performance, than on price.

If I can build a system that fits my needs for multiple use cases, then I only need an enthusiast setup.  If the enthusiast
builds are falling short in game performance, then I'll instead build two boxes that share cooling resources. Hence, my thread.

I need to be convinced that a socket 2066 or TR4 mobo running two GPUs at 16x with 32GB DDR4 in quad channel will strongly outperform
a socket 1151 or AM4 system running a single GPU at 16x with 32GB DDR4 in dual channel.


----------



## cadaveca (May 6, 2018)

tpullo said:


> I need to be convinced that a socket 2066 or TR4 mobo running two GPUs at 16x with 32GB DDR4 in quad channel will strongly outperform
> a socket 1151 or AM4 system running a single GPU at 16x with 32GB DDR4 in dual channel.


Memory is NOT the deciding factor here. It is whether you use the cores offered by these other platforms or not that should dictate this choice.


----------



## tpullo (May 6, 2018)

Thanks. Would you say that holds true when comparing the 2700X to the 1900X (both 8 cores)?

I think this video may have answered my question. AMD's enthusiast chips have two memory controllers, which will confuse / slow down legacy games. So they introduced
a gaming mode to disable the second memory controller. I'd still expect newer games that focus on parallel processing (Bannerlord or Star Citizen, for example) to run better
on quad channel than on non-enthusiast builds.


----------



## therealmeep (May 6, 2018)

Quad channel vs dual channel is negligible in pretty much every game I've seen. I recently sidegraded from a kit of 4x8 DDR4@2400 to a kit of 2x16 DDR4@3000, and even at the same speeds, gaming is the same; I have yet to see any major differences in gaming and day-to-day tasks. Where quad/hex/octa channel memory pays off is in very intense tasks where the RAM needs to feed the CPU lots of data at once and would fully saturate a dual channel solution’s available bandwidth, leading to a bottleneck. Most games do not use enough memory bandwidth to saturate a dual channel solution, and as such will not take full advantage of quad channel.
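For a rough sense of the numbers behind this, theoretical peak bandwidth can be estimated as channels × bus width × transfer rate. The sketch below assumes a standard 64-bit DDR4 channel and plugs in the two kits mentioned above; the function name is made up for illustration, and real-world throughput will land below these peak figures.

```python
def peak_bandwidth_gbs(channels: int, transfer_rate_mts: int,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a DDR setup.

    Assumes each channel is 64 bits wide, as on standard DDR4 DIMMs.
    """
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s per channel; divide by 1000 for GB/s
    return channels * bytes_per_transfer * transfer_rate_mts / 1000


quad_2400 = peak_bandwidth_gbs(channels=4, transfer_rate_mts=2400)
dual_3000 = peak_bandwidth_gbs(channels=2, transfer_rate_mts=3000)

print(f"4x8 DDR4-2400, quad channel: {quad_2400:.1f} GB/s")   # 76.8 GB/s
print(f"2x16 DDR4-3000, dual channel: {dual_3000:.1f} GB/s")  # 48.0 GB/s
```

On paper the quad-channel kit has about 60% more bandwidth, which only shows up in practice if the workload can actually consume it.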

For your use, the 2700X will probably be a better chip than the 1900X due to improved memory compatibility/stability, as well as slightly better overclocking and possibly lower power draw, since it is a Zen refresh. I have not seen the use for the 1900X, as it is pretty much an 1800X with quad channel support on a different socket with much more expensive mobos. As for multi-GPU, unless you are doing something very intense with 2 or more Titan Vs, you will not saturate a consumer chip in multi-GPU configs. SLI/CFX are still around, but the performance gains in games are not worth the headaches, and for 4K 60 / 1440p 144Hz a 1080 Ti is enough.


----------



## tpullo (May 6, 2018)

Thanks! ^ First useful answer.  You're definitely right about the power draw - I was comparing the TDP and heat signatures of those two chips.  The multi-GPU setup has been a disappointment, at least in my current build. But it's always been 1 x 16x + 1 x 8x.  I was starting to figure that if 2 x 16x didn't make much gain over something like a single Titan V, then multi-gpu won't make sense until it is natively supported without a bridge (something Vulkan was aiming to do).  I probably will try the 2700x first. If that's not cutting it, then I'll revisit for a second build.


----------



## Vya Domus (May 6, 2018)

tpullo said:


> I think this video may have answered my question. AMD's enthusiast chips have two memory controllers, which will confuse / slow down legacy games. So they introduced
> a gaming mode to disable the second memory controller. I'd still expect newer games that focus on parallel processing (Bannerlord or Star Citizen, for example) to run better
> on quad channel than on non-enthusiast builds.



If that throws you off, it might be worth waiting for next-generation Threadripper. Ryzen 2000 has improved memory latency, and these improvements are likely to carry over.


----------



## therealmeep (May 6, 2018)

+1 on waiting for Threadripper 2. At this point it doesn't make sense to buy a TR chip, as TR2 is probably right around the corner. As for bandwidth, I am currently running my 1080 Ti at 8x because of the slot layout of my X99 SLI with my 28-lane 6800K and how close the GPU and CPU are to each other in my loop, and I have yet to see any performance degradation vs running the GPU at 16x.


----------



## Woomack (May 7, 2018)

Vya Domus said:


> If that throws you off, it might be worth waiting for next-generation Threadripper. Ryzen 2000 has improved memory latency, and these improvements are likely to carry over.



AFAIK it is L2 cache latency that improved. The memory controller in 2nd gen Ryzen is the same, and memory latency is not really improved. That doesn't change the fact that, depending on the calculations, faster cache will give up to a couple of % performance gain.

Going by TPU news, TR2 will arrive in August '18, so I guess wide availability will come about a month later.

Most games need fast access time more than much higher bandwidth. Most games also have trouble utilizing more than 4 cores. It's hard to find games that will use more cores and keep the CPU at 100% load.


----------



## Vya Domus (May 7, 2018)

Woomack said:


> AFAIK it is L2 cache latency that improved. The memory controller in 2nd gen Ryzen is the same, and memory latency is not really improved. That doesn't change the fact that, depending on the calculations, faster cache will give up to a couple of % performance gain.



There is definitely improvement to memory latency.




Regardless, memory latency, cache latency/hit rate/prefetching: all these things are transparent to software, and they result in improved memory I/O in general.



Woomack said:


> Most games also have trouble utilizing more than 4 cores.



Games have absolutely no problem running across multiple threads. Game engines are actually very well threaded and have been for quite a while. These are common misconceptions; games aren't bound to 2, 4, 6 or whatever number of cores.

Crysis 3, for example, makes use of every thread you throw at it, but the overhead caused by inevitable interdependencies between processing threads becomes more and more apparent as you scale up the core count. Just because you don't see a linear increase in performance doesn't mean the software isn't making use of the threads/cores available.
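One simple way to picture this sub-linear scaling is Amdahl's law, which treats the interdependent part of the work as a fixed serial fraction. The sketch below is only an illustration: the 80% parallel fraction is a made-up number, not a measurement of Crysis 3 or any other game, and the model ignores synchronization costs that grow with core count.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 80% of each frame's CPU work can run in parallel.
PARALLEL_FRACTION = 0.80

for cores in (2, 4, 8, 16):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    print(f"{cores:>2} cores: {speedup:.2f}x speedup")
```

Under that assumption, going from 8 to 16 cores still helps, but by far less than double, even though every core is doing work.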


----------



## newtekie1 (May 7, 2018)

The simple fact is that almost all the work done on a normal consumer desktop, including gaming, does not really benefit from an increase in memory bandwidth.


----------



## Woomack (May 7, 2018)

Vya Domus said:


> There is definitely improvement to memory latency.
> View attachment 100705
> Regardless, memory latency, cache latency/hit rate/prefetching: all these things are transparent to software, and they result in improved memory I/O in general.
> 
> ...




With your results, you only proved there is almost no difference. Ryzen 2600/2700X is 2nd gen, Ryzen 2200G/2400G is 1st gen, and the results are about the same. No idea why the 1600X has worse results; maybe different memory at different timings was used in earlier tests. Guru3D often puts results from different test rigs in the same comparison, so I wouldn't be surprised.

Load balancing makes a game use all cores, but they are usually loaded to only 20-50% each, depending on the game. Games like Civilization VI can use all cores, even 10+, and I mean fully load the CPU for some time. Most other games will keep the load at 20-50% max, which means that max FPS on 4, 6 and 20 cores will be the same in most titles.
It's changing, and we will see more games that can use a higher number of cores. Right now, 6 cores are enough for gaming.


----------



## cucker tarlson (May 7, 2018)

Question no.1 - what do you do on your machine and what is your priority.
Question no.2 - were you managing to do your work on 2500k fine ?

If the 2500K did okay up until now, there's no need for a HEDT system. Get an 8600K with Z370, 16 gigs of 3000 CL15 RAM, and a nice $40 air cooler like the TR Macho, and overclock it to 4.7GHz on all cores.
Look at the TPU review. The 8600K is actually faster than the 8c/16t 1800X in utility work (images, video, files etc.) once it's overclocked. I'm in favour of upgrading your rig with best-bang-for-the-buck parts once every 2-3 years rather than building a HEDT system for 6 years. The amount of money you have to sink into a HEDT build is crazy compared to what it actually delivers in typical workloads. A 7820X will do your task 1.25x faster on average than an 8600K. How much more does it cost?

$560 for 8600k+z370 sli plus+16 gigs of dual channel ripjaws V 3000 cl15 + tr macho direct
$880 for 7820x+x299 extreme 4 + 16 gigs of quad channel 2133 memory + macho x2

That's $320 extra cost (about 57%) for 25% more performance in a typical scenario. With that money you can get a 1TB NVMe drive, which you will certainly appreciate more in your everyday work. And the 8600K is going to overclock better and outperform it in games.
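The comparison above boils down to two ratios: how much more the HEDT build costs and how much more it delivers. A quick sketch using only the prices and the 1.25x figure quoted in this post (the helper names are invented for illustration, and the 1.25x average is the poster's estimate, not a benchmark result):

```python
def percent_increase(base: float, new: float) -> float:
    """Relative increase from base to new, in percent."""
    return (new - base) / base * 100.0


def cost_per_perf(cost_usd: float, relative_perf: float) -> float:
    """Dollars spent per unit of relative performance."""
    return cost_usd / relative_perf


# Figures quoted in the post: total platform cost and relative throughput.
cost_8600k, perf_8600k = 560.0, 1.00
cost_7820x, perf_7820x = 880.0, 1.25

extra_cost_pct = percent_increase(cost_8600k, cost_7820x)
extra_perf_pct = percent_increase(perf_8600k, perf_7820x)

print(f"Extra cost: {extra_cost_pct:.1f}%")
print(f"Extra performance: {extra_perf_pct:.1f}%")
print(f"8600K build: ${cost_per_perf(cost_8600k, perf_8600k):.0f} per unit of performance")
print(f"7820X build: ${cost_per_perf(cost_7820x, perf_7820x):.0f} per unit of performance")
```

Whether paying more per unit of performance is worth it depends on how much of the workload actually scales with the extra cores.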


----------



## InVasMani (May 7, 2018)

On quad channel, couldn't you cut the DRAM frequency and CAS in half to reduce bandwidth but tighten up and lower latency? If it translates well to the real world, that would put Threadripper 1950X memory latency at 44.25ns rather than 88.5ns. I'm not sure how well that works in practice, but I'd think you could trade bandwidth for latency no problem; by just how much, I'm uncertain.


----------



## eidairaman1 (May 8, 2018)

@cadaveca @sneekypeet, y'all care to explain the true latency and bandwidth of ram?


----------



## sneekypeet (May 8, 2018)

eidairaman1 said:


> @cadaveca @sneekypeet, y'all care to explain the true latency and bandwidth of ram?


http://www.crucial.com/usa/en/memory-performance-speed-latency

Pretty much answers it. Basically you start with 1000 and divide that by half the rated speed of the kit (e.g. for a 3200MHz kit, use 1600 in the equation). Take the result, multiply it by the CAS# of the RAM, and the resulting number is what you compare to other configurations to see which is best. In the end, a 4600MHz C19 kit is slightly better than 3200MHz C14: 8.26 vs 8.75, as an example. That's as long as I did my math right here; it is late, so I may have screwed something up in translation.
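The rule of thumb above is easy to check in a few lines. This is a minimal sketch, assuming the "speed" printed on the kit is the transfer rate in MT/s (so the real clock is half of it); `cas_latency_ns` is just an illustrative name, and the result covers only CAS (first-word) latency, not the sub-timings.

```python
def cas_latency_ns(cas: int, transfer_rate_mts: int) -> float:
    """CAS latency in nanoseconds for a DDR kit.

    The clock runs at half the transfer rate, so one cycle lasts
    2000 / transfer_rate_mts nanoseconds.
    """
    cycle_time_ns = 2000 / transfer_rate_mts
    return cas * cycle_time_ns


kits = {
    "DDR4-3200 C14": (14, 3200),
    "DDR4-4600 C19": (19, 4600),
}

for name, (cas, rate) in kits.items():
    print(f"{name}: {cas_latency_ns(cas, rate):.2f} ns")
# DDR4-3200 C14: 8.75 ns
# DDR4-4600 C19: 8.26 ns
```

The same function also shows that halving both the transfer rate and the CAS value leaves this particular number unchanged, since the ratio stays the same.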


----------



## phanbuey (May 8, 2018)

you take cas * 2000 / dram speed

so 14 * 2000 / 3200 = 8.75

There are other sub-timings (like tRFC) that make a huge latency difference. But you really have to experiment and overclock each set and each IMC type differently, since the lowest AIDA latency may not equal the best performance (esp. in games). Some weird glitching/stuttering can start happening if your timings are off.

Also, I've found that you really start seeing massive diminishing returns in true latency as you get below 3000/3200. Although it would be sweet to see what a 3200 C13 overclock would look like, it would be pretty difficult to get something like that stable.


----------



## VasDrakken (Jun 2, 2018)

Ryzen at the high end is designed to work with the larger data sets of the Radeon memory group's design concepts. Basically, they are designing the CPUs to be able to move information to and from the 1TB workstation cards, so in some tasks they can pull data really fast. The Intel setup is designed around solving math equations as fast as possible. Pairing an i9 Core X with a Radeon discrete workstation card, an Nvidia Titan, or a GP100 gives you the fastest system possible. So you look at the parts you are working with and the budget you have, and scale down to what you want to spend over what you need.

Meaning: in a gaming system, is sound important? Is eye candy important?
Black Desert Online uses 32GB of system RAM if you have 12GB of video RAM in use. If you only have a video card with 1GB of RAM and medium to low amounts of textures and swap memory in use, you might use less than 8GB of RAM. That said, I remember why quad channel was better, and 4x1GB was never better than 2x2GB; what was better was being able to feed four threads independently of each other, rather than having only two threads that could access memory at a time.

Both AMD and Intel treat the memory stack as a Jenga stack. They can pull a piece out to write on it, but the stack records where in the memory stack each piece is at the top, or in the first megabyte of information on the memory array. So a memory system that builds the memory sticks as their own stacks has to keep multiple copies of all the data, and you may as well only have half the memory.

What the memory controllers do in dual channel is divide the memory stick into two parts, an upper stack and a lower one, and some percentage of the memory chips is set aside to hold the arrays that tell the memory controller on the CPU what is in each section. So if the CPU does not know what is stored where, because the program does not query the OS first, the memory controller can pull the stack of each array and return what is there. That should result in a general protection fault, and likely the system crashes. But when the operating system uses the arrays to know where things are, it knows that program A is writing to stack 02 and to memory sections 00 through 07, and the threads make calls only to that channel and operate faster.

That requires the code to be pure 64-bit code, as code from before 64-bit is going to ask the operating system where the memory is and write randomly to any part of the memory, as long as the operating system tells it the bit is zero. Your video games are trying to get rid of all that old code because of the time wasted debugging it. So I would stick to Ryzen and Skylake/Kaby Lake and later chips. They are designed better.

As to how much impact quad channel versus dual channel has in a properly coded program, look to the *SPECwpc V2.0 benchmark*, which shows the impact the hardware can have. My current rig uses 32GB of RAM, and other than one program designed around hardware two years from now, it is enough for video games. Even using 4-channel 1151 chips on Kaby Lake, most video games today, meaning the ones you already have, use about 4GB to 22GB.

I have a 12GB Titan, and that means I have to have 4GB for the operating system, 12GB for the video card to swap texture memory and data to the CPU, plus however much I need for running programs. Now suppose you are running World of Warcraft and using ten gigabytes due to the draw distance. The low end is about an eight hundred megabyte footprint, but with the draw distance set as far as I have it, I can see from Dalaran in the Broken Isles to the Highmountain tribes in the Broken Shore. Not everything is drawn at that distance, but far more than the low end is in memory. It's eye candy, and if you don't have flying, pointless. But that is ten gigabytes in memory at least. Now 4GB OS + 12GB video swap + 10GB = 26GB, which is still less than 32GB.

Now if you run a browser with too many windows open looking something up, chat in Discord, and run a movie in the background, suddenly you have a video game rig being a home entertainment system using more than 32GB of system memory. Now suppose we have 16GB of video card memory: now we need 32GB for the video swap section of system memory. If you read the programming books, you would know that for optimal performance you should have forty-five percent of memory free, and on top of that the video swap should be time and a half again. So 16GB times 1.5 means 16 + 16 + 8, or 40GB of memory to have the data move from the GPU to the CPU. That is a video game system still.
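Taking the figures in this post at face value, the budgeting above is plain addition. The sketch below just reproduces that arithmetic; the rule that system RAM should mirror video memory, and the 1.5x headroom factor, are this poster's own rules of thumb rather than established requirements, and reading "16 + 16 + 8" as VRAM plus 1.5x VRAM is one interpretation. Both function names are hypothetical.

```python
def ram_budget_gb(os_gb: float, video_swap_gb: float, game_gb: float) -> float:
    """Total system RAM the post budgets for OS, video swap and the game."""
    return os_gb + video_swap_gb + game_gb


def video_swap_with_headroom_gb(vram_gb: float, headroom: float = 1.5) -> float:
    """One reading of the post's '16 + 16 + 8': VRAM plus 1.5x VRAM again."""
    return vram_gb + vram_gb * headroom


# 12GB Titan, World of Warcraft at long draw distance (figures from the post)
total = ram_budget_gb(os_gb=4, video_swap_gb=12, game_gb=10)
print(f"Budget: {total:.0f} GB, fits in 32 GB: {total <= 32}")

# Hypothetical 16GB card using the post's 1.5x rule of thumb
print(f"Video swap budget for 16 GB of VRAM: {video_swap_with_headroom_gb(16):.0f} GB")
```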

I use my video game system as my home entertainment system and for architectural work, so when designing a high rise in Revit I am looking at what I can put in there and have the system not overheat and still be stable. But that is where the system diverges from an entertainment system.

So the costing of my system is based on more than one use. If you stop at the hi-fi theater system, one that can play Blu-ray in 7.2.2 from the onboard sound even if I end up with a processor and rack mount system instead of a case, you figure out what games you play and what type of eye candy you want to see. So while most people are willing to spend a hundred dollars on a case and another couple hundred on a PSU, I am looking at sliding rackmounts from Mid Atlantic to build the system in and have it built into a desk or closet.

So if your goal is a fast 32GB system, Kaby Lake or Ryzen should be fast enough for today's games.

I would look at what types of things you use it for. My entertainment center is controlled by a PC system. So I'm using things like SEAS Excel drivers in hardwood cabinets (yes, I know MDF is supposed to be superior, but I really prefer not to use it when I can, after having too many nice MDF desks start to pit).

But looking at a normal PC: 1R8 or 2R8 (double-sided) memory sticks at 16GB or 32GB of system memory,
with a Titan X or Radeon card, and even a slower CPU that can be upgraded if it is not fast enough.

I know Intel has two Kaby Lake chips in the 2066 socket now, because the motherboards got so expensive that people are seeing sticker shock at upgrading the CPU, memory and motherboard all at once. I just bought a Gigabyte board. The 10nm chips are all being sold at high cost to the space systems for geospatial math, where that might save someone's life, so I'm buying into Skylake now and will upgrade my workstation system when the 10nm chips arrive. But for most entertainment consumption uses, you might consider the high-end X299 motherboards for the features on them, even with as low as the i7 or i5 chips, which look like they are made to get people past the cost of the motherboard. Just remember LGA 2066 is being obsoleted in the next year or so by the wider pin grid.


----------



## eidairaman1 (Jun 2, 2018)

You might want to shorten that up.


----------



## hat (Jun 2, 2018)

Hell of a first post though... 

In short, quad channel doesn't matter for gaming. It only matters if you're doing a lot of seriously RAM intensive stuff... perhaps like what @xkm1948 does. I once saw someone say something along the lines of even single channel these days is sufficient for gaming, think that was @newtekie1 . Lunacy that would be in the days of DDR, but with high speed DDR4... maybe not.


----------



## VasDrakken (Jun 2, 2018)

Hmm, short form: the memory controller can only put things in memory or get things from memory once per channel per hertz.

Meaning that if everything it needs can be pulled using a get statement in one hertz, no matter how many cycles that is, all you need is one channel. If you have more to pull into the CPU, or into the video card to parse for rendering to screen, than you get with one get statement, you need more than one channel per hertz. Which is why dual channel is the least you get, so that if the CPU gets stuck in a loop the operating system can still access what is in RAM.

Black Desert Online benefits from more channels because the data between the CPU and GPU is really heavy with draw calls, particle effects, and water effects. But like hat said, for _most_ gaming you do not need quad channel. "Most" being my quantifier, because some really high-polygon games with lots of effects start to pull more physics off the CPU, which is better at some things like collision math and some nCloth simulation. But then again, I also recommended Kaby Lake, which is a dual-channel-only processor. I like that with the two extra Kaby Lake processors there is a cheaper way to build the systems and get them running, even if you end up buying the quad channel processor later on. I am sure if I look into it AMD has a similar system, because the workstation group got to take over the entire company's design process.

They had a really cool presentation in Orlando (it was spin, but there was also actual information) about when AMD turned the design over to the workstation group, so that workstation quality was expected from the whole company. The gist I got from the presentation they did for architects and MEP designers was that the engineers knew what they were doing but no one was listening to them, except in the workstation card section, where price is the last factor considered. Ryzen is the first design since 2017, I think, so that would be the impact of that shuffling, and I was honestly wondering if we would ever see a T-bird type CPU ever again.


----------



## InVasMani (Jun 6, 2018)

As you increase memory channels, you basically get either half the latency for the same bandwidth or double the bandwidth. You can easily tighten latency on higher-frequency RAM kits, so you can match a dual channel's bandwidth with lower latency. Basically, your true latency improves a lot with more memory channels. That kind of reminds me of bank groups in DDR4, something that DDR3 and earlier lacked and that was basically borrowed from GDDR. What it does is allow faster burst accesses. Basically, it allows a smaller prefetch size but the performance of a larger one.
https://www.micron.com/products/dram/ddr3-to-ddr4
https://www.synopsys.com/designware-ip/technical-bulletin/ddr4-bank-groups.html

Valid point on price being a factor, as much as I'd love for quad channel or even triple channel to become the new baseline rather than dual channel. I think that will be an increasingly bigger concern, perhaps, with the rapid core/thread count increase we're seeing as of late due to better competition. If this trend holds up at even half or a quarter the rate of the GHz war between AMD/Intel, we should see a substantial productivity boost and more emphasis on better memory channel support to keep pace with those needs. I really love what AMD is doing with StoreMI too, I have to say, and that would definitely benefit from more memory channels, as would APU designs. TPU, or someone on it, made mention in the new 2nd Gen Threadripper announcement of possibly 8-channel memory with a new chipset; that would be incredible, especially if it also worked on the older 1st Gen Threadripper CPUs.


----------

