# Workstation DDR4 memory benchmarks: ECC vs. non-ECC, 16 GB vs. 32 GB, single vs. dual vs. quad channel, overclocked vs. default timings



## Artas1984 (Jul 22, 2019)

It's been quite a while since i posted my last benchmarks. It's so nice not only to post random things here on TPU from time to time, but to return with something really BIG. This also marks my 10th anniversary both here on TPU and since i first started benchmarking in general. Grab some popcorn, because these RAM benchmarks will be long and boring... ;D

Since i am sitting on Intel X99, i thought it would be a nice idea to present all the possible memory performance benefits for this platform in different configurations, as most of the ''revelations'' ''revealed'' here will also be ''relevant'' for Intel X299 systems, as well as for X79 systems, as they both support quad channel and ECC memory. For this memory comparison specifically i am venturing into the world of workstation and productivity tasks. Folks who work with video conversion, financial, mathematical and office calculations, compiling, archiving, 3D building, rendering and product & technological design will find their needed memory sweet spot.

Best performance with the least cash and time investment is the key aspect in any process design, and the same criterion applies to high end Intel workstation tasks.

*OBJECTIVES*

1. Do you need ECC memory or just standard memory? ECC memory is not required for workstations, but if the price and performance are adequate, then it is highly recommended, as it will increase system stability and make the system more fault proof. I will test the difference in performance between ECC DDR4 and non-ECC DDR4 modules of the same speed, latencies and quantity, and leave the pricing to you.

2. How much benefit would quad channel memory bring over dual channel and single channel? There is an undeniable benefit in increased bandwidth, but not all programs will benefit. Thus the second task is to find out which workstation tasks benefit the most.

3. Quantity of needed memory is the most straightforward question you can imagine. I won't even bother with 8 GB as a starting point and i am sure you will agree. 16 GB vs. 32 GB will be the focus here, and while i also tested 48 GB of RAM, i can tell you straight away, that not a single tested program benefited going from 32 GB quad channel to 48 GB quad channel (1 % margin of error), therefore i won't bother posting 48 GB results just to save time. That does not mean, however, that 48 GB are not needed. Adobe After Effects is the prime example when usage exceeds 32 GB, and there are more examples, but not in my test.

4. Finally, for those who want to use non-ECC memory, the most important thing is overclocking. Aggressive memory timings vs. the JEDEC standard might justify the pricing of faster RAM, and that definitely has to be checked.

My focus for this test is DDR4 ECC memory performance analysis (and comparison with non-ECC DDR4 memory), thus no Core i7 CPU would suffice here. I also did not want to test a CPU with hyper-threading, as virtual cores sometimes mess up thread priorities in some programs. To avoid that, a pure 10 core Haswell-EP Xeon E5-2663 V3 with 25 MB L3 cache (Genuine Intel CPU) will be the workhorse. All 10 cores work at 3.1 GHz in turbo mode. By CPU specification the supported memory is from 1600 MHz to 2133 MHz only, thus a 3000 MHz DDR4 kit, although supported by the X99 platform, will still work at 2133 MHz max! That's why the only way to overclock memory here is to lower the memory latencies! I found the multi-threaded performance of this Xeon E5-2663 V3 to be practically identical to the Core i5 8600K overclocked to 5.0 GHz, which i tested before, so you get the picture. The motherboard is an ASRock X99 Extreme4 with BIOS version P3.80.
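To put the latency overclock into absolute terms, here is a minimal sketch that converts a timing given in clock cycles to nanoseconds. It assumes the ''2133 MHz'' figure is the usual DDR4-2133 data rate (2133 MT/s, so the real clock is half of that); the function name is made up for illustration.

```python
def cycles_to_ns(cycles: int, data_rate_mts: float) -> float:
    """Convert a memory timing from clock cycles to nanoseconds.

    DDR memory transfers data twice per clock, so the clock in MHz is
    data_rate_mts / 2 and one cycle lasts 2000 / data_rate_mts ns.
    """
    return cycles * 2000.0 / data_rate_mts


stock_ns = cycles_to_ns(15, 2133)  # CL15, the default timing in this test
tight_ns = cycles_to_ns(11, 2133)  # CL11, the overclocked timing in this test

print(f"CL15 at 2133: {stock_ns:.2f} ns")
print(f"CL11 at 2133: {tight_ns:.2f} ns")
```

With the frequency locked at 2133, going from CL15 to CL11 cuts the CAS delay from roughly 14 ns to roughly 10 ns. This only covers the first timing, not the full access latency that a benchmark measures.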

*TESTED RAM CONFIGURATIONS (all memory is unbuffered and double sided)*

1. Crucial 16 GB SINGLE CHANNEL DDR4 ECC 2133 MHz CL15-15-15-36-300
2. Crucial 16 GB DUAL CHANNEL DDR4 ECC 2133 MHz CL15-15-15-36-300
3. Crucial 32 GB DUAL CHANNEL DDR4 ECC 2133 MHz CL15-15-15-36-300
4. Crucial 32 GB QUAD CHANNEL DDR4 ECC 2133 MHz CL15-15-15-36-300
5. Crucial 48 GB QUAD CHANNEL DDR4 ECC 2133 MHz CL15-15-15-36-300 (will not be included)
6. G.Skill 16 GB DUAL CHANNEL DDR4 non-ECC 2133 MHz CL15-15-15-36-300
7. G.Skill 16 GB DUAL CHANNEL DDR4 non-ECC 2133 MHz CL11-12-12-30-300 overclocked
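
As a rough reference for the channel comparison, the peak theoretical bandwidth of each configuration can be estimated as below. This is only a sketch: it assumes 2133 MT/s and the standard 64 data bits (8 bytes) per channel, which also holds for ECC modules since their extra 8 bits carry check bits, not data. Real-world throughput will be lower.

```python
def peak_bandwidth_gbs(data_rate_mts: float, channels: int, bus_bytes: int = 8) -> float:
    """Peak theoretical bandwidth in GB/s (decimal) for a DDR setup."""
    return data_rate_mts * 1e6 * bus_bytes * channels / 1e9


for name, channels in (("single", 1), ("dual", 2), ("quad", 4)):
    print(f"{name:>6} channel: {peak_bandwidth_gbs(2133, channels):.1f} GB/s")
```

That gives about 17.1, 34.1 and 68.3 GB/s for single, dual and quad channel.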

All the tested programs, as well as their temporary files and output paths, are located on a Samsung 860 Pro 1 TB SATA SSD. The OS is Windows 10 v1903. I tested synthetic, semi-synthetic and custom programs, including the full Passmark 9 CPU and RAM tests separately, and the full SPEC Workstation CPU tests separately.

*VIDEO PRESENTATION*









-----------------------------------------------------------------------------------------------------
*Cinebench R20*








The values have been rounded to the nearest ten, as the results were in the range from 2780 to 2820 points only, which is within the margin of error.
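
To show how narrow that range is, here is a small sketch of the rounding and of the spread relative to the midpoint. The 2780 and 2820 bounds come from the text above; the helper names and the sample raw score are made up for illustration.

```python
def round_to_ten(score: float) -> int:
    """Round a benchmark score to the nearest ten."""
    return int(round(score / 10.0) * 10)


def spread_percent(low: float, high: float) -> float:
    """Width of a result range as a percentage of its midpoint."""
    midpoint = (low + high) / 2.0
    return (high - low) / midpoint * 100.0


print(round_to_ten(2783))                    # hypothetical raw score
print(f"{spread_percent(2780, 2820):.2f} %")  # range quoted in the post
```

The whole range spans about 1.4 % of the midpoint, so all configs are effectively tied in this test.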

*V-ray 4.10.6*






With V-ray we see something interesting. In 5 runs for each tested config, the non-ECC value RAM seems to perform the worst, even though the timings and speed are the same. Some runs were as low as 8100... Even the overclocked non-ECC RAM only seems to catch up with DDR4 ECC here.

*Luxmark C++*






The C++ (non-OpenCL) render mode seems to benefit from non-ECC RAM with the fastest timings, but any RAM config will do.

*3DPM 2.1*






A 3D movement algorithm simulation shows no preference for any RAM config. Calculated in millions of operations per second.

*7-ZIP 19.0 compression at 16 GB RAM allocation*






When it comes to file compression, everything matters - quantity, bandwidth, timings... Calculated in millions of instructions per second.

*7-ZIP 19.0 decompression at 16 GB RAM allocation*






But it ain't so true in file decompression. Here, timings play the most important role. Calculated in millions of instructions per second.

*Corona 1.3*






Even a single channel 16 GB DIMM does not suffer in rendering.

*Blender benchmark 1.0 beta2 quick CPU*






And the same goes for Blender. Granted, i've only tested the recently released beta quick CPU test, so i cannot guarantee the same result for independent custom demos. However, *and this is important*, the 5 most popular demos: BMW, CLASSROOM, BARCELONA PAVILION, SPLASH, and COSMOS LAUNDROMAT DEMO did not show any significant difference between 16 GB and 32 GB RAM, only a minor one. However, i can assure you that 3DS Max will significantly benefit from 32 GB RAM over 16 GB. This has been confirmed in my previous CPU benchmark, in which i did not talk about memory advantages, but i say it now, so that you can take it into consideration.

*Handbrake 1.2.2*






For this test i am using a custom 8.04 GB, raw, avi, 3840X2160 resolution video file, that i convert to H.265, mkv, 8 bit, CFR, 25 FPS, at 64 Mb/s, single pass, auto encoder profile, slower preset. With that being said, even though 4K conversion fully utilizes multiple cores, there is no difference between memory configs.
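
For anyone who wants to reproduce something close to these settings from the command line, here is a hedged sketch that builds a HandBrakeCLI argument list. The test above was not necessarily run this way, so this is only an approximation: the file names are placeholders, and it assumes the 64 figure is a video bitrate of 64 Mbit/s (HandBrakeCLI takes kbit/s). Leaving out the two-pass flag gives a single pass encode.

```python
def handbrake_args(src: str, dst: str) -> list:
    """Build a HandBrakeCLI command approximating the settings in the post."""
    return [
        "HandBrakeCLI",
        "-i", src,                     # input file (placeholder name)
        "-o", dst,                     # output file (placeholder name)
        "-f", "av_mkv",                # MKV container
        "-e", "x265",                  # 8 bit H.265 encoder
        "--encoder-preset", "slower",
        "--encoder-profile", "auto",
        "-b", "64000",                 # assumed 64 Mbit/s, given in kbit/s
        "-r", "25", "--cfr",           # constant 25 FPS
    ]


args = handbrake_args("input.avi", "output.mkv")
print(" ".join(args))
```

The list can be passed to `subprocess.run(args)` on a machine where HandBrakeCLI is installed and on the PATH.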

*Y-cruncher 0.7.7.9501 11.8 GB RAM allocation Pi numbers multi-threaded*






Well, whatever this Pi calculation deals with, it surely likes bandwidth. The difference between quad and dual channel is not that big, but single channel memory users would suffer greatly. Very interesting is the fact that even the overclocked non-ECC memory has got nothing on ECC memory, and tightening the timings brings very little performance boost. The test is very accurate and multiple repeats show the same result to the second.

*Microsoft Excel Pro Plus 2016*






Custom sheet calculation. All RAM configs actually calculate the bench within 0.5 second difference, with single channel DDR4 ECC RAM at near 51 s, and overclocked non-ECC RAM at near 50 s.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Before going into anything, my Xeon E5-2663 V3 at 3.2 GHz OC scored 15500 points in Passmark CPUmark, which is about the same result as a stock Core i7 8700K. This is the reason why the general Passmark score SUCKS, as it reveals absolutely no difference between processors (Core i7 8700K is faster, but how would you know from Passmark?). Dissecting Passmark into separate tests is the only way to find what is what, and this applies to memory too.

*Passmark 9 CPU integer math*






Looks ok so far.

*Passmark 9 CPU prime numbers*






These prime numbers have some ECC affinity or what? Whenever i see ECC RAM beating standard RAM i redo the test at least 10 times, but eventually get the same results. Whoever said that ECC RAM is inferior in performance to non-ECC RAM was so so wrong, as this is not the first time we see this bizarre occurrence. Some 20 repeated tests for each RAM config were done to confirm this.

*Passmark 9 CPU physics*






A major win for quad channel memory and ECC RAM in general. Each retest gives 5 to 10 % different results, so the best score for each RAM config out of 20+ runs was taken.
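
When a test varies this much between runs, it helps to look at more than the best score. Here is a minimal sketch of a per-config summary; the run values are placeholders made up for illustration, not results from this test.

```python
import statistics


def summarize(runs):
    """Return best, mean, median and run-to-run variation for one config."""
    mean = statistics.mean(runs)
    return {
        "best": max(runs),
        "mean": mean,
        "median": statistics.median(runs),
        # Coefficient of variation: sample standard deviation as % of the mean.
        "cv_percent": statistics.stdev(runs) / mean * 100.0,
    }


example_runs = [1000, 1050, 980, 1100, 1020]  # placeholder scores, NOT measured
summary = summarize(example_runs)
print(summary)
```

Best-of-N rewards a lucky run, while the median shows what a typical run looks like, so reporting both makes noisy tests like this one easier to compare.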

*Passmark 9 CPU floating point math*






One of the most accurate tests in Passmark shows variation of less than 0.5 % with each retest.

*Passmark 9 CPU extended instructions*






The weirdest result ever. The 704 points for each ECC RAM config would repeat forever, and then go to like 703 for once... For non-ECC RAM it's the same way with 585 points at max. There is no difference between single and quad channel, no difference in quantity, no difference in memory timings, just this absolutely bizarre radical ECC vs. non-ECC RAM differentiation. Can anybody offer some insight into WTF is going on here?
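
To quantify that gap, here is a short calculation using the two scores quoted above (704 for ECC, 585 for non-ECC). The helper name is made up for illustration.

```python
def percent_gain(score: float, baseline: float) -> float:
    """How much higher score is than baseline, in percent."""
    return (score - baseline) / baseline * 100.0


ecc_over_non_ecc = percent_gain(704, 585)
print(f"ECC is {ecc_over_non_ecc:.1f} % ahead of non-ECC in this test")
```

That is a lead of about 20 %, far outside the run-to-run variation described for this test.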

*Passmark 9 CPU encryption*






Another stable test.

*Passmark 9 CPU sorting*






Go ahead and ''sort'' how the overclocked RAM lost to value RAM.

*Passmark 9 RAM database operations*






Previously we evaluated how different RAM configs would alter the CPU performance, now we look into the raw RAM performance itself.

*Passmark 9 RAM read cached*






A big performance jump due to the improved memory timings of the non-ECC RAM is noticeable. And yes, single channel seems to be the fastest config???

*Passmark 9 RAM read uncached*






No matter how many times i tried to repeat the 32 GB RAM quad channel result, it would only be around 11000 points, while the 32 GB RAM dual channel result would always stay at 12000 points. Perhaps very sensitive to latencies?

*Passmark 9 RAM write*






Writing is another thing. Single channel RAM suffers a huge performance penalty.

*Passmark 9 RAM threaded*






And now we have the absolutely best result for quad channel memory users.

*Passmark 9 RAM latency*






While the primary RAM timings are clearly defined, there is so much more to it than that. This is the proper test to see how fast the RAM really responds. So even though 32 GB quad channel RAM has the same timings as 32 GB dual channel RAM, the quad channel config brings a slightly increased latency.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

SPEC (Standard Performance Evaluation Corporation) provides a wide variety of apps to test the performance of computer parts by simulating the most popular and demanding third party programs in the world. I am using the SPEC Workstation benchmark, which has CPU, GPU and storage tests. I will be using all of the CPU tests except 7-ZIP, Handbrake, Blender and Luxmark, as those have been tested independently. This time around i won't comment on anything. These SPEC tests take an incredibly long time to run, but they are quite accurate, with the exception of FSI, which i excluded, because the variations in results were too great.

*SPEC Workstation CalculiX*






*SPEC Workstation WPCcfd*






*SPEC Workstation rodiniaCFD*






*SPEC Workstation lammps*






*SPEC Workstation namd*






*SPEC Workstation rodiniaLifeSci*






*SPEC Workstation Convolution*






*SPEC Workstation WWTF*






*SPEC Workstation Kirchhoff*






*SPEC Workstation poisson*






*SPEC Workstation srmp*






*SPEC Workstation octave*






*SPEC Workstation python36*





----------------------------------------------------------------------------------------------------------------------


As i said earlier, the difference between 32 GB quad channel and 48 GB quad channel was too small to showcase anything.
In many tests DDR4 ECC managed to outperform the standard non-ECC DDR4 RAM. Out of interest i will make additional tests with a different 2X8 GB non-ECC DDR4 kit and will report if anything interesting comes up.
In the future i will make productivity benchmarks 2.0, which will be the follow up to the popular test done previously. This has been one of the most brain-squashing benchmarks i have ever done here on TPU. Please don't demand ''do that and that''. Just be patient.

So now you know what you need. Even if you are still using a Xeon with X79 and DDR3, many things should nicely apply for your system. X299 users will additionally benefit from higher frequency DDR4.


----------



## eidairaman1 (Jul 22, 2019)

Just remember that doubling RAM channels from single to dual and from dual to quad only doubles the maximum theoretical bandwidth. There is resistance, reactance and impedance in all conductors. No conductor is 100% efficient.


----------



## Flaky (Jul 22, 2019)

I couldn't find what exact kind of memory is being tested.
Are both ECC and nonECC memory UDIMMs?
What motherboard do you use? Do you verify in any way that the error correction is working?

I've seen many people incorrectly say "ECC" when they meant server memory types (registered, load reduced...).
While ECC UDIMMs mostly work on typical PCs (except that ECC itself is not functional), registered types don't.


----------



## Artas1984 (Jul 22, 2019)

Flaky said:


> I couldn't find what exact kind of memory is being tested.
> Are both ECC and nonECC memory UDIMMs?
> What motherboard do you use? Do you verify in any way that the error correction is working?
> 
> ...



Actually, great question about ECC. Indeed, folks, how do you know if ECC is working or not? Because if the CPU does not support ECC, the RAM will work in non-ECC mode. So this is the way you check:






These codes imply that ECC is working correctly. First - the value 6 means multi-bit ECC. Second - if DataWidth had been 64 bit instead of 72 bit, it would mean non-ECC.
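
For reference, the codes can be decoded with a small lookup. This is only a sketch: it assumes the first value shown comes from the `MemoryErrorCorrection` property of the Windows WMI class `Win32_PhysicalMemoryArray`, and the table follows Microsoft's documented values for that property. The width check relies on a 72 bit module carrying 64 data bits plus 8 check bits. The function names are made up for illustration, and nothing here queries the system itself.

```python
# Documented meanings of Win32_PhysicalMemoryArray.MemoryErrorCorrection.
MEMORY_ERROR_CORRECTION = {
    0: "Reserved",
    1: "Other",
    2: "Unknown",
    3: "None",
    4: "Parity",
    5: "Single-bit ECC",
    6: "Multi-bit ECC",
    7: "CRC",
}


def describe_error_correction(code: int) -> str:
    """Translate the numeric WMI code into a readable name."""
    return MEMORY_ERROR_CORRECTION.get(code, "Unrecognized")


def width_implies_ecc(width_bits: int) -> bool:
    """72 bits = 64 data bits + 8 check bits; 64 bits means no check bits."""
    return width_bits == 72


print(describe_error_correction(6))
print(width_implies_ecc(72), width_implies_ecc(64))
```

A code of 3 together with a 64 bit width would point to the modules running in non-ECC mode.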

Also, there was so much stuff to write, that i forgot to fill in some of the details, i've edited that now in the tested ram configurations section. All memory is unbuffered (UDIMM). With that being said, does registered RAM even work on X99? Anyone know? Or is it RAM manufacturer specific and sensitive?


----------



## Flaky (Jul 22, 2019)

Afaik some people managed to make reg ram work on x99, as long as the cpu is a xeon. One example validation here


----------



## Bill_Bright (Jul 22, 2019)

eidairaman1 said:


> Just remember ram channel doubling from single to dual and from dual to quad is maximum theoretical bandwidth


Right! With "theoretical" being the key word there. What we see in theory (and benchmarks) is rarely what we see in the real world. I would rather have more RAM in single channel than less RAM in dual or quad.


----------



## Artas1984 (Jul 30, 2019)

Thanks to sneaky, i've updated the thread with the video presentation!

Honestly, i have not looked into what these synthetic SPEC and PASSMARK tests represent in real life. You can see how quad channel RAM at the same size/speed/timings massively outperforms dual channel RAM in SPEC WWTF, SPEC rodiniaLifeSci, SPEC lammps, SPEC WPCcfd and Passmark RAM threaded, but what do these tests represent? Perhaps someone knows and someone might actually benefit from this... I feel like most people here would rather have ''yet another gaming benchmark'', of which there are plenty in internet media, but honestly, i have not seen anyone do a deep dive into testing RAM for workstation apps like i did here. I myself only cared about Handbrake, and given the results i made the right choice to switch to the mainstream X370 Ryzen platform.


----------



## Artas1984 (Sep 30, 2019)

I've redone some tests with another kit of common DDR4: Kingston HyperX 2X8 GB 2666 MHz CL16, down-clocked to 2133 MHz CL15-15-15-36 to match the exact settings of the test bed.

Amazingly, in the majority of the WS programs it was slightly to notably faster than my G.Skill kit at the same frequency and timings!

In other words, one kit of 2X8 GB DDR4 2133 MHz CL15-15-15-36 can be faster than another kit of 2X8 GB DDR4 2133 MHz CL15-15-15-36.

This does explain why my Crucial ECC RAM for whatever reason managed to outperform the non-ECC G.Skill RAM... The key here must be secondary timings, i see no other explanation. I guess the G.Skill kit, originally targeted at 3600 MHz speed at CL19, just sucks when set to 2133 MHz CL15, as it retains high secondary timings, which i did not play with, while the Kingston kit had lower secondary timings by default. I never realized that secondary timings can impact performance so notably.


----------

