The problem is that pretty much nothing can make use of this performance. Bandwidth has to be put to work, just like cores in a CPU.
Very true. When I look through the graphs for applications that scale linearly with bandwidth (4000 vs. 4800), the only ones I can find are Comsol and 7-Zip compression, both very well multithreaded, I suppose. But both also behave in a very weird way: when RAM speed drops from 4800 to 2400, performance drops to less than half, which is a bigger drop than the bandwidth itself.
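One way to put a number on "scales linearly" is to estimate an exponent k in performance ~ bandwidth^k from two measurements: k = 1 is linear, k > 1 means performance falls faster than bandwidth (the weird case above). This is a minimal sketch; the function name and the scores are hypothetical, not read from the graphs being discussed.

```python
import math

def scaling_exponent(bw_a: float, bw_b: float, perf_a: float, perf_b: float) -> float:
    """Estimate k in perf proportional to bandwidth**k from two data points.

    k == 1: linear scaling; k < 1: sublinear; k > 1: superlinear
    (performance falls faster than bandwidth does).
    """
    return math.log(perf_b / perf_a) / math.log(bw_b / bw_a)

# Hypothetical scores (higher is better), NOT taken from the graphs:
# 100 at 4800, 45 at 2400, i.e. "less than one half".
k = scaling_exponent(4800, 2400, 100.0, 45.0)
print(f"k = {k:.2f}")  # → k = 1.15
```

A perfectly bandwidth-bound workload would give k = 1 at most, so a value clearly above 1 hints that something besides raw bandwidth (latency, timings, or a measurement quirk) changed between the two runs.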
Let me explain with an example. In 2013, 2400 MHz CL10 was an average memory speed and dirt cheap (much faster DDR3 modules existed at the time). Now, in 2021, an average DDR5 module is around 4800 MHz CL40. After 8 years, RAM speed has increased by 100 percent, but memory latency has gone up by a whopping 300 percent (CL10 to CL40). That means real-world performance can't be that much better. In my opinion, that's disappointing.
Like it or not, latency in nanoseconds hasn't improved much since ... ever. New museum-grade modules that you can still buy today (much easier to buy than DDR5, hah) are DDR-333 CL2.5, DDR-400 CL3 or SDR-133 CL2. All of those work out to 15 ns.
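The 15 ns figure comes from dividing the CAS latency (in clock cycles) by the memory clock. DDR moves two transfers per clock, so its clock is half the data rate; SDR moves one. A minimal sketch with a hypothetical helper name; it covers CAS latency only and ignores the other timings and the memory controller.

```python
def cas_latency_ns(cl: float, data_rate_mts: float, transfers_per_clock: int = 2) -> float:
    """CAS latency in nanoseconds: CL cycles divided by the clock frequency.

    DDR: clock (MHz) = data rate (MT/s) / 2.  SDR: clock = data rate.
    """
    clock_mhz = data_rate_mts / transfers_per_clock
    return cl / clock_mhz * 1000.0  # cycles / MHz = microseconds, * 1000 -> ns

modules = [
    ("SDR-133 CL2", 2, 400 / 3, 1),       # nominal 133 MHz is really 133 1/3 MHz
    ("DDR-333 CL2.5", 2.5, 1000 / 3, 2),  # nominal 333 MT/s is really 333 1/3 MT/s
    ("DDR-400 CL3", 3, 400, 2),
]
for name, cl, rate, tpc in modules:
    print(f"{name}: {cas_latency_ns(cl, rate, tpc):.1f} ns")  # each prints 15.0 ns
```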
With that in mind, it's a little unfair to say latency has increased by 300%. DDR5 is again in the 15 ns range (16.7 ns for DDR5-4800 CL40), while your DDR3-2400 CL10 example is 8.3 ns, so DDR5's latency is about twice as long, not four times. As for DDR4, it gets really costly once you go down to 8.7 ns or below.
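The cycle-based and nanosecond-based comparisons disagree because CL counts clock cycles, and DDR5-4800's clock runs twice as fast as DDR3-2400's. A quick sketch of the arithmetic for the two modules mentioned (the helper name is hypothetical; CAS latency only):

```python
def cas_latency_ns(cl: float, data_rate_mts: float) -> float:
    # DDR: clock in MHz is half the data rate in MT/s; cycles / MHz = microseconds
    return cl / (data_rate_mts / 2) * 1000.0

ddr3 = cas_latency_ns(10, 2400)   # DDR3-2400 CL10
ddr5 = cas_latency_ns(40, 4800)   # DDR5-4800 CL40

cycle_increase = (40 / 10 - 1) * 100           # latency change counted in cycles
ns_increase = (ddr5 / ddr3 - 1) * 100          # latency change in nanoseconds
bandwidth_increase = (4800 / 2400 - 1) * 100   # peak transfer-rate change

print(f"DDR3-2400 CL10: {ddr3:.2f} ns, DDR5-4800 CL40: {ddr5:.2f} ns")
print(f"latency: +{cycle_increase:.0f}% in cycles, +{ns_increase:.0f}% in ns; "
      f"bandwidth: +{bandwidth_increase:.0f}%")
# → DDR3-2400 CL10: 8.33 ns, DDR5-4800 CL40: 16.67 ns
# → latency: +300% in cycles, +100% in ns; bandwidth: +100%
```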
Indeed. Shouldn't that in theory yield more performance if implemented properly?
Probably, or it wouldn't be worth the added complexity.
And yet, in a way, DDR5 is twice as slow, or half as fast, at the same clock speed. How so? The minimum unit of data transfer between CPU and RAM is one cache line, which is 64 bytes. This amount of data is moved in:
- 8 transfers, or 4 clock cycles, in DDR4, i.e. 2 ns in DDR4-4000, which has a 64-bit channel
- 16 transfers, or 8 clock cycles, in DDR5, i.e. 4 ns in DDR5-4000, which has a 32-bit channel (if implemented properly)
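The numbers in the list follow from the channel width: bytes per transfer = width / 8, transfers = 64 / bytes per transfer, cycles = transfers / 2 (double data rate), time = transfers / data rate. A minimal sketch; the helper name is hypothetical, and it ignores ECC bits, command overhead, and the fact that a DDR5 DIMM has two such 32-bit subchannels working in parallel.

```python
CACHE_LINE_BYTES = 64

def line_transfer(data_rate_mts: int, channel_bits: int) -> tuple[int, int, float]:
    """Return (transfers, clock cycles, nanoseconds) to move one 64-byte cache line."""
    bytes_per_transfer = channel_bits // 8
    transfers = CACHE_LINE_BYTES // bytes_per_transfer
    cycles = transfers // 2                   # DDR: two transfers per clock cycle
    ns = transfers / data_rate_mts * 1000.0   # MT/s = transfers per microsecond
    return transfers, cycles, ns

for name, rate, width in [("DDR4-4000", 4000, 64), ("DDR5-4000", 4000, 32)]:
    t, c, ns = line_transfer(rate, width)
    print(f"{name}, {width}-bit: {t} transfers, {c} cycles, {ns:.1f} ns")
# → DDR4-4000, 64-bit: 8 transfers, 4 cycles, 2.0 ns
# → DDR5-4000, 32-bit: 16 transfers, 8 cycles, 4.0 ns
```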
I'm sure a specific microbenchmark could be devised that shows a significant difference in favour of DDR4. It would need a very bad memory access pattern, one that keeps just one 32-bit channel busy while the other one(s) sit idle.
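Which access pattern is "very bad" depends on how the memory controller maps addresses to subchannels, and that mapping is controller-specific and usually undocumented. As a toy model only, assume consecutive 64-byte cache lines simply alternate between two subchannels; then a stride of two cache lines sends every read to the same subchannel. The names and the mapping below are hypothetical; this simulates the idea rather than measuring real hardware.

```python
from collections import Counter

CACHE_LINE = 64
NUM_SUBCHANNELS = 2  # assumption: one DDR5 DIMM with two 32-bit subchannels

def subchannel_of(addr: int) -> int:
    """Hypothetical mapping: consecutive cache lines alternate between subchannels.

    Real controllers typically use more complex (often hashed) address mappings.
    """
    return (addr // CACHE_LINE) % NUM_SUBCHANNELS

def subchannel_usage(stride_bytes: int, accesses: int = 1024) -> Counter:
    """Count how many cache-line reads land on each subchannel for a strided scan."""
    return Counter(subchannel_of(i * stride_bytes) for i in range(accesses))

print("stride  64 B:", dict(subchannel_usage(64)))   # → {0: 512, 1: 512}
print("stride 128 B:", dict(subchannel_usage(128)))  # → {0: 1024}
```

A real microbenchmark would time such a strided scan over a buffer much larger than the last-level cache, in a compiled language, and the pathological stride would have to be found by experiment.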