The main future suggestion I'd make: an open-bench, custom-loop test with a single 1R DIMM departs from common use.
You're saying to move away from a single-DIMM setup? This was done with a single DIMM to keep the variables limited.
More applicable, and more useful to most readers: data from testing in a typical ATX case (Lancool 207, 216, II, North, Torrent Compact, something like that) with denser DIMM configs such as 2x48 GB and 4x48 GB. Other variables I regularly see come up for DDR thermals are:
1. CPU cooler: AIO with a fanless block, AIO with a block fan, set-back dual tower (Fuma 3, Royal Knight 120) or single tower with the DIMMs in front of the fan, and dual tower with the DIMMs under the front fan.
2. Crossflow: top-intake fan cooling (if it's not a top-exhaust AIO config) and the potential for GPU pass-through heating.
3. Lighting: temperatures with RGB on versus off, versus the non-RGB version of the same DIMMs.
Good ideas. #3 is the easiest. Case airflow is a complicated one, though. I've gotten my memory to error out just from an Nvidia FE card before, because it exhausts directly onto the memory.
I like the CL and tREFI testing, but it's unclear from the current text what active cooler was used and what tRFC was set to. A related difficulty is that stress tools (including y-cruncher FFTv4 and long Prime95 runs) lack a benchmark component. So a common miss is that all this work we do for stability and thermals rarely gets tied back to the question of whether it's actually worth it functionally, as opposed to just chasing numbers. IMO y-cruncher timings or other memory-intensive benchmarks would be good data toward articulating a value proposition for CL24, 65k+ tREFI, and such.
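To make the "tie it back to performance" point concrete, here's a crude, hand-rolled sketch of a memory-bound timing loop. It's not a substitute for y-cruncher or a real benchmark; the buffer size and pass count are arbitrary picks of mine, just large enough to spill out of cache so timing changes mostly reflect DRAM behavior:

```python
import time

def copy_bandwidth_gb_s(size_mb=256, passes=4):
    """Time full read+write passes over a buffer far larger than any CPU cache."""
    buf = bytearray(size_mb * 1024 * 1024)
    t0 = time.perf_counter()
    for _ in range(passes):
        bytes(buf)  # forces a full read of buf and a full write of the copy
    elapsed = time.perf_counter() - t0
    return 2 * passes * len(buf) / 1e9 / elapsed  # GB moved per second

if __name__ == "__main__":
    print(f"~{copy_bandwidth_gb_s():.1f} GB/s effective copy bandwidth")
```

Run it before and after a timing change; if the number doesn't move, the tweak probably isn't buying anything in memory-bound work.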
Active cooling is just a fan; I'll update the article to mention that. I also don't see the point in using y-cruncher or Prime95 over a strictly memory-focused stress test. It's yet another factor introduced by putting the CPU into the mix, and it can be offset by just lowering the CPU frequency, negating whatever "stress" it would add.
The tests in the article were designed and set up to explore the characteristics of the memory itself, not the platform it's used with. That's partially why a lower frequency was primarily used: by not pushing the limits of the IMC, any errors that showed up were likely a memory-related problem. There are still lots of things that can be explored, like all the other secondary timings; those are known to change based on the CPU and motherboard.
I'm not set up to probe low CL, but FWIW it's been my experience that extending DDR5's default 3.9 μs tREFI has little effect on real-world compute throughput once tRFC is tightened. I'm looking mainly at runtime shifts in working apps that max out dual-channel DDR for eight hours solid, but y-cruncher picks up on this too.
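For anyone wondering why that matches intuition: the refresh penalty is roughly the fraction of time a rank spends refreshing, tRFC/tREFI. A back-of-the-envelope sketch (the tRFC values below are placeholders I picked to illustrate, not measurements from this thread):

```python
def refresh_overhead(trfc_ns, trefi_ns):
    """Approximate fraction of time a rank is unavailable due to refresh."""
    return trfc_ns / trefi_ns

# 295 ns is a placeholder JEDEC-class tRFC; 160 ns stands in for a tightened OC value.
# 3900 ns is DDR5's default tREFI, 45000 ns a heavily extended one.
print(f"default tREFI, loose tRFC: {refresh_overhead(295, 3_900):.1%}")
print(f"default tREFI, tight tRFC: {refresh_overhead(160, 3_900):.1%}")
print(f"extended tREFI, tight tRFC: {refresh_overhead(160, 45_000):.1%}")
```

With tRFC already tightened, the remaining overhead at stock tREFI is only a few percent, so stretching tREFI can at best claw back that few percent, which lines up with the small runtime shifts described above.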
This is interesting. I've pushed M-die tRFC to values low enough that I backed off after black screens and OCCT errors, but not to a clear breaking point, and not in single-variable testing where instability could be unambiguously attributed to tRFC. I need the rig up for somewhere in the range of 42-56 hours of compute this weekend, but I'll try leaning on tRFC more if some slack time opens up.
I was at tRFC2 376 / tRFCSB 270 for DDR5-5600. Could not trigger an error even at 1.6 V. It didn't seem to matter whether it was 1.25 V or higher; that was the lowest it would go and still boot. Changing it in Windows below this would instantly BSOD or freeze outright.
Still haven't fully explored other factors. But knowing the lowest stable tRFC is tied to frequency, it can still be played with. Higher CAS needs less voltage, and to an extent frequency and CAS are linked together.
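Since tRFC is entered in clock cycles but the silicon cares about absolute time, the frequency link can be sketched as a simple conversion (the 5600/6000 figures are just worked examples using the tRFC2 value quoted above):

```python
import math

def cycles_to_ns(cycles, mt_s):
    """DDR transfers twice per clock, so one cycle lasts 2000 / (MT/s) ns."""
    return cycles * 2000 / mt_s

def ns_to_cycles(ns, mt_s):
    """Round up: the timing must cover at least the required absolute time."""
    return math.ceil(ns * mt_s / 2000)

print(cycles_to_ns(376, 5600))    # tRFC2 376 at DDR5-5600 -> ~134.3 ns
print(ns_to_cycles(134.3, 6000))  # the same absolute time costs more cycles at 6000
```

So if 376 cycles at 5600 really is the floor for these DIMMs, the cycle count to test at another frequency follows directly from holding the nanosecond value constant.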
So, inconclusive. All I found is that 376/270 is the lowest it could be stable at for 5600, regardless of the voltage and the CAS linked to that voltage. That's for two different DIMMs using this specific SK Hynix A-die. A larger sample is needed to narrow down whether this is abnormal.
UPDATE:
I had some nice in-person feedback from a data analyst. He pointed out that my graphs are labeled incorrectly, because the titles aren't based on the X and Y axes. This doesn't affect the data, and the graphs can still be read as is.
Secondly, it was assumed that when a graph flatlines at the end, it's understood to be showing equilibrium, i.e. the temperature will not rise further because thermal dissipation through the heat spreader keeps pace with the thermal output of the memory.
Starting temperature is not the reason one frequency or voltage ends up above or below another. To prove this, I will need to make another chart where the temperature starts out at 60+ °C (using a hair dryer) and plot the decline down to the same equilibrium shown previously.
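The planned hair-dryer run should follow a simple exponential approach to the same plateau. A sketch using Newton's-law cooling (the equilibrium temperature and time constant here are hypothetical values I chose just to show both starting points converging):

```python
import math

def temp(t_s, t_start, t_eq, tau_s):
    """T(t) = T_eq + (T_start - T_eq) * exp(-t/tau): exponential approach to equilibrium."""
    return t_eq + (t_start - t_eq) * math.exp(-t_s / tau_s)

T_EQ, TAU = 52.0, 180.0  # hypothetical equilibrium (deg C) and time constant (s)
for t in (0, 300, 600, 1200):
    warm_up = temp(t, 25.0, T_EQ, TAU)    # normal run starting from ambient
    cool_down = temp(t, 65.0, T_EQ, TAU)  # pre-heated with the hair dryer
    print(f"t={t:4d}s  from 25C: {warm_up:5.1f}  from 65C: {cool_down:5.1f}")
```

If the measured hair-dryer run declines to the same plateau as the heating run rather than settling somewhere else, that's exactly the evidence that starting temperature doesn't set the equilibrium.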
Both will be done after I return from vacation.