Online discussions about system performance and memory timing adjustments often spark debates over the best software for the job. Every memory stress test aims to assess system stability, but each program uses a different method to get there, so the truly 'best' program remains a matter of debate. One question that can be settled, however, is which program generates the highest thermal load. Of the four tested, AIDA64 is the outlier, while the other three perform similarly. TestMem5 also shows measurable temperature changes in the graph depending on the specific test being run.
Active Cooling Required
In the pursuit of optimal performance, one of the most common adjustments is to the CAS (Column Address Strobe) latency, often referred to as the CL value. A lower CL directly reduces overall latency, so lower is better, although it is not the only factor in memory latency. However, finding the lowest stable value at a given voltage requires either knowing which IC manufacturer and revision the memory uses, or trial and error. In either case, achieving the highest performance typically demands more voltage, which in turn raises DRAM temperatures. Eventually, this necessitates active cooling; otherwise errors appear once the temperature exceeds the threshold for stability. For example, in this test, a CAS value of 24 required 1.50 V to pass the memory stress test, but errors occurred once the temperature exceeded 43°C.
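To compare CL values across different speeds, the CAS portion of latency can be converted from clock cycles to nanoseconds. The sketch below assumes the usual DDR relationship, where the memory clock is half the data rate in MT/s, so one clock lasts 2000 / data rate nanoseconds. The function name and the configurations are hypothetical examples, not the settings used in this test, and the result covers only the CAS component of total latency.

```python
def cas_latency_ns(cl: int, data_rate_mts: int) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR makes two transfers per clock, so the memory clock in MHz is
    data_rate_mts / 2 and one clock period is 2000 / data_rate_mts ns.
    """
    if cl <= 0 or data_rate_mts <= 0:
        raise ValueError("cl and data_rate_mts must be positive")
    return cl * 2000 / data_rate_mts


# Hypothetical configurations for illustration (not this article's test settings).
configs = [(6000, 30), (6000, 24), (8000, 36)]
for rate, cl in configs:
    print(f"DDR5-{rate} CL{cl}: {cas_latency_ns(cl, rate):.2f} ns")
```

A lower CL at the same data rate always shortens this figure. A higher data rate can offset a higher CL, which is one reason CL alone does not determine memory latency.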
tREFi Thermals
To understand what the next graph represents, and with it these tREFi (Refresh Interval) results, we first need to understand what tRFC (Row Refresh Cycle Time) is and how the two timings are connected. Put simply, the tRFC value sets how many clock cycles a refresh cycle lasts, and the DRAM cannot be accessed while its memory cells are being refreshed. The higher the value, the longer the memory is unavailable to the system. This interruption is unavoidable because of the physical properties of the memory cells. DRAM is volatile storage: it loses all data when power is removed, and even with power applied, data bits become corrupted if each cell is not periodically refreshed. The refresh cycle is what keeps that data intact. A lower tRFC value reduces the time spent refreshing, but if it is set too low, the memory cells do not have enough time to fully refresh, resulting in data loss.
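Because tRFC is entered in clock cycles while the cells need a certain amount of real time to refresh, the same requirement translates to different cycle counts at different speeds. The sketch below assumes a hypothetical requirement of 160 ns per refresh (an illustrative number, not a measured or JEDEC value) and the usual DDR clock period of 2000 / data rate nanoseconds. The function names are also hypothetical.

```python
import math


def trfc_to_ns(trfc_cycles: int, data_rate_mts: int) -> float:
    """Time in ns that the DRAM is blocked by one refresh.

    DDR makes two transfers per clock, so one clock lasts
    2000 / data_rate_mts nanoseconds.
    """
    return trfc_cycles * 2000 / data_rate_mts


def min_trfc_cycles(required_ns: float, data_rate_mts: int) -> int:
    """Smallest whole number of clock cycles that covers required_ns.

    Rounding up (never down) keeps the refresh at least as long as the
    cells need; a shorter value is the data-loss case described above.
    """
    return math.ceil(required_ns * data_rate_mts / 2000)


# Hypothetical requirement: assume the ICs need 160 ns per refresh.
REQUIRED_NS = 160
for rate in (5600, 8000):
    cycles = min_trfc_cycles(REQUIRED_NS, rate)
    print(f"DDR5-{rate}: tRFC >= {cycles} cycles ({trfc_to_ns(cycles, rate):.1f} ns)")
```

Under this model, a tRFC cycle count that is sufficient at one data rate may be too short at a higher one, since each cycle lasts less time.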
tREFi is the counterpart to tRFC: its value sets how many clock cycles must pass before the next refresh occurs. Since memory cells are sensitive to temperature and lose their charge faster as they heat up, a longer wait between refreshes increases the likelihood of data corruption. In the test above, tREFi values of 65,535, 131,070, and 262,143 were used without active cooling to observe the temperatures at which errors typically occur.
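The trade-off between the two timings can be expressed as the fraction of time the DRAM spends refreshing. The sketch below uses a simplified model (an assumption): one refresh lasting tRFC cycles is issued every tREFi cycles, so the memory is unavailable for tRFC out of every tREFi cycles. It uses the DDR5-5600 data rate and the three tREFi values from the test, but the tRFC of 500 cycles and the function names are hypothetical, chosen only for illustration.

```python
def cycles_to_us(cycles: int, data_rate_mts: int) -> float:
    """Clock cycles to microseconds; one DDR clock is 2000 / data_rate_mts ns."""
    return cycles * 2000 / data_rate_mts / 1000


def refresh_overhead(trfc_cycles: int, trefi_cycles: int) -> float:
    """Fraction of time the DRAM is busy refreshing.

    Simplified model (assumption): one refresh of length tRFC is issued
    every tREFi cycles, so the memory is unavailable for tRFC out of
    every tREFi cycles.
    """
    if not 0 < trfc_cycles < trefi_cycles:
        raise ValueError("expected 0 < tRFC < tREFi")
    return trfc_cycles / trefi_cycles


DATA_RATE = 5600  # DDR5-5600, as in the tests above
TRFC = 500        # hypothetical tRFC in cycles, for illustration only
for trefi in (65_535, 131_070, 262_143):  # tREFi values used in the test
    print(
        f"tREFi {trefi:>7}: refresh every {cycles_to_us(trefi, DATA_RATE):6.2f} us, "
        f"overhead {refresh_overhead(TRFC, trefi):.3%}"
    )
```

Each doubling of tREFi roughly halves the refresh overhead, but it also doubles how long each cell must hold its charge, which is where the temperature sensitivity described above comes in.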
With active cooling, none of the DDR5-5600 tests failed, as temperatures stayed below the error threshold. Because higher frequencies run at higher operating temperatures, passive cooling would not be sufficient for DDR5-8000, even at a tREFi of 65K. Testing also showed that tREFi stability is partially dependent on voltage, while tRFC stability is not. tRFC tests were also conducted, but the results were inconclusive: a given tRFC value either allowed the system to boot or it did not, with no in-between. Under ambient cooling, temperature had no impact on the results and, unlike with tREFi, never caused the stress test to fail. With a limited sample size, it is unclear whether this behavior is exclusive to SK Hynix Rev A-Die, the result of a flawed testing methodology, or an expected outcome.