Maximum Safe Voltage
The maximum safe voltage for a given processor has been an eternal mystery for users since none of the manufacturers publish this information for public viewing. Sometimes, they even market a 95 W TDP for a processor running at 5 GHz or forget to set a package power limit. Documents that are not under NDA usually give only a loosely defined limit, which in most cases corresponds to the point where catastrophic failures become more common. Data on voltages that are safe to use 24/7 without causing any harm to the processor have never been published.
This limit is quite difficult to determine since it varies between different CPU samples (silicon dispersion, SIDD), the cores in each CCX, and workloads (peak current for a certain number of cores, temperature, and so on).
To get the most accurate answer to the question about the limit, I decided to measure it myself on processors based on the Zen+ and Zen 2 architectures. The results show that the limit for full reliability on the AMD Ryzen Threadripper 2990WX and other 12 nm processors is 1.33 V at maximum current and 1.425 V at minimum current. For processors using the Zen 2 architecture, the results were 1.325 V with all cores loaded and 1.419 V with just a single core active.
If you want to go higher, the FIT allows for 1.380 V / 1.487 V, but this may reduce processor life or cause degradation.
Just to clarify, all the voltages mentioned in this article refer to the actual effective voltage, not to the voltage requested by the CPU (VID). During operation, the processor sends a VID request to the voltage regulation circuitry: "give me this voltage". Settings like load-line calibration (LLC) change how the regulator responds to that request under load, which results in an effective voltage that's different from the VID.
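The gap between VID and effective voltage can be illustrated with a simple steady-state load-line model, V = VID - I × R_LL. This is only a sketch and not how the figures in this article were measured: the function name `effective_voltage`, the load-line resistances, and the current values are illustrative assumptions, and real regulators add transient behaviour that this ignores.

```python
def effective_voltage(vid_volts, current_amps, loadline_milliohms):
    """Steady-state voltage at the CPU under a simple load-line model.

    V_effective = VID - I * R_LL, with R_LL given in milliohms.
    """
    return vid_volts - current_amps * (loadline_milliohms / 1000.0)


# Illustrative values only (assumptions, not taken from any datasheet).
vid = 1.350
for r_ll in (1.5, 0.5, 0.0):      # a stronger LLC setting behaves like a smaller R_LL
    for amps in (20, 150):        # light load vs. heavy all-core load
        v = effective_voltage(vid, amps, r_ll)
        print(f"R_LL={r_ll} mOhm, I={amps} A -> {v:.3f} V")
```

With a load-line of zero the effective voltage equals the VID at any current; with a non-zero load-line the same VID produces a lower effective voltage as the current rises, which is why the two numbers should not be confused.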
The most accurate (software) method for measuring that voltage on the AM4 platform is monitoring "CPU SVI2 TFN," which is available in HWInfo. This value is the most accurate reading among those available to end users, but is of course subject to the tolerances of the monitoring circuitry. Do not blindly trust the monitored current and power readings either; each motherboard needs separate calibration.
Energy Efficiency
Energy efficiency has always been an important characteristic for any silicon product. My testing methodology is quite simple: for each frequency, measure the minimum stable voltage and the power consumption using the LinX 0.7.0 test software. I used steps of 50 MHz between 3.50 and 4.20 GHz. The DRAM frequency was fixed at 3200 MHz (Threadripper 2990WX) and 3733 MHz (Threadripper 3960X).
Just like on "Colfax" (second-generation Threadripper), the margin for overclocking the high-performance "Castle Peak" (third-generation Threadripper) is extremely limited. Critical points for Colfax are present at 3650 and 3900 MHz; for Castle Peak, the critical point sits at 4050 MHz.
By critical points I mean tipping points beyond which the voltage has to be increased significantly to pass stress tests. The most energy-efficient frequency range for Zen+ is 3.6 to 3.8 GHz. For Zen 2 it is 3.70 to 4.05 GHz. With reasonable voltage, up to 4.05 GHz can be achieved for Zen+ and up to 4.15 GHz for Zen 2.
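One way to spot such tipping points in a frequency/voltage table is to look for a step where the extra voltage required suddenly jumps. The sketch below is an assumption about how that could be automated, not the procedure used for this article: `find_critical_points`, the `jump_factor` threshold, and the example curve are all hypothetical, and the voltages are invented purely to show the shape.

```python
def find_critical_points(min_voltage_mv, jump_factor=2.0):
    """Return frequencies after which the next step needs a much larger voltage increase.

    min_voltage_mv maps frequency in MHz to minimum stable voltage in mV.
    A frequency is flagged when the voltage increase to the next step is at
    least jump_factor times the increase of the step before it.
    """
    freqs = sorted(min_voltage_mv)
    # Voltage increase needed to move from each frequency to the next one up.
    increases = [
        (low, min_voltage_mv[high] - min_voltage_mv[low])
        for low, high in zip(freqs, freqs[1:])
    ]
    critical = []
    for (_, previous), (freq, current) in zip(increases, increases[1:]):
        if previous > 0 and current >= jump_factor * previous:
            critical.append(freq)
    return critical


# Hypothetical curve, NOT measured data: values chosen only to show the shape.
example_curve_mv = {3900: 1100, 3950: 1112, 4000: 1125,
                    4050: 1137, 4100: 1200, 4150: 1262}
print(find_critical_points(example_curve_mv))  # prints [4050]
```

In this made-up curve each 50 MHz step costs about 12 mV up to 4050 MHz, then about 63 mV for the next step, so 4050 MHz is reported as the tipping point.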
For Threadripper 2990WX, the result in LinX is 437 GFLOP/s, and for Threadripper 3960X, 1160 GFLOP/s. Surprisingly, when running at 3.5 GHz, the results in LinX 0.7.0 did not change for either processor. This may indicate the presence of a factory AVX offset, which the user cannot influence.
Looking at these results, we can say the new Threadrippers are a highly energy-efficient solution which significantly surpassed the previous generation.
I also tested the relationship between the processor's frequency and its operating temperature. There is a significant frequency difference due to temperature: you lose roughly 5 MHz for every 1°C increase, in a mostly linear fashion.
This is another reason why users who purchased Zen 2 HEDT processors should spend some time thinking about the quality of the cooling system.
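The temperature dependence described above can be written as a simple linear estimate. This is a rough sketch built only on the approximate 5 MHz per 1°C figure; the function name `estimated_clock_mhz` and the reference clock and temperatures in the example are hypothetical, and the real behaviour is only "mostly" linear.

```python
MHZ_LOST_PER_DEGREE_C = 5.0  # approximate slope quoted in the article


def estimated_clock_mhz(reference_mhz, reference_temp_c, temp_c):
    """Estimate the clock at temp_c from a known (clock, temperature) reference point."""
    return reference_mhz - MHZ_LOST_PER_DEGREE_C * (temp_c - reference_temp_c)


# Hypothetical reference point: 4200 MHz observed at 60 degrees C.
for temp in (50, 60, 70, 80):
    print(f"{temp} C -> {estimated_clock_mhz(4200, 60, temp):.0f} MHz")
```

Under these assumptions, a cooler that keeps the chip 20°C cooler would be worth about 100 MHz.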
Intelligent Overclocking (iOC)
If you're planning on overclocking by setting a fixed frequency through the multiplier, I recommend you forget about it. Overclocking with that method will in most cases lead to a decrease in performance in single-threaded mode and significantly lower energy efficiency of the processor since each CCX and each core have different SIDD characteristics.
The solution? Overclocking via CCX, which I call "Intelligent Overclocking" (iOC). A new capability that became available with the Zen 2 architecture allows you to set the frequency individually for each CCX. Let's start by building an individual CCX quality map by testing each CCX at a specific frequency while selecting the minimum stable voltage for the successful completion of LinX or Prime95. Ryzen Master in Creator mode or the BIOS can be used for this.
In order to facilitate the search for the best CCX, we will use the ACPI CPPC2 table report in Windows. "Maximum Performance Percentage" displays the hardware core quality labels TSMC set at the factory. As a starting point, I chose a voltage of 1.35 V and 4400 MHz. After spending several hours on testing, the following results were determined:
The best CCX is number 6, and the worst is number 7. I also noticed that a CCX sometimes has a mix of low- and high-SIDD cores. A good example is CCX number 7, which has a core capable of running 4400 MHz at 1.225 V, but its other cores required as much as 1.3125 V. These results also indicate that AMD does not bin the dies for HEDT, at least in my case. Three CCXs were just as good as those of a Ryzen 5 3600, and one was good enough for a Ryzen 7 3800X.
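Building the quality map boils down to recording a minimum stable voltage per core and letting the weakest core define its CCX. The sketch below shows that bookkeeping under stated assumptions: the helper names `ccx_required_voltage` and `rank_ccx` are hypothetical, and only the CCX 7 voltages (1.225 V and 1.3125 V) come from the results above; the other entries are placeholders, not measurements.

```python
def ccx_required_voltage(core_voltages):
    """A CCX is only stable if all of its cores are, so the weakest core sets the voltage."""
    return max(core_voltages)


def rank_ccx(quality_map):
    """Return CCX numbers ordered from best (lowest required voltage) to worst."""
    return sorted(quality_map, key=lambda ccx: ccx_required_voltage(quality_map[ccx]))


# Minimum stable voltage per core at 4400 MHz, in volts.
# CCX 7 uses the figures quoted in the article; CCX 1 and CCX 6 are
# PLACEHOLDER values invented for illustration only.
quality_map = {
    1: [1.2875, 1.3000, 1.2875],
    6: [1.2500, 1.2625, 1.2750],
    7: [1.2250, 1.3125, 1.3125],
}

print(rank_ccx(quality_map))  # best to worst
spread_ccx7 = max(quality_map[7]) - min(quality_map[7])
print(f"CCX 7 core-to-core spread: {spread_ccx7 * 1000:.1f} mV")
```

Note how CCX 7 contains the single best core in this table yet still ranks last, because its other cores need 1.3125 V.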
As you can see, the core tags in Ryzen Master don't reflect actual core capability—neither do the ACPI core-ranking values. One good thing is that in single-threaded mode, the best cores were indeed loaded correctly, which is great news for those who like to pop in older games between work breaks.
I also wondered what would happen if we only used the five best cores. The result did not surprise me. The best threads (according to ACPI) were used, while one of the threads ended up on the "worst" core. Threads did not migrate between cores.
Another controversial issue is that some cores do not enter sleep (CC6 mode) during inactivity. Idle cores that are not shut off eat into the precious PPT/TDC/EDC limits, which lowers clocks by around 25–75 MHz overall.
Threadripper launched three months after the Zen 2 Ryzen was announced. This additional time had no effect on the quality of the silicon. The delta between the best and worst cores still reaches 10.8%, which is a reasonably large value for the first iteration of the 7 nanometer production process.
After compiling the core-quality map, I started overclocking each CCX individually. I selected 1.35 V as the safe voltage, which resulted in the following frequencies: 4350, 4350, 4350, 4350, 4475, 4475, 4325, and 4350 MHz. The total power consumption of the processor was 337 W. This increased performance in Cinebench R20 from 13987 to 15035 points (or +7.5%).
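The figures in this paragraph can be checked with a few lines of arithmetic. The snippet uses only the numbers quoted above; the variable and function names are my own, and the points-per-watt value is a derived convenience metric rather than something measured separately.

```python
from statistics import mean

ccx_clocks_mhz = [4350, 4350, 4350, 4350, 4475, 4475, 4325, 4350]
score_stock, score_oc = 13987, 15035   # Cinebench R20 points
power_oc_w = 337                       # total processor power after overclocking


def percent_gain(before, after):
    """Relative improvement of 'after' over 'before', in percent."""
    return (after - before) / before * 100.0


mean_clock = mean(ccx_clocks_mhz)
gain = percent_gain(score_stock, score_oc)
points_per_watt = score_oc / power_oc_w

print(f"mean CCX clock: {mean_clock} MHz")          # 4378.125 MHz
print(f"Cinebench R20 gain: {gain:.1f}%")           # 7.5%
print(f"points per watt at 337 W: {points_per_watt:.1f}")
```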
For this sample, this is a sensible maximum with a reasonable performance-to-power ratio, but you can go further. With high-quality custom cooling, you can squeeze another 75 to 100 MHz out of each core, while the power will reach a monstrous 550–600 W. Whether a few percent extra performance is worth nearly doubling the power consumption is up to you.
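To put that trade-off in numbers, the extra clock can be compared with the extra power. This is a back-of-the-envelope sketch: it uses the 337 W and 550–600 W figures and the mean of the eight per-CCX clocks from this article, and it assumes (my assumption, not a measured result) that performance scales at best linearly with clock.

```python
BASELINE_POWER_W = 337
MEAN_CCX_CLOCK_MHZ = 4378.125   # mean of the eight per-CCX clocks listed earlier


def ratio(value, baseline):
    """How many times larger 'value' is than 'baseline'."""
    return value / baseline


power_low = ratio(550, BASELINE_POWER_W)
power_high = ratio(600, BASELINE_POWER_W)
# Upper bound on the performance gain if it scaled perfectly with clock (assumption).
clock_gain_low = 75 / MEAN_CCX_CLOCK_MHZ * 100.0
clock_gain_high = 100 / MEAN_CCX_CLOCK_MHZ * 100.0

print(f"power: {power_low:.2f}x to {power_high:.2f}x the 337 W figure")
print(f"clock: +{clock_gain_low:.1f}% to +{clock_gain_high:.1f}%")
```

In other words, roughly 1.6 to 1.8 times the power buys about 2% more clock.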