Thursday, November 21st 2019
AMD Admits "Stars" in Ryzen Master Don't Correspond to CPPC2 Preferred Cores
AMD in a blog post earlier today explained that there is no 1:1 correlation between the "best core" grading system displayed in Ryzen Master and the "preferred cores" addressed by the Windows 10 scheduler using CPPC2 (Collaborative Power and Performance Control 2). Deployed through BIOS and AMD chipset drivers, CPPC2 acts as middleware between the OS and the processor, communicating the system's performance demands at 1 ms intervals (Microsoft's default interval for reporting performance states to processors is 15 ms). Ryzen Master, on the other hand, can reveal the "best" cores in a Ryzen processor by ranking them across the package, on a CCD (die), and within a CCX. The best core in a CCX is typically marked with a "star" symbol in the software's UI. The fastest core on the package gets a gold star. Dots denote the second-fastest cores in a CCX.
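The marker scheme described above can be illustrated with a short sketch. It is only a toy model: the per-core quality scores, the tuple layout, and the `assign_markers` helper are assumptions made for illustration, and nothing here reflects how Ryzen Master or the SMU actually rank cores.

```python
# Toy model of the Ryzen Master marker scheme described in the article.
# Scores and layout are hypothetical; real rankings come from firmware.

# (core_id, ccx_id, quality_score) for an imaginary 8-core, 2-CCX part
EXAMPLE_CORES = [
    (0, 0, 90), (1, 0, 88), (2, 0, 80), (3, 0, 79),
    (4, 1, 95), (5, 1, 70), (6, 1, 69), (7, 1, 68),
]

def assign_markers(cores):
    """Return {core_id: marker} using the star / dot / gold star rules."""
    markers = {core_id: "" for core_id, _, _ in cores}

    # Group cores by CCX.
    by_ccx = {}
    for core in cores:
        by_ccx.setdefault(core[1], []).append(core)

    # Best core in each CCX gets a star, second best gets a dot.
    for members in by_ccx.values():
        ranked = sorted(members, key=lambda c: c[2], reverse=True)
        markers[ranked[0][0]] = "star"
        if len(ranked) > 1:
            markers[ranked[1][0]] = "dot"

    # The single best core on the whole package gets the gold star.
    best = max(cores, key=lambda c: c[2])
    markers[best[0]] = "gold star"
    return markers

if __name__ == "__main__":
    for core_id, marker in assign_markers(EXAMPLE_CORES).items():
        print(core_id, marker or "-")
```

In this made-up example the gold star lands on core 4 in CCX 1, while CCX 0 holds the two next-best cores, which is the kind of layout that makes the two rankings diverge.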
Over the past couple of months we've posted several investigative reports by our Ryzen memory overclocking guru Yuri "1usmus" Bubly, and a recurring theme in our articles has been the discrepancy between the highest performing cores as tested by us and those highlighted in Ryzen Master. Our definition of "highest performing cores" has been those that are able to reach and sustain the highest boost states, and have the best electrical properties. AMD elaborates that CPPC2 works independently of the SMU API that Ryzen Master uses, and that the best cores mapped by Ryzen Master need not correspond with the preferred cores that CPPC2 reports to the OS scheduler so that it can send more work to those cores and benefit from their higher boosting headroom.
The "best cores" as defined by the SMU and reported by Ryzen Master are hence decided on the basis of electrical properties, and hard-coded at the time of die binning in the factory. The "preferred cores" as defined by CPPC2 are those cores to which AMD wants the OS scheduler to send the most traffic, not just on the basis of their superior physical or electrical properties, but also because they are optimal for the Windows scheduler's core rotation policy. The Windows scheduler is programmed not to keep a long-running application thread allocated to a particular core indefinitely, but to periodically rotate it between a pair of cores. The rationale behind this is thermal management (spreading the heat across two cores that are spatially apart).
On monolithic multi-core chips such as the i9-9900 or i9-9980XE, in which all cores not only sit on the same die, but are also part of the same group (no CCX here), core rotation works as intended, as all cores share the L3 cache, and a relieving core can pick up work from where its rotation pair partner has left off, by pulling data from the L3 cache.
AMD's "Zen" multi-core topology complicates this, as not all cores share the same L3 cache; and in 12-core and 16-core Ryzen parts, or Threadrippers, not all cores sit on the same die. This is where CPPC2 fits in, giving Windows the awareness of the topology it needs, so it can rotate threads among cores without hurting performance by forcing workloads onto a core that uses a separate instance of cache, which forces data reloads from RAM. So how do CPPC2-reported "favored cores" fit into the scheme of things? CPPC2 deliberately misreports "favored cores" to the Windows scheduler, so that core rotation pairs are built within localized groups of cores rather than from cores on different CCXs or CCDs.
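To make the difference concrete, here is a minimal sketch contrasting a ranking by core quality alone with a rotation pair restricted to a single CCX. The scores, the tuple layout, and the selection criterion (rank pairs by their weaker core) are all assumptions chosen for illustration; AMD has not published the logic CPPC2 uses.

```python
from itertools import combinations

# Hypothetical per-core data: (core_id, ccx_id, quality_score).
CORES = [
    (0, 0, 90), (1, 0, 88), (2, 0, 80), (3, 0, 79),
    (4, 1, 95), (5, 1, 70), (6, 1, 69), (7, 1, 68),
]

def best_cores_global(cores, n=2):
    """Top-n cores by quality alone, ignoring cache topology."""
    ranked = sorted(cores, key=lambda c: c[2], reverse=True)
    return [c[0] for c in ranked[:n]]

def preferred_pair_same_ccx(cores):
    """Pick a rotation pair whose two cores share a CCX (and so an L3).

    Assumed criterion: prefer the pair whose weaker core is strongest,
    breaking ties by combined score.
    """
    same_ccx_pairs = [
        (a, b) for a, b in combinations(cores, 2) if a[1] == b[1]
    ]
    best = max(
        same_ccx_pairs,
        key=lambda p: (min(p[0][2], p[1][2]), p[0][2] + p[1][2]),
    )
    return sorted([best[0][0], best[1][0]])

if __name__ == "__main__":
    print("best by quality:", best_cores_global(CORES))
    print("same-CCX rotation pair:", preferred_pair_same_ccx(CORES))
```

With these made-up numbers, the two best cores by quality sit on different CCXs, so rotating a thread between them would cross an L3 boundary, whereas the same-CCX pair keeps the thread's data in one cache even though it leaves out the single best core.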
"Ryzen Master, using firmware readings, selects the single best voltage/frequency curve in the entire processor from the perspective of overclocking. When you see the gold star, it means that is the one core with the best overclocking potential. As we explained during the launch of 2nd Gen Ryzen, we thought that this could be useful for people trying for frequency records on Ryzen," reads the AMD blog on the discrepancy between Ryzen Master "best cores" and CPPC2 Preferred Cores. "Overall, it's clear that the OS-Hardware relationship is getting more complex every day. In 2018, we imagined that the starred cores would be useful for extreme overclockers. In 2019, we see that this is simply being conflated with a much more sophisticated set of OS decisions, and there's not enough room for nuance and context to make that clear. That's why we're going to bring Ryzen Master inline with what the OS is doing so everything is visibly in agreement, and the system continues along as-designed with peak performance," it adds. "Best cores" and "preferred cores" are hence both "right." The former refers to a physically high-quality core, while the latter is more "circumstantial," chosen because it yields better performance under the scheduler's rotation policy.
Sources:
Reddit, Anandtech
67 Comments on AMD Admits "Stars" in Ryzen Master Don't Correspond to CPPC2 Preferred Cores
Preferred core is what AMD picks/exposes through CPPC.
It's really AMD disagreeing with themselves.
These are undisputed facts which anyone can check:
- AMD lists in Ryzen Master the best cores per CPU and per CCX based on their properties at the factory
- Not all OS schedulers require 2 cores for a single-threaded load
- AMD doesn't require 2 cores for single threaded load (nor does Intel or any other CPU manufacturer)
- Microsoft requires 2 cores for single thread load
And somehow, it's AMD's "fault" when they report those best cores per CPU and per CCX based on their properties at the factory instead of reporting the best 2 cores within the same CCX as "best cores" because Windows wants 2 cores for 1 thread?
However, that doesn't change the fact that what Ryzen Master currently shows is correct for what it says it's showing: they are the best cores, just like it says. In Windows that's just not relevant for anything but finding the highest possible frequencies, aka HC overclockers looking for record frequencies, which is done on just one core. AMD never claimed they're showing anything but the potentially highest clocking cores based on the CPU's physical properties.
If you used an OS whose scheduler wants just that one core for one thread, what Ryzen Master currently shows would be optimal (and the next 3 threads would be assigned to the same CCX as the best one even when they're not the 2nd, 3rd and 4th best cores on the CPU, because spilling to another CCX hurts performance)
Just wondering, because in my CPU the best core is located on CCX1, but Windows chooses to fully load CCX0 first and only then loads CCX1. So maybe it does so because loading CCX1 first instead of CCX0 would cause some performance penalty?..
edit: fixed typos (not going to promise there isn't still some in there)
1 thread - the fastest core on CCX0
2 threads - the fastest and the second fastest core on CCX0
3 threads - all the cores on CCX0
4 threads - all the cores on CCX0 and the golden star core which is on CCX1
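The loading order this commenter describes can be modeled in a few lines. This is only a sketch of the observed behavior on one system: the core names, scores, 3+3 CCX layout, and the `fill_order` helper are hypothetical, and it does not claim to reproduce the Windows scheduler's actual policy.

```python
# Hypothetical 6-core part with two 3-core CCXs, matching the comment:
# the gold star core lives on CCX1, but CCX0 is filled first.
CCX_CORES = {
    0: [("c0", 90), ("c1", 88), ("c2", 80)],
    1: [("c3", 95), ("c4", 70), ("c5", 69)],  # c3 is the gold star core
}

def fill_order(ccx_cores, first_ccx):
    """Order cores CCX by CCX, starting with first_ccx, best core first."""
    ccx_ids = [first_ccx] + [c for c in sorted(ccx_cores) if c != first_ccx]
    order = []
    for ccx in ccx_ids:
        ranked = sorted(ccx_cores[ccx], key=lambda c: c[1], reverse=True)
        order.extend(name for name, _score in ranked)
    return order

def cores_for_threads(order, n_threads):
    """Cores used when n_threads heavy threads are running."""
    return order[:n_threads]

if __name__ == "__main__":
    order = fill_order(CCX_CORES, first_ccx=0)
    for n in range(1, 5):
        print(n, "thread(s):", cores_for_threads(order, n))
```

Under these assumptions the fourth thread is the first to spill onto CCX1, and it lands on the gold star core, which matches the sequence listed above.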