Thursday, August 10th 2023
AMD "Strix Point" Company's First Hybrid Processor, 4P+8E ES Surfaces
Beating previous reports that AMD is increasing the CPU core count of its mobile monolithic processors from the present 8-core/16-thread to 12-core/24-thread, we are learning that the company's next-gen processor, codenamed "Strix Point," will in fact be its first hybrid processor. The chip is expected to feature two kinds of CPU cores, with "Zen 5" being the microarchitecture behind the performance cores, and "Zen 5c" behind the efficiency cores. An engineering sample featuring 4 P-cores and 8 E-cores surfaced on the web, thanks to Performancedatabases. A HWiNFO screenshot reveals the engineering sample's core configuration of 4x P-cores and 8x E-cores, with identical L1 cache sizes. Things get a little fuzzy with the detection of the L2 and L3 cache sizes.
We know from the current "Zen 4c" core design that it is essentially a compacted version of "Zen 4" designed for higher-density chiplets that have 16 cores, and that it has both the same ISA and IPC as "Zen 4." The differences are that "Zen 4c" cores have less shared L3 cache at their disposal, are generally configured with lower clock speeds, and have higher energy efficiency than "Zen 4." "Zen 4c" cores are also 35% smaller in die-area than "Zen 4." The company could develop "Zen 5c" CPU cores with similar design goals.
The "Strix Point" silicon could hence have two CCXs (CPU core complexes): one with the larger "Zen 5" P-cores and a certain amount of L3 cache, and another with the smaller "Zen 5c" cores and their own L3 cache. This would essentially be similar to "Renoir," which has two 4-core CCXs of "Zen 2" cores. The L1 cache sizes for both kinds of cores are identical, at 48 KB L1D and 32 KB L1I, and it's likely that both core types have 1 MB of dedicated L2 cache per core. The L3 cache sizes could vary between the two CCXs, with the P-core CCX having 16 MB (4 MB per core), and the E-core CCX 8 MB (1 MB per core).
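The per-core L3 figures follow from dividing each CCX's speculated L3 total by its core count. A minimal sketch of that arithmetic, assuming the 16 MB and 8 MB totals and the 4P+8E split discussed above (none of which AMD has confirmed; the function and variable names are illustrative only):

```python
# Speculative cache figures from the article; none are confirmed by AMD.
KB = 1024
MB = 1024 * KB


def l3_per_core(l3_bytes: int, cores: int) -> int:
    """Return the share of a CCX's L3 cache available per core, in bytes."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return l3_bytes // cores


# (label, core count, total L3 in that CCX) as speculated in the article
ccxs = [
    ("P-core CCX (Zen 5)", 4, 16 * MB),
    ("E-core CCX (Zen 5c)", 8, 8 * MB),
]

for label, cores, l3 in ccxs:
    share_kb = l3_per_core(l3, cores) // KB
    print(f"{label}: {share_kb} KB of L3 per core")
```

With these inputs the P-core CCX works out to 4 MB of L3 per core, and the E-core CCX to 1 MB per core.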
It will be interesting to see how AMD handles the hybrid architecture from a software standpoint. Intel uses Thread Director, a hardware-based solution that's designed to send the right kind of compute workload to the right kind of CPU core. AMD could either try to develop its own version of Thread Director, or use a less sophisticated OS-based solution, similar to what it does with its multi-CCD client processors.
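To make the "OS-based" option concrete, here is a rough user-space sketch of steering work to one core type with CPU affinity. It is not how AMD, Intel, or Windows actually schedule threads. The logical CPU numbering (0-7 for the four P-cores with SMT, 8-23 for the eight E-cores with SMT) is purely an assumption for illustration, and `pick_cpus` and `pin_current_process` are hypothetical helper names. `os.sched_getaffinity` and `os.sched_setaffinity` are only available on Linux.

```python
import os
from typing import Set

# Hypothetical layout for a 4P+8E, 24-thread part; real enumeration may differ.
P_CPUS: Set[int] = set(range(0, 8))    # assumed P-core logical CPUs
E_CPUS: Set[int] = set(range(8, 24))   # assumed E-core logical CPUs


def pick_cpus(latency_sensitive: bool, available: Set[int]) -> Set[int]:
    """Choose the preferred core type, falling back to any available CPU."""
    preferred = P_CPUS if latency_sensitive else E_CPUS
    chosen = preferred & available
    return chosen or set(available)


def pin_current_process(latency_sensitive: bool) -> Set[int]:
    """Pin this process to the chosen CPUs (Linux only); return the set used."""
    if not hasattr(os, "sched_getaffinity"):
        return set()  # affinity calls not available on this platform
    available = set(os.sched_getaffinity(0))
    chosen = pick_cpus(latency_sensitive, available)
    os.sched_setaffinity(0, chosen)
    return chosen
```

A background job could call `pin_current_process(False)` to stay on the assumed E-cores, while an interactive task would pass `True`. A static split like this cannot react to what each thread is actually doing, which is the gap a hardware-assisted approach such as Thread Director is meant to close.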
Sources:
Performancedatabases, IThome, VideoCardz
86 Comments on AMD "Strix Point" Company's First Hybrid Processor, 4P+8E ES Surfaces
It remains to be seen how this will change on 4 nm. 4-4.2 GHz is good enough for laptops anyway.
Going back to Intel for a sec, hardware scheduling could be best, but Thread Director just doesn't seem to work, especially on Linux, which is problematic when almost every part is P+E.
Intel's Thread Director was removed in 12th gen and moved to software.
This led to some large latency issues on 12th and 13th gen. It's not present in long-running tasks, but the initial choice of which cores to throw a task on is much, much slower than on previous Intel hardware.
His part two has benchmarks, especially real-world tasks like searching files in Windows, where things slow to a crawl as they're shoved onto the E-cores.
That whole Nvidia DPC latency issue?
Windows 11 and its optimisations for Intel's designs speed that up fairly well (HAGS, I'm guessing), but it's still inferior to previous Intel hardware by a large margin.
However, the tweaks they did to help the latency affect some tasks: dragging and dropping multiple music files takes a massive penalty in 11 vs 10 on these systems.
Likely because they're shoved to the slower E-cores, rather than the P cores.
Synthetic benchmarks are programmed into the software scheduler to help performance, but having something like a 3D rendering program open (not rendering anything, just open) means the scheduler prioritises the P-cores for that .exe, and tasks like moving files around get shoved to the E-cores, despite the P-cores having performance to spare and being free for the task.
This ends up in a weird situation where you lose the ability to multitask without large performance issues, and the biggest issues are the 'low priority' stuff that users actually deal with in real life, like copying files or importing video/mp3 files.
Didn't notice his part 3 video since the title changed, here's the link.
Intel hired Jim Keller, the one who designed AMD's current hardware, to fix this in their future designs.
TPU snuck into his video of Intel mocking AMD's CCX design as glue, who then used slower glue :p
TL;DR: AMD better hope their design isn't this bad.
TL;DR2: These CPUs are being designed for enterprise work (for both AMD and Intel), and we get the leftovers. It doesn't always suit end users at home, and they almost don't care since we aren't the target market; we're the secondary market. Look at GPUs with mining and AI, for example.
The Skylakes had some of the lowest latency around (that includes all the "Lake" variants off that design, 7700 to 10900Ks).
Lenovo Legion Slim 5 (14", 8) with AMD Zen 4 Ryzen 9 7940HS + RTX 4060. It's spectacular what Lenovo has pulled off in 14": www.techpowerup.com/312027/lenovo-legion-slim-5-begins-shipping-in-august
IDK about your friend, but I have personally used so many laptops, desktops and workstations, and I'm not touching Lenovo even if someone gives it to me for free.
If you want better advice, tell us what you are going to use it for. That way we can point you to a proper model for the task.
Skylake had weirdness with faulty timers, so its results may have been false. The bug affected the HEDT Skylake-X as well
The HPET bug: What it is and what it isn't - overclockers.at
Core-to-Core Latency - AMD Zen 4 Ryzen 9 7950X and Ryzen 5 7600X Review: Retaking The High-End (anandtech.com)
Combine that with NVIDIA drivers, power saving plans, scaling core frequencies, DDR5 latency and more feature-packed motherboards (AM5/1700), plus Windows 11 security features, and you can get some astronomical DPC latency. You won't notice it in normal use, but it's there.
It seems like only X3D chips, disabling E-cores, or running VMs can get scores like the older chips in terms of latency. A lot of it is software and motherboard BIOS, not so much chip latency, but I think it all adds up.