
Die-shot Suggests "Phoenix 2" is AMD's First Hybrid Processor

Well, this will still require the OS scheduler to be aware of which core is potentially fast and which one has a lower maximum speed limit. Imagine that a heavily multithreaded load suddenly morphs into a single thread that had been slacking off. You want that thread on one of the potentially high-clocked cores, but it might be sitting on a low-clocked core right now. So you have to decide whether to move it, which is a very difficult decision for a scheduler to make, since it doesn't know how much longer this particular situation will last.

In other words, some losses from sub-optimal scheduling are unavoidable if you have faster and slower cores.
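To make that concrete, here's a minimal sketch of the migration dilemma. It is not any real OS scheduler, and all the numbers are invented; the only real-world hook is that schedulers do get per-core rankings (e.g. ACPI CPPC "highest performance" hints) telling them which cores can clock highest.

```python
# Minimal sketch (not any real OS scheduler): should a lone hot thread be
# migrated from a low-clock core to a high-clock one? All numbers invented.

MIGRATION_COST_US = 50  # assumed cost of a cross-core move (cache refill, etc.)

def should_migrate(cur_core_mhz, best_idle_core_mhz, expected_runtime_us):
    """Migrate only if the clock advantage is expected to outweigh the move."""
    if best_idle_core_mhz <= cur_core_mhz:
        return False
    speedup = best_idle_core_mhz / cur_core_mhz
    time_saved_us = expected_runtime_us * (1 - 1 / speedup)
    return time_saved_us > MIGRATION_COST_US

# The catch: expected_runtime_us is exactly what the scheduler cannot know.
print(should_migrate(3200, 5100, expected_runtime_us=40))      # False: not worth moving
print(should_migrate(3200, 5100, expected_runtime_us=10_000))  # True: clearly worth moving
```

The whole difficulty sits in that one unknowable parameter: guess too short and the thread stays stuck on the slow core, guess too long and you pay migration costs for nothing.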

BTW, this concept is much older than E-cores on 12th gen Intel. They had Xeon chips with cores with different clock limits way before.
 
Yes, this disparity is nothing new and existed before E-cores as well. Scheduling is an NP-hard problem even in the case of identical parallel processors, so you'll never get an optimal scheduler.
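For illustration, here's a sketch of the kind of heuristic real systems fall back on instead of chasing the optimum: Longest-Processing-Time-first list scheduling for the (NP-hard) makespan problem on identical cores. The task costs are made up; the point is that even this well-studied greedy rule lands on a makespan of 15 where a perfect split of these tasks reaches 14.

```python
# LPT list scheduling on identical cores: a classic heuristic for an NP-hard
# problem, good (worst case 4/3 - 1/(3m) of optimal) but not optimal.
import heapq

def lpt_schedule(task_costs, n_cores):
    """Greedy LPT: hand the longest remaining task to the least-loaded core."""
    loads = [(0.0, core) for core in range(n_cores)]   # (current load, core id)
    heapq.heapify(loads)
    assignment = {core: [] for core in range(n_cores)}
    for cost in sorted(task_costs, reverse=True):
        load, core = heapq.heappop(loads)              # least-loaded core so far
        assignment[core].append(cost)
        heapq.heappush(loads, (load + cost, core))
    makespan = max(sum(tasks) for tasks in assignment.values())
    return assignment, makespan

tasks = [7, 7, 6, 6, 5, 4, 4, 3]                       # arbitrary made-up costs
print(lpt_schedule(tasks, n_cores=3))                  # makespan 15; optimal is 14
```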
 
I think there's much ado about nothing here.
YES! Thank You.

So if a thread is sent to the 4c core, performance will suffer, just like with ecores. Yes?
Depends on the thread, its function, and whether or not the clock speed will make a difference. Core scheduling is well defined, but still complicated. If the OS kernel shifts a thread from one core to a slower core, it's either because that thread has a lower priority or because it is under-utilizing the core it's running on. A thread being shifted to a slower/lower-tier core can also be the OS prioritizing and optimizing in real time.

For Intel's Big/Little design, this can and frequently does result in degraded performance for the thread shifted to an E-Core, because the "little" cores are of a different (less efficient) design and thus have much lower IPC. With the AMD version in this example, the same dynamic doesn't exist, because the "little" core is functionally identical to the "Big" core, just lower clocking, which means the IPC is the same but the clock speed is lower.

Put another way, Intel's Big/Little design results in significant degradation of thread performance due to the differences not only in clock speed but also in core instruction execution capabilities. AMD's Big/Little seems a much better way of doing it, as the difference is in clock speed alone.

Does this make sense?

They had Xeon chips with cores with different clock limits way before.
True, but they were all the same cores IIRC. The per-core clock limitations were microcode imposed.
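To put rough numbers on the Big/Little comparison above, here's a back-of-the-envelope sketch. The IPC and clock figures are placeholders I made up, not measurements; only the shape of the comparison matters.

```python
# Crude throughput proxy: instructions per clock times clock speed.
# All figures below are invented placeholders for illustration.

def relative_perf(ipc, clock_ghz):
    return ipc * clock_ghz

p_core = relative_perf(ipc=1.00, clock_ghz=5.0)    # baseline "big" core
e_core = relative_perf(ipc=0.65, clock_ghz=3.8)    # different uarch: lower IPC and lower clock
zen4   = relative_perf(ipc=1.00, clock_ghz=5.0)
zen4c  = relative_perf(ipc=1.00, clock_ghz=3.5)    # same uarch: only the clock ceiling drops

print(f"E-core vs P-core: {e_core / p_core:.0%}")  # ~49% of the big core
print(f"Zen 4c vs Zen 4:  {zen4c / zen4:.0%}")     # 70% of the big core
```

With both IPC and clock reduced, the penalty for landing on the wrong core compounds; with clock alone reduced, it stays a single, smaller factor.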
 
Sure, but the end result is, if the scheduler ***cks up, you lose performance.

Thank god it's not happening on intel thanks to the thread director
 
That happens anyway. In non-big/little systems, if the scheduler encounters an error, it dumps the current work and restarts the thread. There is no difference there, and that happens regardless of the type or manufacturer of the CPU. That's not really something to focus on.
 
If the Zen 4c core performed as well as the full-fat core, then there wouldn't be any full-fat cores. Obviously that is not the case; Zen 4c will be slower, so it will have the same "issues" ecores do.
No. When all cores are loaded, the mobile platform will reduce the total available max speed anyway, so in essence you have the same total performance as if you had 6 full cores. The ecores problem is software-related, which has nothing to do with this other than the boost algorithms.
 
Sure, but the end result is, if the scheduler ***cks up, you lose performance.

Thank god it's not happening on intel thanks to the thread director

LOL

Yeah, I am sure thread director works perfectly, which is only possible if it can see into the future.

Unless you were being sarcastic, in which case I apologize.
 
I think hybrid is going to be the future for both companies, but I agree that it's early days, and it's not good that everything doesn't just work 100% optimised out of the box. (Although on Win11 it's still reasonably good out of the box thanks to Thread Director.)

However, my research and investigation into improving things has led me to some exciting discoveries about CPU scheduling in Windows and the hidden power schema settings. I have started documenting it as well, but in the few attempts I've made to share some of this stuff on the net, no one has bitten; I seem to be the only one excited by it. :)

W1zzard briefly got interested, but only in the NVMe power-saving states.
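For anyone curious, here's a small, hedged example of the sort of digging described above: dumping the processor power settings of the active Windows power scheme, including the ones the Power Options UI normally hides. It relies on powercfg's /qh query (like /q but also listing hidden settings) and the built-in SCHEME_CURRENT / SUB_PROCESSOR aliases; if /qh isn't available on your build, plain /q works the same minus the hidden entries.

```python
# Sketch: list the active power scheme's processor settings, hidden ones
# included. Assumes Windows with powercfg on PATH; no settings are changed.
import subprocess

def dump_processor_power_settings(include_hidden=True):
    query = "/qh" if include_hidden else "/q"      # /qh = query including hidden settings
    result = subprocess.run(
        ["powercfg", query, "SCHEME_CURRENT", "SUB_PROCESSOR"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(dump_processor_power_settings())
```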
 
LOL

Yeah, I am sure thread director works perfectly, which is only possible if it can see into the future.

Unless you were being sarcastic, in which case I apologize.
My experience has been perfect thus far. All the games I've tried work better with ecores on. I've heard star citizen doesn't like ecores but never tried it.
 

That doesn't mean that the whole shebang is running optimally. Not slowing down with E-Cores is a very low bar, especially if your applications have fewer threads than you have P-cores in the first place.
 
I didn't say they don't slow down. I said they actually run better.

But still, what do you mean by "low bar"? What would be the high bar?
 
I just feel like this whole debate will ultimately depend on whether or not AMD decides to limit the clock of Zen 4c vs. classic Zen 4.
How is that different to turbo boost that we already have?

Unlike desktops with 142-230W package power, mobile chips really do clock all the way down under all-core loads. The 15W 6800U, for example, really does drop from 4.7GHz on single-threaded loads to under 3GHz when rendering. The 6800U's eight identical full-fat cores are already operating in a way similar to 2x Zen4 + 6x Zen4c, simply because there are "preferred cores", the two marked as the best for high boost clocks. By the time the third core is engaged, clocks have already dropped 500MHz, and they'll lose another GHz as the rest of the cores are loaded and the laptop approaches its STAPM limits.

The clocks of Zen4c will definitely be limited by their own stability at sensible voltages, but many of the cores in a full-fat Zen4 processor are already limited by power targets anyway, so the die area spent on giving them the potential ability to clock higher goes to waste: if there's ever power budget to spare, the cores that get boosted to 4.7GHz are the two preferred cores.
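A toy model of that behaviour, with invented per-core power figures standing in for the real boost tables: under a fixed package budget, the achievable common clock falls quickly as cores are added, which is why only the preferred cores ever see the top bin.

```python
# Toy model, not real 6800U data: pick the highest common clock whose total
# core power still fits inside a fixed package budget.

PACKAGE_BUDGET_W = 15.0
CORE_POWER_AT_GHZ = {4.7: 7.0, 4.0: 4.5, 3.5: 3.0, 3.0: 2.0, 2.5: 1.4}  # W per core, invented

def best_all_core_clock(active_cores):
    """Highest clock bin whose total core power fits the package budget."""
    for ghz in sorted(CORE_POWER_AT_GHZ, reverse=True):
        if CORE_POWER_AT_GHZ[ghz] * active_cores <= PACKAGE_BUDGET_W:
            return ghz
    return min(CORE_POWER_AT_GHZ)

for n in (1, 2, 4, 8):
    print(f"{n} active core(s): ~{best_all_core_clock(n)} GHz each")
# With these numbers: 1-2 cores still hit the 4.7 GHz bin, 4 cores drop to
# 3.5 GHz, and 8 cores sit at 2.5 GHz -- the extra clock headroom of the
# non-preferred cores never gets used.
```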
 
Zen4c clocks lower by design. It uses a denser transistor library, which means it consumes less power at lower clocks (shorter paths for the current to travel through), but at the same time there's more heat density at a given clock, so it can't clock as high.
In practice, this means the Zen4c cores will have different power/frequency curves than Zen4.

It should be a bit like ARM's big vs. LITTLE frequency curves, though less far apart, because it's still essentially the same core.
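Purely to illustrate the shape of that claim (the coefficients below are invented, not silicon data): with a crude superlinear power-vs-clock model, a dense-library core comes out ahead at low per-core power budgets but runs into its lower ceiling, while the performance-library core keeps scaling when power is plentiful.

```python
# Illustrative sketch only: invented coefficients, same assumed IPC on both
# variants. "Dense" needs less power at a given clock but tops out lower.

def power_w(freq_ghz, k):
    return k * freq_ghz ** 2.5              # crude superlinear power-vs-clock model

CLASSIC_K, CLASSIC_FMAX = 0.16, 5.2         # performance library (made-up numbers)
DENSE_K,   DENSE_FMAX   = 0.12, 3.7         # dense library (made-up numbers)

def clock_within_budget(budget_w, k, fmax):
    """Highest clock (0.1 GHz steps) that fits the per-core power budget."""
    f = fmax
    while f > 0.5 and power_w(f, k) > budget_w:
        f = round(f - 0.1, 1)
    return f

for budget in (1.0, 2.0, 4.0, 8.0):
    print(f"{budget} W/core -> classic {clock_within_budget(budget, CLASSIC_K, CLASSIC_FMAX)} GHz,"
          f" dense {clock_within_budget(budget, DENSE_K, DENSE_FMAX)} GHz")
# At 1-2 W the dense core clocks slightly higher; by 8 W the classic core is
# well ahead because the dense one is pinned at its 3.7 GHz ceiling.
```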
 