
Core Configurations of Intel Core Ultra 200 "Arrow Lake-S" Desktop Processors Surface

Joined
Feb 18, 2005
Messages
5,847 (0.81/day)
Location
Ikenai borderline!
System Name Firelance.
Processor Threadripper 3960X
Motherboard ROG Strix TRX40-E Gaming
Cooling IceGem 360 + 6x Arctic Cooling P12
Memory 8x 16GB Patriot Viper DDR4-3200 CL16
Video Card(s) MSI GeForce RTX 4060 Ti Ventus 2X OC
Storage 2TB WD SN850X (boot), 4TB Crucial P3 (data)
Display(s) 3x AOC Q32E2N (32" 2560x1440 75Hz)
Case Enthoo Pro II Server Edition (Closed Panel) + 6 fans
Power Supply Fractal Design Ion+ 2 Platinum 760W
Mouse Logitech G602
Keyboard Razer Pro Type Ultra
Software Windows 10 Professional x64
multi-chiplet MCM
Sorry for doing this but MCM literally means "multi-chiplet module", so you just wrote "multi-chiplet multi-chiplet module" :p
 
Joined
Jun 10, 2014
Messages
2,986 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
As always, I'm a bit skeptical about leaked clock speeds for a simple reason: they never know the final clock speeds until they have the final stepping in significant volume, which leads to the following logical deductions:
a) The CPUs are ready for "imminent" release (within the next couple of months)
or
b) This is yet another fake leak
Please keep this in mind.

Hmmm... So let me get this straight: no more Hyper-Threading, and U9 will have "only" 8 performance cores but 16 efficient cores. There are rumors around the net that we could expect IPC improvements of 5% to 15% on the P-cores, though some people claim the improvements will be much larger on the E-cores. Then again, the U9 285 has 24 threads in total, compared to the i9 14900K's 32 threads... hmmm, is it going to be better in multithreaded apps at all???
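The 24 vs. 32 thread figures follow from simple arithmetic: on the i9-14900K only the 8 P-cores have Hyper-Threading, while the 16 E-cores run one thread each, and the rumored U9 285 keeps the same core counts but drops HT. A minimal sketch (the function name is hypothetical, and the Arrow Lake configuration is the rumor quoted above, not a confirmed spec):

```python
def hardware_threads(p_cores: int, e_cores: int, p_cores_have_smt: bool) -> int:
    """Hardware threads when only P-cores can run two threads (HT) and E-cores run one."""
    return p_cores * (2 if p_cores_have_smt else 1) + e_cores


print(hardware_threads(8, 16, p_cores_have_smt=True))   # Core i9-14900K: 32
print(hardware_threads(8, 16, p_cores_have_smt=False))  # rumored U9 285: 24
```

Thread count alone says little about multithreaded throughput, since an HT sibling shares its core's execution resources rather than adding a full core.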
You are raising excellent questions, which no one can answer until we get a deep-dive into actual finalized products, but I can still point out a few important aspects most people miss:
Firstly, given that Arrow Lake is presumably a very different microarchitecture, we don't know its performance characteristics at all. Even if we had confirmed IPC figures, base and boost clocks, cache sizes, etc., they would only give us an idea of overall performance, and would still tell us very little about whether this is an excellent all-round performer, one that only excels at computationally heavy (but logically simple) SIMD, or one that is very good at mixed loads but not at heavy SIMD. It may very well end up as a stellar performer in synthetic or very specific benchmarks while being only a modest upgrade in real-world tasks; only time will tell. When it comes to E-cores, those are already mostly a gimmick. They serve two purposes: making the specs look nice, like having 20 cores at >5 GHz and 65 W (the big PC vendors love this), and making certain benchmarks like Cinebench look good (which have little or no relevance for end users).

We also need to keep in mind that when they do (presumably) larger architectural overhauls, there might be areas with significant downsides too, especially in the "first iteration". So be mentally prepared for that, and don't completely dismiss large advancements in some areas just because there are some regressions elsewhere. Additionally, regardless of IPC and rated clock speeds, the microarchitecture and the node ultimately decide what performance is achieved in specific workloads. Contrary to popular belief, IPC is the average number of instructions executed per clock cycle, not a measure of performance in itself. Plus, the node and the microarchitecture might allow a CPU to sustain a higher real clock speed in a specific workload than a competitor with similar or even higher "IPC". This was the case with Zen 2 vs. Coffee Lake/Comet Lake, where in many multithreaded workloads Zen 2 achieved much higher actual clock speeds, while the Skylake family throttled heavily despite higher IPC, resulting in lower performance for Intel. And IPC estimates based on rated clock speeds are useless, as rated clock speeds on current CPUs are mostly a gimmick anyway. This is why I always say: performance is what ultimately matters; how it's achieved is just detail for those interested. ;)
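A minimal sketch of the relationship described above, assuming both chips run the same binary so the instruction count is fixed: throughput is average IPC multiplied by the clock speed actually sustained in that workload, not the rated boost clock. The IPC and clock values below are hypothetical, chosen only to show a higher-IPC chip losing to one that holds its clocks:

```python
def instructions_per_second(ipc: float, sustained_clock_ghz: float) -> float:
    """Average instructions per cycle times cycles per second actually sustained."""
    return ipc * sustained_clock_ghz * 1e9


# Hypothetical chips running the same binary (illustrative values, not measurements):
chip_a = instructions_per_second(ipc=1.10, sustained_clock_ghz=3.6)  # higher IPC, throttles
chip_b = instructions_per_second(ipc=1.00, sustained_clock_ghz=4.2)  # lower IPC, holds clocks
print(f"A: {chip_a:.2e} instructions/s")
print(f"B: {chip_b:.2e} instructions/s")
print("B is faster" if chip_b > chip_a else "A is faster")
```

Because IPC itself varies from workload to workload, the same comparison can flip on a different benchmark.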

Two different diagrams of the LionCove core from LunarLake graphics:
Lots of good info there. Just keep in mind that any graphics used in promotional material prior to release may very well be based on approximations, not the final design. ;)
 
Joined
Nov 3, 2020
Messages
27 (0.02/day)
Previously, the Meteor Lake graphic was very loose in its depiction of Redwood Cove's core and cache structures, which is clearly visible in the graphic.

The Lunar Lake graphic depicts Lion Cove very accurately. Why do I think it's accurate? Because the main Lion Cove design has been finished for some time and will also be used in Arrow Lake.

I dare say these diagrams come from the material prepared for the presentation of the Lion Cove microarchitecture.
 
Joined
Dec 25, 2020
Messages
6,734 (4.71/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS Special Edition
Motherboard ASUS ROG MAXIMUS Z790 APEX ENCORE
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) ASUS ROG Strix GeForce RTX™ 4080 16GB GDDR6X White OC Edition
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic Intellimouse
Keyboard Generic PS/2
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores I pulled a Qiqi~
LOL but actually it's multi-chip module :laugh:
 
Joined
Jun 1, 2021
Messages
306 (0.24/day)
This is your quote: "In the age of massive core counts HT/SMT is not needed. Without it you can clock higher, use less voltage, and design more secure processors."

First of all, 'HT/SMT is not needed' is incorrect, regardless of core counts. It depends on the arch/workload.
Secondly, SMT will not automatically lower clocks or require higher voltages. Disabling SMT on a CPU that was designed with SMT in mind might allow higher clocks/lower voltages, but you lose performance and CPU utilization, and that loss is what might allow those clocks. But if you design an arch without SMT, there are too many variables in play to determine whether clocks will actually increase or decrease. So no, not having SMT will not automatically increase clocks.

The security part is true: SMT does require added security measures. I guess given Intel's track record, it's probably a good thing they're not going to have HT.

Here's a link to one of his articles on SMT: https://www.anandtech.com/show/1626...-multithreading-on-zen-3-and-amd-ryzen-5000/5
The issue with what you are saying is that it's all entirely relative and we cannot know anything. The thing is that it does not matter if Zen 3 benefits from SMT in some workloads, because a core designed without SMT will have wildly different architectural decisions.

Modern Zen cores (as well as Intel ones) have highly refined SMT implementations, with as many resources as possible being competitively shared, etc. This is going to cost a lot of transistors, engineering hours, validation effort, heat, etc.

Intel might have run simulations and concluded from the results that further SMT would just take more than it adds.

but you lose performance and CPU utilization which in turn might allow those clocks
You are not guaranteed to lose performance; you can even gain from it. It depends too much on the workload. SMT benefits the most when there are stalls in places like memory, or when you have threads with very different resource utilization (say, one is purely integer and the other mostly floating point).

By disabling SMT, you give the entire core's resources to a single thread, and sharing those resources might have been the bottleneck for a lot of things, including some MT workloads.
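The stall argument can be illustrated with a toy model. It assumes each thread is independently stalled (say, waiting on memory) with some fixed probability per cycle and that any ready thread can use the whole core; both assumptions, and the function name, are hypothetical simplifications, not a description of Zen or Intel hardware:

```python
def core_busy_fraction(stall_prob: float, threads: int) -> float:
    """Fraction of cycles in which at least one hardware thread is ready to issue.

    Toy model (assumption): each thread is independently stalled (e.g. waiting on
    memory) with probability stall_prob in any cycle, and any ready thread can use
    the whole core. Contention between threads for shared resources is ignored.
    """
    if not 0.0 <= stall_prob <= 1.0:
        raise ValueError("stall_prob must be in [0, 1]")
    return 1.0 - stall_prob ** threads


for stall in (0.1, 0.3, 0.5):
    one = core_busy_fraction(stall, threads=1)
    two = core_busy_fraction(stall, threads=2)
    print(f"stall={stall:.1f}: 1 thread {one:.2f}, 2 threads {two:.2f}, gain {two / one - 1:+.0%}")
```

In this model the 2-thread gain equals the stall probability, since (1 - s²)/(1 - s) = 1 + s. A thread that rarely stalls gains almost nothing, and in real hardware the resource contention the model ignores can turn that small gain into a loss, which is the workload dependence described above.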
 
Joined
Feb 18, 2005
Messages
5,847 (0.81/day)
When SMT/HT was introduced, CPUs had only a couple of cores, so the synchronisation overhead, and thus the transistor budget required, was relatively small. Now that CPUs have many times more cores and need to synchronise across all of them, adding SMT/HT on top of that is liable to make the required number of transistors explode. While Intel's engineering decisions haven't been great in the past few years, I would expect that a clean-room(-ish) core design lacking SMT/HT has been thoroughly researched and evaluated as more useful going forward than their current Skylake++++++ architecture. In other words, while Arrow Lake's loss of SMT/HT may not be made up for by IPC gains, future derivatives of its architecture will likely be able to achieve that IPC and more. As with all things in engineering, it's a tradeoff.

Honestly, what really interests me is how AMD chooses to respond: will they stick with SMT and their lighter-weight Zen cores, try their own heterogeneous approach, come up with something completely different, or do all or none of the above?
 
Joined
May 3, 2019
Messages
2,095 (1.03/day)
System Name BigRed
Processor I7 12700k
Motherboard Asus Rog Strix z690-A WiFi D4
Cooling Noctua D15S chromax black/MX6
Memory TEAM GROUP 32GB DDR4 4000C16 B die
Video Card(s) MSI RTX 3080 Gaming Trio X 10GB
Storage M.2 drives WD SN850X 1TB 4x4 BOOT/WD SN850X 4TB 4x4 STEAM/USB3 4TB OTHER
Display(s) Dell s3422dwg 34" 3440x1440p 144hz ultrawide
Case Corsair 7000D
Audio Device(s) Logitech Z5450/KEF uniQ speakers/Bowers and Wilkins P7 Headphones
Power Supply Corsair RM850x 80% gold
Mouse Logitech G604 lightspeed wireless
Keyboard Logitech G915 TKL lightspeed wireless
Software Windows 10 Pro X64
Benchmark Scores Who cares
AMD will stick with X3D; it's the only string to their bow.

At least Intel is trying something new: moving away from monolithic dies and dropping HT. I think Arrow Lake will surprise us all.
 
Joined
Jun 1, 2011
Messages
4,591 (0.93/day)
Location
in a van down by the river
Processor faster at instructions than yours
Motherboard more nurturing than yours
Cooling frostier than yours
Memory superior scheduling & haphazardly entry than yours
Video Card(s) better rasterization than yours
Storage more ample than yours
Display(s) increased pixels than yours
Case fancier than yours
Audio Device(s) further audible than yours
Power Supply additional amps x volts than yours
Mouse without as much gnawing as yours
Keyboard less clicky than yours
VR HMD not as odd looking as yours
Software extra mushier than yours
Benchmark Scores up yours
To paraphrase a certain Long Island comedian: what's the deal with all these E-cores? How many E-cores does one person need?
 
Joined
Dec 25, 2020
Messages
6,734 (4.71/day)
I'm fairly sure we are not talking about the same comedian, but it's a busy couch :laugh:

I agree, but the Skylake++++ thing ended with Comet Lake. Rocket Lake was a small regression because it was the first processor of the "Cove" era: as the first-generation P-core design backported to 14 nm, it just didn't hold up against the established Skylake core back then. I reckon the "Cove" era will still live on in Core Ultra's first few generations.
 
Joined
Nov 8, 2017
Messages
229 (0.09/day)
When it comes to E-cores, those are already mostly a gimmick. They serve two purposes: making the specs look nice, like having 20 cores at >5 GHz and 65 W (the big PC vendors love this), and making certain benchmarks like Cinebench look good (which have little or no relevance for end users).
You might as well say that high-core-count CPUs are a gimmick too. A Core i9 with 16 P-cores wouldn't have been as fast as many people seem to imagine... and Puget did those tests with PL1 = 125 W, PL2 = 253 W and a 56 s time window. The 16-core Xeon was pulling 240 W all the time. E-cores were really the only way for Intel to make a CPU that is great at ST and MT tasks at the same time. The latest and greatest Xeons are kind of shit if you don't absolutely require a lot of memory.
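The 56 s figure is a time constant, not the length of the boost. A minimal sketch, assuming a simplified limiter that compares an exponentially weighted moving average of package power (time constant tau) against PL1, starts from an idle 0 W average, and lets the CPU draw exactly PL2 until the average reaches PL1. Real firmware and board settings may differ, and the function names are hypothetical:

```python
import math


def time_at_pl2(pl1_w: float, pl2_w: float, tau_s: float) -> float:
    """Seconds a fully loaded CPU can hold PL2 before its power average reaches PL1.

    Simplified model (assumption): the limiter compares an exponentially weighted
    moving average of package power, time constant tau, against PL1; the average
    starts at 0 W and the CPU draws exactly PL2 until the average hits PL1.
    Solving pl2 * (1 - exp(-t / tau)) = pl1 for t gives the closed form below.
    """
    if not 0 < pl1_w < pl2_w:
        raise ValueError("expected 0 < PL1 < PL2")
    return -tau_s * math.log(1.0 - pl1_w / pl2_w)


def simulate_time_at_pl2(pl1_w: float, pl2_w: float, tau_s: float, dt: float = 0.01) -> float:
    """Step the same moving average numerically as a cross-check of the closed form."""
    avg, t = 0.0, 0.0
    while avg < pl1_w:
        avg += (pl2_w - avg) * (dt / tau_s)  # EWMA update while drawing PL2
        t += dt
    return t


# Limits quoted for the Puget runs: PL1 = 125 W, PL2 = 253 W, tau = 56 s
print(f"closed form: {time_at_pl2(125, 253, 56):.1f} s")
print(f"simulated:   {simulate_time_at_pl2(125, 253, 56):.1f} s")
```

Under these assumptions the chip holds PL2 for well under the full 56 s before settling at 125 W, and a CPU whose power average is already elevated from earlier load would get even less time at PL2.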
 
Joined
Jun 1, 2021
Messages
306 (0.24/day)
Intel's first SMT/HT came with NetBurst (the Pentium 4 era), which made a lot of sense considering that it had high clock speeds but a lot of issues with its pipeline and with keeping the core fed. And it's not just synchronization that is an issue, but also what happens within the core itself. For example, you need to be able to fetch from two instruction streams, which can have consequences for the instruction cache and similar structures, and you also need to duplicate some of the architectural registers, like the program counter (PC).

There are a lot of ways to implement SMT/HT, each with different cost/performance trade-offs. The PS3 and Xbox 360 had SMT too, because their in-order cores would otherwise have been too prone to single-core stalls.

It's not necessarily a lightweight thing to implement.

Honestly, I think that in terms of architecture, Intel never really fell behind. Skylake was great for many years; they just fell badly behind in lithography. So if they have decided that HT isn't worth it anymore, I would assume they have good reasons to believe so.
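The point about fetching from two instruction streams and duplicating the PC can be sketched as a toy simulation. This is not how any real Intel or AMD front end works: the instruction names, the 3-cycle miss penalty and the round-robin fetch policy are assumptions chosen for illustration, and real designs duplicate far more state than a single PC:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HwThread:
    """State kept per hardware thread. SMT must duplicate things like the PC."""
    name: str
    program: list[str]
    pc: int = 0             # each hardware thread needs its own program counter
    stalled_until: int = 0  # first cycle at which this thread may fetch again


def fetch_round_robin(threads: list[HwThread], cycles: int) -> list[str]:
    """Simulate one shared fetch port choosing among ready threads in round-robin order."""
    log = []
    start = 0  # index of the thread that gets first pick this cycle
    for cycle in range(cycles):
        chosen = None
        for offset in range(len(threads)):
            idx = (start + offset) % len(threads)
            th = threads[idx]
            if th.pc < len(th.program) and th.stalled_until <= cycle:
                chosen = th
                start = (idx + 1) % len(threads)  # next cycle, start after this thread
                break
        if chosen is None:
            log.append(f"{cycle}: idle")
            continue
        instr = chosen.program[chosen.pc]
        chosen.pc += 1
        if instr == "load_miss":
            chosen.stalled_until = cycle + 4  # hypothetical miss: blocks the next 3 cycles
        log.append(f"{cycle}: {chosen.name} {instr}")
    return log


# Without SMT: one thread, so the fetch port idles while the miss is outstanding.
solo = [HwThread("A", ["load_miss", "add", "add"])]
print(fetch_round_robin(solo, 6))

# With 2-way SMT: thread B's work fills the cycles A spends stalled.
pair = [HwThread("A", ["load_miss", "add", "add"]), HwThread("B", ["mul", "mul", "mul"])]
print(fetch_round_robin(pair, 6))
```

Even this toy needs per-thread state plus an arbitration policy for the shared fetch port; in a real core that arbitration extends to the schedulers and every other shared structure, which is where the design and validation cost comes from.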
 
Joined
Nov 26, 2021
Messages
1,648 (1.51/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
To be fair, they should be comparing the 12900K to the Xeon, as the Xeon uses Golden Cove cores.

We aren't privy to their reasons, but SMT is lightweight from a die area perspective. However, keep in mind that a lot of Xeons are now sold to cloud vendors, whose business model is built on sharing resources. SMT is rather susceptible to side-channel attacks, so that would have been a major consideration. Validation time would have been another concern.
 

dgianstefani

TPU Proofreader
Staff member
Joined
Dec 29, 2017
Messages
5,029 (1.99/day)
Location
Swansea, Wales
System Name Silent
Processor Ryzen 7800X3D @ 5.15ghz BCLK OC, TG AM5 High Performance Heatspreader
Motherboard ASUS ROG Strix X670E-I, chipset fans replaced with Noctua A14x25 G2
Cooling Optimus Block, HWLabs Copper 240/40 + 240/30, D5/Res, 4x Noctua A12x25, 1x A14G2, Mayhems Ultra Pure
Memory 32 GB Dominator Platinum 6150 MT 26-36-36-48, 56.6ns AIDA, 2050 FCLK, 160 ns tRFC, active cooled
Video Card(s) RTX 3080 Ti Founders Edition, Conductonaut Extreme, 18 W/mK MinusPad Extreme, Corsair XG7 Waterblock
Storage Intel Optane DC P1600X 118 GB, Samsung 990 Pro 2 TB
Display(s) 32" 240 Hz 1440p Samsung G7, 31.5" 165 Hz 1440p LG NanoIPS Ultragear, MX900 dual gas VESA mount
Case Sliger SM570 CNC Aluminium 13-Litre, 3D printed feet, custom front, LINKUP Ultra PCIe 4.0 x16 white
Audio Device(s) Audeze Maxwell Ultraviolet w/upgrade pads & LCD headband, Galaxy Buds 3 Pro, Razer Nommo Pro
Power Supply SF750 Plat, full transparent custom cables, Sentinel Pro 1500 Online Double Conversion UPS w/Noctua
Mouse Razer Viper Pro V2 8 KHz Mercury White w/Tiger Ice Skates & Pulsar Supergrip tape
Keyboard Wooting 60HE+ module, TOFU-R CNC Alu/Brass, SS Prismcaps W+Jellykey, LekkerV2 mod, TLabs Leath/Suede
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores Legendary
Yeah, it really astounds me that some people/armchair critics seem to think they know better than Intel's researchers and engineers, who have delivered some incredible advancements over the years and are some of the smartest, most innovative people in the industry.

These guys look to be the first to bring backside power delivery (PowerVia) and gate-all-around (RibbonFET) transistors to the mass market. This is not an insignificant achievement. Intel was also the first to bring a hybrid architecture to mainstream consumer x86 PCs with the Lakefield SoC, which combined Foveros, two types of cores, IO and DRAM in a single package in 2020, before M1.

No, they aren't choosing to end HT from a complete lack of understanding of how these things work. Please, get real.
 
Joined
Jun 1, 2021
Messages
306 (0.24/day)
We aren't privy to their reasons, but SMT is lightweight from a die area perspective
Well, can you source that?

A lot of the details are hard and require adjustments. I would suggest taking a look at the SMT section of this article:

Loongson 3A6000: A Star among Chinese CPUs – Chips and Cheese

It might not seem like much, but it can affect the schedulers and everything else, which become significantly more complex because they now have to check and schedule instructions from two separate threads.

Obviously, when you consider the L2, that is going to be as big as, if not bigger than, the core. But the impact might be similar to or somewhat greater than the PS5 FPU nerf. If they can then use that transistor budget for other things, you could see some benefit.

The Nerfed FPU in PS5’s Zen 2 Cores – Chips and Cheese
 
Joined
Nov 26, 2021
Messages
1,648 (1.51/day)
The most recent figure comes from Marvell. The corresponding figure for Intel's much larger cores should be smaller, as they spend far more area on wide vector execution than the ThunderX3 does.
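The reasoning in that last sentence can be written out, assuming (as the post implies) that the SMT-specific logic costs roughly the same area regardless of how wide the vector units are. With $A_{\text{rest}}$ the non-vector core area, $A_{\text{vec}}$ the vector/FP execution area and $\Delta A_{\text{SMT}}$ the extra area for SMT (symbols introduced here for illustration, not taken from the Marvell figure), the relative overhead $f$ shrinks as the vector area grows:

```latex
f = \frac{\Delta A_{\text{SMT}}}{A_{\text{rest}} + A_{\text{vec}}},
\qquad
\frac{\partial f}{\partial A_{\text{vec}}}
  = -\frac{\Delta A_{\text{SMT}}}{\left(A_{\text{rest}} + A_{\text{vec}}\right)^{2}} < 0
```

If $\Delta A_{\text{SMT}}$ itself grows with a wider core, the conclusion weakens accordingly.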

 
Joined
Oct 30, 2020
Messages
250 (0.17/day)
You actually agreed with my points. I never said that Intel is wrong to drop SMT in future generations; that wasn't my point at all. I think from the link I provided, you thought I was trying to argue that Intel is wrong to disable SMT in future gens? No, no... I was responding to our resident TPU staff member and his blanket statements that "HT is not needed for modern CPUs with lots of cores" and "CPUs can clock higher with lower volts without SMT", both of which are factually incorrect. Clocks can be higher or lower depending on the design of the arch, and having a high number of cores doesn't have anything to do with SMT; it's rather about core utilization. That depends entirely on how they design the arch, not on the blanket statement I was refuting, which I quoted earlier.


You were factually incorrect, and I explained why. Rather than replying or discussing, you resort to indirectly calling others armchair critics. Have a discussion; there's nothing wrong with being incorrect and admitting it. And I wasn't even saying they chose to end HT from a lack of understanding, so I'm not even sure who you are replying to in the last sentence.
 

dgianstefani

TPU Proofreader
Staff member
Joined
Dec 29, 2017
Messages
5,029 (1.99/day)
Location
Swansea, Wales
System Name Silent
Processor Ryzen 7800X3D @ 5.15ghz BCLK OC, TG AM5 High Performance Heatspreader
Motherboard ASUS ROG Strix X670E-I, chipset fans replaced with Noctua A14x25 G2
Cooling Optimus Block, HWLabs Copper 240/40 + 240/30, D5/Res, 4x Noctua A12x25, 1x A14G2, Mayhems Ultra Pure
Memory 32 GB Dominator Platinum 6150 MT 26-36-36-48, 56.6ns AIDA, 2050 FCLK, 160 ns tRFC, active cooled
Video Card(s) RTX 3080 Ti Founders Edition, Conductonaut Extreme, 18 W/mK MinusPad Extreme, Corsair XG7 Waterblock
Storage Intel Optane DC P1600X 118 GB, Samsung 990 Pro 2 TB
Display(s) 32" 240 Hz 1440p Samsung G7, 31.5" 165 Hz 1440p LG NanoIPS Ultragear, MX900 dual gas VESA mount
Case Sliger SM570 CNC Aluminium 13-Litre, 3D printed feet, custom front, LINKUP Ultra PCIe 4.0 x16 white
Audio Device(s) Audeze Maxwell Ultraviolet w/upgrade pads & LCD headband, Galaxy Buds 3 Pro, Razer Nommo Pro
Power Supply SF750 Plat, full transparent custom cables, Sentinel Pro 1500 Online Double Conversion UPS w/Noctua
Mouse Razer Viper Pro V2 8 KHz Mercury White w/Tiger Ice Skates & Pulsar Supergrip tape
Keyboard Wooting 60HE+ module, TOFU-R CNC Alu/Brass, SS Prismcaps W+Jellykey, LekkerV2 mod, TLabs Leath/Suede
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores Legendary
You were factually incorrect and I mentioned as to why you were. Rather than a reply or discussion, you resort to indirectly calling others armchair critics. Have a discussion, there's nothing wrong with being incorrect and admitting to it. And I wasn't even saying they choose to end HT from a lack of understanding, so I'm not even sure who you are replying to in the last sentence.
This is your quote: "In the age of massive core counts HT/SMT is not needed. Without it you can clock higher, use less voltage, and design more secure processors."

First of all, 'HT/SMT is not needed' is incorrect, regardless of core counts. It depends on the arch/workload.
Secondly, SMT will not automatically lower clocks/need higher voltages. Disabling SMT in a CPU that is designed with SMT in mind might allow higher clocks/lower voltages, but you lose performance and CPU utilization which in turn might allow those clocks. But if you design an arch without SMT, there are too many variables in play to determine whether clocks will actually increase or decrease. So no, not having SMT will not automatically increase clocks.

The security part is true; SMT does require added security measures. I guess, given Intel's track record, it's probably a good thing they're not going to have HT.

Here's a link to one of his articles on SMT: https://www.anandtech.com/show/1626...-multithreading-on-zen-3-and-amd-ryzen-5000/5


I mean, the disadvantages of having cores with different ISAs are on an entirely different level compared to having two different CCDs with the same cores. Even having Zen 4c cores on a separate CCD is better than Intel's approach for pretty much any server workload, and Intel's approach sometimes causes issues on the client side as well.

"all the same" isn't really the same.
Firstly, it's an "it depends".
Secondly, context: you're implying that when tuning and turning off HT, voltage/frequency improvements are "not automatic". Right, but this does not make my original statement "factually incorrect".

This is a different situation from having HT disabled out of the box or never architecturally designed in, in which case the voltage/frequency improvements would be "baked in" to the microcode.

Your statement "but you lose performance" is also, as you put it, "factually incorrect" in the same way my original statement was. It depends. In many applications and games, disabling HT, even with no other changes or tuning, will improve performance on something like a 13900K.

Finally, you are assuming it's you I'm referring to when I say "people/armchair critics seem to think they know better than Intel researchers and engineers", this is a projection on your part. I was not even thinking of you when I wrote this.

The bottom line is HT benefits software when additional MT performance is needed, but has drawbacks when that MT performance is not needed and there are enough cores/threads even without HT. How often do you think that is the case in a 24-core CPU?
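For reference, the thread arithmetic behind the "24-core" example, assuming an i9-14900K-style layout of 8 HT-capable P-cores plus 16 E-cores without HT, is:

$$
\begin{aligned}
\text{threads}_{\text{HT on}}  &= 2 \times 8_{P} + 16_{E} = 32 \\
\text{threads}_{\text{HT off}} &= 8_{P} + 16_{E} = 24
\end{aligned}
$$

So HT adds 8 logical threads, all on the P-cores, to a chip that already has 24 physical cores. How much those 8 threads are worth depends on the workload, since an SMT sibling shares its core's execution resources rather than adding new ones.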

If you want to test the "Without it you can clock higher, use less voltage" assertion.

Get a Raptor Lake CPU, set a static frequency. Now tune the voltage until you're unstable. Note that voltage.

Now turn off HT.

Tune the voltage again.

Note the voltage.

You can do the same thing for clocks etc.
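If you run that test on Linux, the kernel reports the SMT state in sysfs (`/sys/devices/system/cpu/smt/control`) and lists which logical CPUs share a physical core in each CPU's `topology/thread_siblings_list`, so a small script can confirm HT really is off after changing it in the BIOS, before redoing the voltage tuning. This is only a sketch under stated assumptions: it just reads those files (it does not change SMT, clocks or voltages), the helper names are hypothetical, and on Windows or older kernels the files are missing, so it reports "unknown" and an empty topology.

```python
from pathlib import Path

# Standard Linux sysfs locations (assumption: absent on Windows/macOS and very old kernels).
CPU_DIR = Path("/sys/devices/system/cpu")
SMT_CONTROL = CPU_DIR / "smt" / "control"


def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel CPU list such as '0-1', '0,8' or '0-3,8-11' into integers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def smt_status() -> str:
    """Return the kernel's SMT control state (e.g. 'on', 'off', 'forceoff',
    'notsupported'), or 'unknown' if the sysfs file cannot be read."""
    try:
        return SMT_CONTROL.read_text().strip()
    except OSError:
        return "unknown"


def siblings_by_cpu() -> dict[int, list[int]]:
    """Map each online logical CPU to the logical CPUs sharing its physical core."""
    result: dict[int, list[int]] = {}
    for path in CPU_DIR.glob("cpu[0-9]*/topology/thread_siblings_list"):
        cpu = int(path.parent.parent.name[len("cpu"):])
        result[cpu] = parse_cpu_list(path.read_text())
    return result


if __name__ == "__main__":
    siblings = siblings_by_cpu()
    # A core counts as "running SMT" if its sibling list holds more than one logical CPU.
    smt_cores = {tuple(s) for s in siblings.values() if len(s) > 1}
    print("SMT control:", smt_status())
    print(f"{len(siblings)} logical CPUs, {len(smt_cores)} cores with 2+ threads")
```

With HT enabled on a hybrid Raptor Lake part you would expect the P-cores to show up as multi-thread cores and the E-cores as single-thread ones; after disabling HT in the BIOS, the second count should drop to zero.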

If Intel delivers a CPU without HT that performs better in applications and games than the previous generation, and is more secure, it's a win.
 
Joined
Oct 30, 2020
Messages
250 (0.17/day)
Firstly, it's an "it depends".
Secondly, context: you're implying that when tuning and turning off HT, voltage/frequency improvements are "not automatic". Right, but this does not make my original statement "factually incorrect".

This is a different situation from having HT disabled out of the box or never architecturally designed in, in which case the voltage/frequency improvements would be "baked in" to the microcode.

Your statement "but you lose performance" is also, as you put it, "factually incorrect" in the same way my original statement was. It depends. In many applications and games, disabling HT, even with no other changes or tuning, will improve performance on something like a 13900K.

Finally, you are assuming it's you I'm referring to when I say "people/armchair critics seem to think they know better than Intel researchers and engineers", this is a projection on your part. I was not even thinking of you when I wrote this.

The bottom line is HT benefits software when additional MT performance is needed, but has drawbacks when that MT performance is not needed and there are enough cores/threads even without HT. How often do you think that is the case in a 24-core CPU?

Your first incorrect statement was that SMT is not needed for modern high-core-count CPUs. That's incorrect, because it depends on whether an architecture is designed with SMT in mind. I've explained it before, so I won't go into further detail.

Secondly, you said that without SMT you can design CPUs that clock higher and use less voltage. That's not true because, again, it depends on the arch.

When I said lose performance, that's incorrect and it's not what I was trying to say. If you read my post, I was trying to imply that maybe an architecture designed with SMT in mind might lose performance which in turn will allow those clocks. Or it might be the transistors that are idle now and depending on the grey silicon can help with thermals/hotspots. It might be reduced utilization, or CPU cores not fighting for cache. It can be a multitude of things. But it doesn't change the fact that you said having no SMT will lead to higher clocks/lower volts which is still incorrect because, again, it depends on how a CPU is designed and may or may not lead to higher clocks.

Regarding your last sentence, implying that HT only increases multithreaded performance and that when you already have 24 cores you don't need more of it: that's not really true, is it? If the architecture is designed with SMT in mind, then regardless of whether consumers need more than 24 cores, SMT will be on by default because it delivers better numbers per core: the core is built to extract thread-level parallelism (TLP) by running two concurrent instruction streams without being (relatively) starved of resources. But maybe Intel saw that they don't need SMT anymore because they can design an arch that gets around the reasons SMT is needed in the first place, and that's fine.

edit: I see that you're still trying to argue the "clock higher" part in an edited post of yours. Let me try to make it easy for you: you said that without SMT you can design a core that uses less voltage and clocks higher. I said that's not correct because it entirely depends on how the architecture is designed. An arch that is perfect and extracts the maximum from a single thread will not require SMT, but that doesn't mean it will also clock higher in the process. You're now saying that disabling SMT leads to higher clocks, which is a different thing entirely, and I said as much: there are a number of reasons why that might be the case.
 

dgianstefani

Your first incorrect statement was that SMT is not needed for modern high-core-count CPUs. That's incorrect, because it depends on whether an architecture is designed with SMT in mind. I've explained it before, so I won't go into further detail.

Secondly, you said that without SMT you can design CPUs that clock higher and use less voltage.

When I said lose performance, that's incorrect and it's not what I was trying to say. If you read my post, I was trying to imply that maybe an architecture designed with SMT in mind might lose performance which in turn will allow those clocks. Or it might be the transistors that are idle now and depending on the grey silicon can help with thermals/hotspots. It might be reduced utilization, or CPU cores not fighting for cache. It can be a multitude of things. But it doesn't change the fact that you said having no SMT will lead to higher clocks/lower volts which is still incorrect because, again, it depends on how a CPU is designed and may or may not lead to higher clocks.

Regarding your last sentence, implying that HT only increases multithreaded performance and that when you already have 24 cores you don't need more of it: that's not really true, is it? If the architecture is designed with SMT in mind, then regardless of whether consumers need more than 24 cores, SMT will be on by default because it delivers better numbers per core: the core is built to extract thread-level parallelism (TLP) by running two concurrent instruction streams. But maybe Intel saw that they don't need SMT anymore because they can design an arch that gets around the reasons SMT is needed in the first place, and that's fine.

edit: I see that you're still trying to argue the "clock higher" part in an edited post of yours. Let me try to make it easy for you: you said that without SMT you can design a core that uses less voltage and clocks higher. I said that's not correct because it entirely depends on how the architecture is designed. An arch that is perfect and extracts the maximum from a single thread will not require SMT, but that doesn't mean it will also clock higher in the process. You're now saying that disabling SMT leads to higher clocks, which is a different thing entirely, and I said as much: there are a number of reasons why that might be the case.
I'm not interested in arguing hypotheticals with you.

Have a nice day.
 
Joined
Oct 30, 2020
Messages
250 (0.17/day)
I'm not interested in arguing hypotheticals with you.

Have a nice day.

The hypotheticals exist because you were incorrect. If you were correct, there would be no discussing hypotheticals, or any of the 'it depends' which refute your initial claim.
 

dgianstefani

The hypotheticals exist because you were incorrect. If you were correct, there would be no discussing hypotheticals, or any of the 'it depends' which refute your initial claim.
Whatever you say buddy.
 
Joined
Apr 19, 2018
Messages
1,227 (0.51/day)
Processor AMD Ryzen 9 5950X
Motherboard Asus ROG Crosshair VIII Hero WiFi
Cooling Arctic Liquid Freezer II 420
Memory 32Gb G-Skill Trident Z Neo @3806MHz C14
Video Card(s) MSI GeForce RTX2070
Storage Seagate FireCuda 530 1TB
Display(s) Samsung G9 49" Curved Ultrawide
Case Cooler Master Cosmos
Audio Device(s) O2 USB Headphone AMP
Power Supply Corsair HX850i
Mouse Logitech G502
Keyboard Cherry MX
Software Windows 11
To paraphrase a certain Long Island comedian: what's the deal with all these E-cores? How many E-cores does one person need?
When they can put enough of them onto the die, they will delete the P-cores.
 
Joined
Jun 10, 2014
Messages
2,986 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
When SMT/HT was introduced, CPUs had only a couple of cores, so the synchronisation overhead, and thus the transistor budget required, was relatively small. Now that CPUs have many times more cores and need to synchronise across all of them, adding SMT/HT on top of that is liable to make the required number of transistors explode.
Correct, and I might add that the complexity of implementing SMT in the pipeline has grown greatly with ever more superscalar CPU designs. Not to mention the biggest problem: all the security issues, which impose lots of constraints on the designers. Thirdly, modern CPUs have much more capable front-ends, which are getting better and better at keeping the execution units saturated. That was originally one of the core motivations for SMT, but going forward the potential gain here is going to shrink, relatively speaking.

While Intel's engineering decisions haven't been great in the past few years, I would expect that a clean-room-(ish) core design lacking SMT/HT has been thoroughly researched and evaluated as more useful going forward than their current Skylake++++++ architecture. In other words, while Arrow Lake's loss of SMT/HT may not be made up for with IPC gains, likely future derivatives of its architecture will be able to achieve that IPC and more.
If you're talking about architectural engineering decisions, then I disagree. Their designs have generally been held back 2-3 years by production issues, which probably still cause some lasting delays. When it comes to their production, however, there have been lots of bad decisions…

As for a "clean room" design, I doubt any of the big CPU designers will start that much from scratch. They do, however, have to make the big design decisions at the very beginning of the design process, like how threading will work and how cores interact, since all other design decisions follow from those, although they probably don't have the resources to redesign and fine-tune every tiny part of the CPU on the first attempt. So ditching SMT was certainly decided early on, but I would expect them to need a few "attempts" to fully break free from all the design constraints and unleash new levels of IPC. :)

Looking forward, there will be a lot of advancements in superscalar execution. I know Intel are looking into strategies to lessen the impact of branch mispredictions and avoid pipeline stalls and flushes. I believe some of this was supposed to show up in Meteor Lake, but I haven't studied whether it did, or how successful it was. Over the next generations, though, we should expect significant gains.

Rocket was a small regression because it was the first processor of the "Cove" era, and as the first-generation P-core design backported to 14 nm, it just didn't hold up to the established Skylake core back then…
Just for the sake of being correct: Rocket Lake wasn't a regression in terms of overall performance. It offered ~19% IPC gains at similar clocks, but sacrificed 2 cores vs. Comet Lake, which led people to think it was inferior. Rocket Lake, a "backport" of Ice Lake to 14 nm, was greatly held back by this "inferior" node. The whole family is called "Sunny Cove", with Ice Lake being released in 2019 (mobile only, fairly limited availability), followed by Tiger Lake, which was a small architectural improvement. Rocket Lake surprisingly seems to be a derivative of Ice Lake-S (never finalized) rather than Tiger Lake, I assume because Tiger Lake was never designed for this purpose and it was much quicker to backport Ice Lake-S instead.
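As a rough back-of-the-envelope check of why ~19% higher IPC with two fewer cores still reads as a sidegrade in multithreaded work, assume the flagship 8-core Rocket Lake vs. 10-core Comet Lake parts, equal clocks and perfectly linear scaling with core count (all simplifications; power limits and memory bottlenecks will shift the real numbers):

$$
\frac{\text{MT}_{\text{Rocket Lake}}}{\text{MT}_{\text{Comet Lake}}} \approx \frac{8 \times 1.19}{10 \times 1.00} = \frac{9.52}{10} \approx 0.95
$$

Under those assumptions, single-threaded work gains roughly 19% while fully parallel work ends up roughly 5% behind, which fits the perception of it as a regression.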
 