AMD Could Release Next Generation EPYC CPUs with Four-Way SMT

Joined
Sep 26, 2012
Messages
871 (0.19/day)
Location
Australia
System Name ATHENA
Processor AMD 7950X
Motherboard ASUS Crosshair X670E Extreme
Cooling ASUS ROG Ryujin III 360, 13 x Lian Li P28
Memory 2x32GB Trident Z RGB 6000Mhz CL30
Video Card(s) ASUS 4090 STRIX
Storage 3 x Kingston Fury 4TB, 4 x Samsung 870 QVO
Display(s) Acer X38S, Wacom Cintiq Pro 15
Case Lian Li O11 Dynamic EVO
Audio Device(s) Topping DX9, Fluid FPX7 Fader Pro, Beyerdynamic T1 G2, Beyerdynamic MMX300
Power Supply Seasonic PRIME TX-1600
Mouse Xtrfy MZ1 - Zy' Rail, Logitech MX Vertical, Logitech MX Master 3
Keyboard Logitech G915 TKL
VR HMD Oculus Quest 2
Software Windows 11 + Universal Blue
I hope it comes with the ability to enable, disable, or modify SMT levels per core on the fly; it could really shine if so.
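For what it's worth, Linux already offers a coarse version of this: the kernel exposes each logical CPU's SMT siblings in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, and individual logical CPUs can be taken offline by writing `0` to `/sys/devices/system/cpu/cpuN/online`. Below is a minimal Python sketch of how a tool might pick which threads to park on one core. The helper names and the SMT4 CPU numbering are hypothetical, chosen only for illustration; nothing here reflects how AMD would actually expose the feature.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")  # standard Linux sysfs location


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as "0,64" or "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def siblings_to_offline(sibling_list, keep):
    """Return the logical CPUs to park so that only `keep` threads stay online on a core."""
    if keep < 1:
        raise ValueError("at least one thread per core must stay online")
    return sibling_list[keep:]


def read_siblings(cpu):
    """Read the SMT siblings of one logical CPU (Linux only, not called below)."""
    path = SYSFS_CPU / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


# Hypothetical SMT4 core whose threads are logical CPUs 0, 64, 128 and 192.
core = parse_cpu_list("0,64,128,192")
print(siblings_to_offline(core, keep=2))  # threads to park to run this core as SMT2
```

Actually parking the returned threads would mean writing `0` to each one's `online` file as root, so this is per-thread hotplug rather than a true hardware SMT-level switch.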
 
Joined
Apr 30, 2008
Messages
4,904 (0.80/day)
Location
Multidimensional
System Name Apple MacBook Air M2
Processor Apple M2 8 Core CPU
Motherboard Apple Motherboard
Cooling Laptop Passive Cooling
Memory 16GB LPDDR5 RAM
Video Card(s) Apple M2 8 Core GPU
Storage 256GB SSD
Display(s) 13.6-Inch Liquid Retina display 2560x1664
Case Apple Laptop
Audio Device(s) Generic Apple Audio
Power Supply Laptop Battery
Mouse Track Pad
Keyboard Laptop Keyboard
VR HMD ( ◔ ʖ̯ ◔ )
Software MacOS Sequoia
Benchmark Scores Don't do them anymore.
Can someone educate me on why we don't hear much about IBM anymore? They're still huge yet always in the background. Do they still make actual CPUs or CPU architectures, and if so, why aren't they in the desktop consumer scene?
 
Joined
Sep 26, 2012
Messages
871 (0.19/day)
Location
Australia
Can someone educate me on why we don't hear much about IBM anymore? They're still huge yet always in the background. Do they still make actual CPUs or CPU architectures, and if so, why aren't they in the desktop consumer scene?

You answered your own question. They aren't in the desktop consumer scene as they don't make desktop consumer products.
 
Joined
Feb 22, 2019
Messages
71 (0.03/day)
Can someone educate me on why we don't hear much about IBM anymore? They're still huge yet always in the background. Do they still make actual CPUs or CPU architectures, and if so, why aren't they in the desktop consumer scene?

Short version: Lenovo bought their consumer-level business from them.
 
Joined
Feb 3, 2017
Messages
3,854 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
four virtual threads per core
Can we stop this already, it's 2019 for Christ sake, computer architecture isn't that cryptic anymore. There is nothing virtual about them, they are as physical as they can get, you could literally put your finger on the corresponding piece of silicon if you had a chip scaled up.
Virtual is perhaps not the right word, but you cannot literally put a finger on the piece of silicon corresponding to an SMT thread, because the threads all share the same core and none of them has a separate part of it.

When did Intel:
chiplet architecture.
Infinity fabric.
One chip fits all needs (Entry level desktop to top of the line server)
smt4 (still rumor stage)
CCX design.

Amd's 14NM was way way inferior to Intel's in density, performance and everything and still managed to pretty much match Intel's efficiency.
AMD's (technically GlobalFoundries' and TSMC's) 14nm is not way inferior to Intel's. Density is roughly the same, performance is not far off. Frequency ceiling is quite a bit higher for Intel's 14nm(+/++) but that's about it.

Intel was not first to implement these things but they have dabbled in pretty much everything.
chiplet architecture. - Pentium D
Infinity fabric. - QPI since 2008, now UPI.
One chip fits all needs (Entry level desktop to top of the line server) - This is not an optimal approach for performance or design but a pure cost efficiency decision.
smt4 (still rumor stage) - Xeon Phi
CCX design - What exactly do you mean by CCX design? Separate core complexes on the same die? Dual ringbus designs are not far from it. Pentium D with two glued cores is pretty much the same layout.

I highly doubt AMD is going to divorce the core design between Epyc and Ryzen. If they are, it's fine; if not, AMD is pulling another Bulldozer with this one.
It will have zero effect on desktop. They can easily design the cores with the number of SMT threads being configurable (through both BIOS/UEFI and laser cutting). AMD will probably keep cores and dies the same across both Ryzen and EPYC, and the extra transistor cost for more SMT threads is not significant. Perhaps more accurately, parts of that extra logic will benefit the core anyway, and the parts that do not are small.
 
Joined
Apr 21, 2010
Messages
578 (0.11/day)
System Name Home PC
Processor Ryzen 5900X
Motherboard Asus Prime X370 Pro
Cooling Thermaltake Contac Silent 12
Memory 2x8gb F4-3200C16-8GVKB - 2x16gb F4-3200C16-16GVK
Video Card(s) XFX RX480 GTR
Storage Samsung SSD Evo 120GB -WD SN580 1TB - Toshiba 2TB HDWT720 - 1TB GIGABYTE GP-GSTFS31100TNTD
Display(s) Cooler Master GA271 and AoC 931wx (19in, 1680x1050)
Case Green Magnum Evo
Power Supply Green 650UK Plus
Mouse Green GM602-RGB ( copy of Aula F810 )
Keyboard Old 12 years FOCUS FK-8100
More transistor density means more heat density, as with L3 cache. It will get too hot. A 20% increase doesn't mean 20% more heat; it can mean 50% or even more.
 
Joined
Jun 28, 2016
Messages
3,595 (1.15/day)
Can someone educate me on why we don't hear much about IBM anymore? They're still huge yet always in the background. Do they still make actual CPUs or CPU architectures, and if so, why aren't they in the desktop consumer scene?
"We" who?
IBM is still one of the most talked about IT companies. But they don't make consumer products anymore, so they're out of scope on sites/forums like TPU.

If you went to a datacenter / cloud / AI / ML / quantum computing website or forum, IBM would appear way more frequently than AMD. Even more than Intel.
 
Deleted member 172152

Guest
I would just like to say:

...

I've got nothing here. I mean, it's not like most people need even more multithreaded performance, especially with 8, 12 or 16 cores, so if this makes it into Ryzen 4000 :roll:
 
Joined
Jan 8, 2017
Messages
9,583 (3.28/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Virtual is perhaps not the right word, but you cannot literally put a finger on the piece of silicon corresponding to an SMT thread, because the threads all share the same core and none of them has a separate part of it.

Of course you can. The added logic that is required to process multiple streams of instructions exists physically in silicon. It's not some unexplainable abstract entity, so yes, you can definitely put your finger on it.
 
Joined
Feb 3, 2017
Messages
3,854 (1.33/day)
Of course you can. The added logic that is required to process multiple streams of instructions exists physically in silicon. It's not some unexplainable abstract entity, so yes, you can definitely put your finger on it.
Process is all about threading; it is all in the frontend. There is no additional frontend, and adding 4-way SMT is pretty much enlarging the existing pieces to fit more threads. It is mainly about queue sizes, and more threads also means more off-core cache is needed for it to be efficient.

"Process" here is all about management. Actual execution units are different anyway. Ryzen has 8-10 execution units in it. Squeezing more threads through to them is all about keeping as many of them occupied as possible. Both AMD and Intel have said recently that they are usually looking at 3-4 of these units being active at one time. The effort right now is to make sure they can feed more data in there.

That is the idea behind SMT. At any time when there are execution units idle, more work can be fed into them. If the current thread does not utilize them, let's use another one. There are tradeoffs to this, as the frontend needs to be more capable, queues have to fit more entries, etc. That costs complexity and die space, stalls are still possible, and SMT's efficiency tends to fall with more threads. In addition to that, SMT itself does not involve adding execution units (although that can still be done in core design regardless of SMT). Right now, both Zen(+/2) and Skylake can pretty much do 2 of any specific operation at once (not all of them but most), but not more.
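To make the "keep the execution units occupied" argument concrete, here is a toy Python model, offered as a rough sketch rather than a description of any real core. Each hardware thread has some number of independent operations ready per cycle, and the core can start at most `units` of them. The per-cycle demand numbers are made up so that one thread averages 3.5 busy units (in line with the 3-4 figure above), and the model deliberately ignores frontend, queue and cache contention.

```python
def average_busy_units(demand_per_thread, units):
    """Average number of execution units busy per cycle.

    demand_per_thread: one list per hardware thread; entry c is how many
        independent operations that thread has ready in cycle c.
    units: how many operations the core can start per cycle.
    """
    cycles = len(demand_per_thread[0])
    busy = 0
    for c in range(cycles):
        ready = sum(thread[c] for thread in demand_per_thread)
        busy += min(ready, units)  # the core cannot issue more than it has units
    return busy / cycles


# Made-up per-cycle demand for two different threads (illustrative only).
thread_a = [4, 3, 2, 5, 4, 3, 4, 3]
thread_b = [3, 4, 3, 4, 3, 2, 5, 4]

for threads in ([thread_a], [thread_a, thread_b], [thread_a, thread_b] * 2):
    print(len(threads), "thread(s):", average_busy_units(threads, units=10))
```

In this toy setup two threads double the occupancy, but four threads run into the 10-unit ceiling instead of reaching 14, which is the falling-efficiency effect mentioned above even before any real-world contention is counted.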
 
Joined
Sep 28, 2012
Messages
983 (0.22/day)
System Name Poor Man's PC
Processor Ryzen 7 9800X3D
Motherboard MSI B650M Mortar WiFi
Cooling Thermalright Phantom Spirit 120 with Arctic P12 Max fan
Memory 32GB GSkill Flare X5 DDR5 6000Mhz
Video Card(s) XFX Merc 310 Radeon RX 7900 XT
Storage XPG Gammix S70 Blade 2TB + 8 TB WD Ultrastar DC HC320
Display(s) Xiaomi G Pro 27i MiniLED
Case Asus A21 Case
Audio Device(s) MPow Air Wireless + Mi Soundbar
Power Supply Enermax Revolution DF 650W Gold
Mouse Logitech MX Anywhere 3
Keyboard Logitech Pro X + Kailh box heavy pale blue switch + Durock stabilizers
VR HMD Meta Quest 2
Benchmark Scores Who need bench when everything already fast?
It is still unclear how AMD will implement 4-way SMT in their future EPYC: are they taking the clustered SMT approach of IBM Power, or going with cascaded blocks like Sun SPARC?
Either way, future Ryzen desktop will likely have the same core count but double the threads :D
 

HTC

Joined
Apr 1, 2008
Messages
4,664 (0.76/day)
Location
Portugal
System Name HTC's System
Processor Ryzen 5 5800X3D
Motherboard Asrock Taichi X370
Cooling NH-C14, with the AM4 mounting kit
Memory G.Skill Kit 16GB DDR4 F4 - 3200 C16D - 16 GTZB
Video Card(s) Sapphire Pulse 6600 8 GB
Storage 1 Samsung NVMe 960 EVO 250 GB + 1 3.5" Seagate IronWolf Pro 6TB 7200RPM 256MB SATA III
Display(s) LG 27UD58
Case Fractal Design Define R6 USB-C
Audio Device(s) Onboard
Power Supply Corsair TX 850M 80+ Gold
Mouse Razer Deathadder Elite
Software Ubuntu 20.04.6 LTS
It is still unclear how AMD will implement 4-way SMT in their future EPYC: are they taking the clustered SMT approach of IBM Power, or going with cascaded blocks like Sun SPARC?
Either way, future Ryzen desktop will likely have the same core count but double the threads :D

Doubtful: they'll likely have Epyc with full 4-way SMT, TR with 3-way SMT and desktop with "standard" 2-way SMT. Doing it this way also helps tremendously with segmentation.
 
Joined
May 31, 2016
Messages
4,454 (1.41/day)
Location
Currently Norway
System Name Bro2
Processor Ryzen 5800X
Motherboard Gigabyte X570 Aorus Elite
Cooling Corsair h115i pro rgb
Memory 32GB G.Skill Flare X 3200 CL14 @3800Mhz CL16
Video Card(s) Powercolor 6900 XT Red Devil 1.1v@2400Mhz
Storage M.2 Samsung 970 Evo Plus 500MB/ Samsung 860 Evo 1TB
Display(s) LG 27UD69 UHD / LG 27GN950
Case Fractal Design G
Audio Device(s) Realtec 5.1
Power Supply Seasonic 750W GOLD
Mouse Logitech G402
Keyboard Logitech slim
Software Windows 10 64 bit
Doubtful: they'll likely have Epyc with full 4-way SMT, TR with 3-way SMT and desktop with "standard" 2-way SMT. Doing it this way also helps tremendously with segmentation.
Or they will all have 4-way SMT, because the segmentation is already applied and AMD doesn't need more prominent segmentation. Epyc, TR and desktop already differ in feature set.
 
Joined
May 2, 2017
Messages
7,762 (2.76/day)
Location
Back in Norway
System Name Hotbox
Processor AMD Ryzen 7 5800X, 110/95/110, PBO +150Mhz, CO -7,-7,-20(x6),
Motherboard ASRock Phantom Gaming B550 ITX/ax
Cooling LOBO + Laing DDC 1T Plus PWM + Corsair XR5 280mm + 2x Arctic P14
Memory 32GB G.Skill FlareX 3200c14 @3800c15
Video Card(s) PowerColor Radeon 6900XT Liquid Devil Ultimate, UC@2250MHz max @~200W
Storage 2TB Adata SX8200 Pro
Display(s) Dell U2711 main, AOC 24P2C secondary
Case SSUPD Meshlicious
Audio Device(s) Optoma Nuforce μDAC 3
Power Supply Corsair SF750 Platinum
Mouse Logitech G603
Keyboard Keychron K3/Cooler Master MasterKeys Pro M w/DSA profile caps
Software Windows 10 Pro
Wow, this thread went off the rails so quickly you would think it was actually four parallelized threads executing on the same hardware.



On a more serious note, hasn't 4-way SMT been in the cards for Zen3 since the first design goals for this architecture were presented? I explicitly remember reading about this a couple of years ago (and thinking "that sounds very server specific"). Unfortunately I can't remember where I read this, but I wouldn't be surprised if it was one of AnandTech's articles (possibly from a Hot Chips presentation or some such?).

Nonetheless, can we please stop arguing about ridiculous semantics, such as where exactly the threshold for "innovation" in the CPU space lies? No, 4-way SMT is nothing new in and of itself, and as stated above, IBM does it in their Power8 arch (and 8-way too), Intel did it in Xeon Phi and Larrabee, and IIRC there are companies working on this for ARM server hardware. So: AMD is not first to do this (think that was IBM?), not the first to do this in a widely distributed chip (IBM again), not the first to do this in x86 (that was Intel), but the first to do this in (what will be) a widely distributed x86-based chip. Does that qualify as innovation? Who knows? And frankly, who cares? It's a new feature in this space regardless of how much or little AMD can run around screaming "FIRST!!!1!!1!one" like a 14-year-old. Server and datacenter customers will love this. Now please stop arguing over meaningless semantics.

As for consumer uses, the questions of the value (and potential of performance loss) are legitimate. SMT inevitably means sharing resources between threads (as scheduling threads that exclusively use different parts of the core is entirely utopian), meaning that one or more threads can and will need to wait for others to finish using the parts of the core that they need next. That means lower ST performance. Then there's the Windows scheduler, which already struggles with unequal cores, as seen with Ryzen 3000 and the widely documented issues of not scheduling demanding tasks to the known fastest core. It will need a rather fundamental revamp for this to be viable for end-user applications at all. Not something that really ought to be an issue for MS, but they'll need to make a serious effort - and they might not want to, as this is the kind of feature they can charge a serious premium for in Windows Server.

My biggest worry is that the focus on this means less focus on architectural IPC improvements (yes, one can argue that better SMT improves IPC, but that's another can of worms) for Zen3. I hope they have enough tricks up their sleeves for another 10% bump or so.
 
Joined
Jan 8, 2017
Messages
9,583 (3.28/day)
One doesn't have to argue about SMT and IPC, in terms of percentages SMT brought forward one of the biggest increases in IPC ever, historically speaking.

A two way SMT core can potentially bring 40-50% higher IPC if the conditions are right. Few other features have been this impactful.
 

phill

Moderator
Staff member
Joined
Jun 8, 2011
Messages
17,029 (3.43/day)
Location
Somerset, UK
System Name Not so complete or overkill - There are others!! Just no room to put! :D
Processor Ryzen Threadripper 3970X
Motherboard Asus Zenith 2 Extreme Alpha
Cooling Lots!! Dual GTX 560 rads with D5 pumps for each rad. One rad for each component
Memory Viper Steel 4 x 16GB DDR4 3600MHz not sure on the timings... Probably still at 2667!! :(
Video Card(s) Asus Strix 3090 with front and rear active full cover water blocks
Storage I'm bound to forget something here - 250GB OS, 2 x 1TB NVME, 2 x 1TB SSD, 4TB SSD, 2 x 8TB HD etc...
Display(s) 3 x Dell 27" S2721DGFA @ 7680 x 1440P @ 144Hz or 165Hz - working on it!!
Case The big Thermaltake that looks like a Case Mods
Audio Device(s) Onboard
Power Supply EVGA 1600W T2
Mouse Corsair thingy
Keyboard Razer something or other....
VR HMD No headset yet
Software Windows 11 OS... Not a fan!!
Benchmark Scores I've actually never benched it!! Too busy with WCG and FAH and not gaming! :( :( Not OC'd it!! :(
This sounds really interesting to me... AMD... what are you cooking up now I wonder :)
 
Joined
Sep 28, 2012
Messages
983 (0.22/day)
The way I read it, AMD is going to max out the upcoming Zen 3 on the current lithography, both in clocks and core count. So, with 4-way SMT and higher clocks as 7nm matures, it would still give them an advantage over the competition :rolleyes:

Doubtful: they'll likely have Epyc with full 4-way SMT, TR with 3-way SMT and desktop with "standard" 2-way SMT. Doing it this way also helps tremendously with segmentation.

Actually, you can only switch it on or off, so 3-way SMT is not possible :D
 
Joined
Feb 3, 2017
Messages
3,854 (1.33/day)
IIRC there are companies working on this for ARM server hardware.
There are others but they (now owned by Marvell) should be the most prominent one.

A two way SMT core can potentially bring 40-50% higher IPC if the conditions are right. Few other features have been this impactful.
SMT benefit tends to be in 30-35% range for desktop processors (and in general for current server parts). Not sure about calling this IPC though. Yes, it is the same core but a different thread.
 
Joined
May 2, 2017
Messages
7,762 (2.76/day)
Location
Back in Norway
There are others but they (now owned by Marvell) should be the most prominent one.
Thanks, I knew I had seen it somewhere :)
One doesn't have to argue about SMT and IPC, in terms of percentages SMT brought forward one of the biggest increases in IPC ever, historically speaking.

A two way SMT core can potentially bring 40-50% higher IPC if the conditions are right. Few other features have been this impactful.
SMT benefit tends to be in 30-35% range for desktop processors (and in general for current server parts). Not sure about calling this IPC though. Yes, it is the same core but a different thread.
That's the can of worms I was alluding to. While it is undoubtedly true that the hardware is processing more instructions per clock cycle, it is only doing so by executing multiple discrete processing threads - which, given the high-level similarity to having multiple cores, is generally not seen as "pure IPC" which is generally a measure of single-threaded instructions per clock cycle (precisely to exclude misleading multi-core comparisons). Muddying this further, it would then (theoretically) be possible to "increase IPC" by improving SMT hardware utilization without affecting ST performance whatsoever - could you then call that an IPC increase? There's a reason I called this a can of worms. And the only feasible solution to managing it is to keep the definition as simple as possible, i.e. limited to single-thread performance (regardless of the validity of arguments for including SMT).
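The two definitions being argued over can be put side by side with a little arithmetic. The Python sketch below uses invented instruction counts (not measurements) for one core over a fixed number of cycles, once with a single thread running alone and once with two SMT threads, and the function name is just a placeholder.

```python
def smt_summary(alone_instr, smt_instr_per_thread, cycles):
    """Compare single-thread IPC with per-thread and whole-core IPC under SMT.

    alone_instr: instructions retired by one thread running alone on the core.
    smt_instr_per_thread: instructions retired by each thread when they share the core.
    cycles: length of the measurement window, the same for both runs.
    """
    single_ipc = alone_instr / cycles
    core_ipc = sum(smt_instr_per_thread) / cycles
    return {
        "single_thread_ipc": single_ipc,
        "per_thread_ipc_with_smt": [n / cycles for n in smt_instr_per_thread],
        "core_ipc_with_smt": core_ipc,
        "throughput_gain": core_ipc / single_ipc - 1.0,
    }


# Invented numbers: 2,000 instructions alone, 1,300 per thread with SMT2, 1,000 cycles.
for name, value in smt_summary(2000, [1300, 1300], 1000).items():
    print(name, value)
```

With these numbers the core as a whole retires about 30% more instructions per cycle while each individual thread retires fewer than it would alone, so both sides of the argument are describing the same data under different definitions of IPC.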
 
Joined
Jan 8, 2017
Messages
9,583 (3.28/day)
SMT benefit tends to be in 30-35% range for desktop processors (and in general for current server parts). Not sure about calling this IPC though. Yes, it is the same core but a different thread.

I wrote stuff that scales well into the 40% range when SMT is enabled myself, granted that is with ideal conditions: no branching, coalesced memory access, etc. 40% is realistic for compute-intensive tasks in servers and desktop. The only reason SMT doesn't scale that well with your average consumer software is that most of the time enough ILP can be extracted without the need for multiple hardware threads, or the bottleneck is somewhere else.

There aren't a million ways to increase IPC; instruction level parallelism is pretty much the only way to do it in modern CPUs, as the decode/add/multiply/etc. logic has been optimized to death already. SMT does just that: it increases the ILP per core, so there is no reason to say it doesn't increase IPC.

If my software is say 10% faster when SMT is enabled what else could that possibly mean other than the fact that the average IPC has increased ?
 
Joined
Sep 17, 2014
Messages
22,843 (6.06/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
These are such random scores, but yeah, most of the time when SMT is off frames are higher.
As core counts are already more than enough for gaming, why isn't AMD making something similar to what is being done in some mobile phones? Half of the cores with high clocks and the others with low clocks for other apps/general use, and no SMT at all? Is it really that hard?

Ehh. What?

When SMT is off frames are higher ONLY flies if the game is the only thing you run - and ONLY in a very tiny subset of actual games. The examples are very, very rare and the gain is very, very minimal. The biggest advantage for no SMT (actually: no HT -) is that your CPU might clock a tiny bit higher, and that gives you an edge in game FPS. Another small one, at best. It's the type of minmaxing for the last 2%, best case... and situationally.

Definitely worth the trade off to just keep SMT... as soon as the CPU can spend time using SMT on the same core as the game, there are more real cores available to process the game. @FordGT90Concept that's how it works both ways I would say. And if you're using BOINC... and gaming... is a small sacrifice so problematic for the ability to multi task like that? And... how many people actually do this?

I think HT/SMT have long since proven to be a negligible drawback versus a noticeable win, despite lots of testing to prove otherwise, very little has ever been found and if it was there, it wasn't much at all.

Because you have four threads sharing the same underlying execution resources. If one of those is a game, and the others are something like BOINC, only 25%-40% of the execution time is spent on game thread which means fewer frames per second. Server loads, they care about efficiency over response times which is diametrically opposed to what games (consumer in general) needs.

I would say, look at the degree of control you have over an AMD CPU. If it does harm performance, just disable it, and nobody loses anything, right?
 
Deleted member 178884

Guest
This reminds me of the "EXCLUSIVE" Zen 3 has 4-way SMT from RedGamingTech and numerous other crap rumors. I'll believe it when I see it, which probably won't happen because there's next to no chance of 4-way SMT happening.
 
Joined
Jul 17, 2011
Messages
87 (0.02/day)
System Name Custom build, AMD/ATi powered.
Processor AMD FX™ 8350 [8x4.6 GHz]
Motherboard AsRock 970 Extreme3 R2.0
Cooling be quiet! Dark Rock Advanced C1
Memory Crucial, Ballistix Tactical, 16 GByte, 1866, CL9
Video Card(s) AMD Radeon HD 7850 Black Edition, 2 GByte GDDR5
Storage 250/500/1500/2000 GByte, SSD: 60 GByte
Display(s) Samsung SyncMaster 950p
Case CoolerMaster HAF 912 Pro
Audio Device(s) 7.1 Digital High Definition Surround
Power Supply be quiet! Straight Power E9 CM 580W
Software Windows 7 Ultimate x64, SP 1
HT, it's Intel's Hyper Threading© Technology.....
HT is a marketing name. SMT is the idea behind it.
And to not have heard about Xeon Phi is quite an achievement for a "PC enthusiast"
It seems you're really new in this...
You're wrong, both of you.
HT is commonly known as the abbreviation for AMD's HyperTransport.
HTT is what Intel's Hyper-Threading Technology is commonly shortened to.

You are welcome!
 
Joined
Jun 28, 2016
Messages
3,595 (1.15/day)
You're wrong, both of you.
HT is commonly known as the abbreviation for AMD's HyperTransport.
HTT is what Intel's Hyper-Threading Technology is commonly shortened to.
"Intel HT Technology" - HT is the official acronym.

I'm not sure I've ever seen anyone use "HTT".

Putting the technological terminology differences aside, I'm not even sure if we agree on the meaning of "commonly"...

 
Joined
Jun 10, 2014
Messages
3,010 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
One doesn't have to argue about SMT and IPC, in terms of percentages SMT brought forward one of the biggest increases in IPC ever, historically speaking.

A two way SMT core can potentially bring 40-50% higher IPC if the conditions are right. Few other features have been this impactful.
As some have pointed out, IPC is single thread. What you probably meant is saturation of the core resources, but it's important to understand that SMT, even in perfect conditions, never exceeds the performance of a single "optimal" thread. It's simply a way to let other threads utilize the resources the first thread doesn't use, scaling towards one "optimal thread".

There are several factors that impact IPC. One way is to add more execution resources (ALUs, FPUs, AGUs etc.), which boosts your peak performance but can leave resources unsaturated. Secondly, there are front-end, latency and cache improvements, which improve the utilization of the execution resources you already have. Since SMT relies on exploiting idle resources of the CPU core for other threads, the ever increasing efficiency of CPU architectures is actually making SMT less and less useful for generic tasks, as efficiency gains in front-end and cache will ultimately consume the "gains" of SMT.

SMT was introduced at a time when single core CPUs were mostly idle due to stalls in the CPU pipeline, and the cost of implementing SMT in silicon was minuscule. But these days, as the gains of SMT are shrinking and the security implications of SMT make the silicon costs ever increasing, it's actually time to drop it, not extend it further with 4-way or even 8-way SMT. Today, SMT only really makes sense for server workloads where latency is irrelevant and total throughput of massive amounts of requests (or work items) is the primary goal. SMT is really a relic of the past, and 2020 is not the year to push it further.

While future gains in CPU performance won't get close to the improvements we saw in the 80s and the 90s, it's important to remember that the reason for "stagnant" single thread performance over the last ~4+ years is not any theoretical performance limit in IPC. Obviously we are now at a "clock wall" for the current type of semiconductors, but the primary reason for the (Intel's) stagnant CPU selection is the node problems causing two years of delays to Ice Lake (Sunny Cove), which they claim offers 18% IPC gains. Both Intel and AMD have their next 2-3 architectures lined up, and theoretically it is absolutely possible to achieve ~50% better IPC over Skylake by just continuing to add more execution resources, improving cache, reducing latency and improving the front-end.

But even beyond that, single thread performance will not hit a wall any time soon. Quite the opposite, we are now on the verge of the largest single thread gain since the 90s. Since the Pentium (1993), x86 CPUs have become increasingly superscalar, which obviously does wonders for peak performance, but also keeps widening the gap between minimum and average vs. peak performance, as the CPU becomes more dependent on the code to keep its resources fully saturated. As anyone familiar with machine code would know, there are two major causes for this lack of saturation: cache misses and branch mispredictions. Optimizing for cache misses can be done fairly efficiently, but branch mispredictions are harder to deal with. Largely it's about removing bloat, but you will usually still have enough of it left to hold back performance. And within the greater scope of even a single function, most branches only have local effects, but the CPU can't know that, so when there is a branch misprediction it has to flush the pipeline, even if some of the calculations may still be "good". This is because a lot of context is lost between your high level code and machine code, and even the best prediction models will only get you so far without getting some extra "help". I know Intel is researching a solution to this problem, where basically the dependencies between branches are expressed in the machine code (e.g. this branch only affects this code over here, but not the bigger flow of the program). I believe they call it "threadlets" or something, and it would probably be done by having chains of instructions that are independent of the branching in other chains, sort of like a "thread" that only exists virtually for a few dozen instructions. While this would at least require recompilation of software, it would greatly improve the CPU front-end's ability to reason about true dependencies between calculations, instead of having to assume the pipeline needs to be flushed.
Gains in single threaded performance of 2-3x should not be unreasonable. While what I'm describing here may seem a little out of scope, it's actually not, as this would practically eliminate the need for SMT. But don't expect this to be implemented in shipping products yet; it's still experimental, and I would expect it 5-10 years down the road.
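
To make the branch misprediction point above a bit more concrete, here is a minimal sketch in Python. It simulates a single 2-bit saturating counter, a classic textbook predictor and far simpler than anything in a shipping CPU, and feeds it two made-up branch patterns. The function names, the 15-cycle flush penalty and both patterns are assumptions chosen for illustration, not figures from any real architecture.

```python
import random


def mispredict_rate(outcomes):
    """Fraction of branches mispredicted by one 2-bit saturating counter."""
    state = 2  # 0-1 predict not-taken, 2-3 predict taken (start weakly taken)
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses / len(outcomes)


def cycles_per_branch(miss_rate, flush_penalty=15):
    """Toy cost model: 1 cycle per branch plus an assumed flush penalty per miss."""
    return 1 + miss_rate * flush_penalty


# Loop-style branch: taken 99 times, then not taken once, repeated 100 times.
loop_like = ([True] * 99 + [False]) * 100

# Data-dependent branch on effectively random input (fixed seed for repeatability).
rng = random.Random(0)
random_like = [rng.random() < 0.5 for _ in range(10_000)]

loop_rate = mispredict_rate(loop_like)
random_rate = mispredict_rate(random_like)

print(f"loop-like branch:   {loop_rate:.1%} missed, "
      f"{cycles_per_branch(loop_rate):.2f} cycles per branch")
print(f"random-like branch: {random_rate:.1%} missed, "
      f"{cycles_per_branch(random_rate):.2f} cycles per branch")
```

The loop-style branch is only missed at each loop exit, while the random one is missed roughly every other time, and under this toy cost model every miss pays the full flush penalty regardless of how much of the in-flight work was still valid. That is the waste the dependency-hinting idea described above is aiming at.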

I wrote stuff that scales well into the 40% range when SMT is enabled myself, granted that is with ideal conditions, no branching, coalesced memory access, etc. 40% is realistic for compute intensive tasks in servers and desktop. The only reason SMT doesn't scale that well with your average consumer software is that most of the time enough ILP can be extracted without the need for multiple hardware threads, or the bottleneck is somewhere else.
Actually, you've got this the wrong way around. In ideal conditions, SMT would not be needed at all; the only reason there are gains from SMT is that threads don't saturate the CPU enough. When you have ideal software as you described, branch and cache optimized, it will saturate the CPU very well.

SMT is mostly useful for server workloads where you have an "endless" supply of "work chunks" that can be done in parallel, very typical for a server running worker threads for Java code or scripts. This is code which can't be cache optimized and is heavily abstracted, so the CPU will more or less constantly stall. This is where 4-way and even 8-way SMT make sense (like on Power CPUs), and even then the execution part of the CPU will be largely idle; the bottleneck will be the front-end and the caches, otherwise you could make a 32-way SMT CPU and keep scaling.
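
The saturation argument above can be put into a toy model. This is only a sketch under a strong assumption: each hardware thread independently has work ready for the execution units in a fraction `u` of cycles, and the core counts as busy whenever at least one thread does. The function names and the values 0.9 and 0.3 are made up for illustration, and the model deliberately ignores the front-end and cache contention mentioned above, so it is optimistic for high SMT widths.

```python
def core_utilization(u: float, n_threads: int) -> float:
    """Fraction of cycles in which at least one of n_threads has work ready.

    Assumes each thread is ready in a fraction u of cycles, independently
    of the others (a simplification, not a property of real hardware).
    """
    if not 0.0 < u <= 1.0:
        raise ValueError("u must be in (0, 1]")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return 1.0 - (1.0 - u) ** n_threads


def smt_speedup(u: float, n_threads: int) -> float:
    """Throughput with n_threads relative to a single thread on the same core."""
    return core_utilization(u, n_threads) / u


# Assumed per-thread saturation levels, chosen only to contrast the two cases.
for label, u in (("well-optimized code", 0.9), ("stall-heavy server code", 0.3)):
    gains = ", ".join(f"{n}-way: {smt_speedup(u, n):.2f}x" for n in (2, 4, 8))
    print(f"{label} (u={u}): {gains}")
```

In this model the gain from SMT shrinks toward nothing as a single thread gets closer to saturating the core, while stall-heavy code keeps benefiting from more threads, at least until a real front-end or cache limit kicks in.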

If my software is say 10% faster when SMT is enabled, what else could that possibly mean other than the fact that the average IPC has increased?
Oh, there can be so many reasons, too many to discuss here. It depends on how many threads you spawn, how they are synchronized and of course how your application is "disturbed" by background threads.
 