Wednesday, July 14th 2021

AMD Zen 4 Desktop Processors Likely Limited to 16 Cores, 170 W TDP

We have recently seen several reputable rumors indicating that AMD's Zen 4 Raphael desktop processors will be limited to 16 cores across two chiplets. There were previous rumors of a 24-core model with three chiplets, but that now seems unlikely. While core counts won't increase, some SKUs may see a TDP increase up to 170 W, which should offer some performance uplift. AMD is expected to debut its 5 nm Zen 4 Raphael desktop processors in 2022 with support for PCIe 5.0 and DDR5. The processors will switch to the new AM5 LGA1718 socket and will compete with Intel's Alder Lake-S successor, Raptor Lake, which could feature 24 cores.
Source: @patrickschur_

75 Comments on AMD Zen 4 Desktop Processors Likely Limited to 16 Cores, 170 W TDP

#51
eidairaman1
The Exiled Airman
moproblems99I thought only single core mattered and moar cores were for fools?
My earlier video proved that even an 8350 provides smooth performance thanks to its extra cores.
Posted on Reply
#52
Unregistered
I love how people think no moar cores = stagnation. As if 16 cores in the top-end mainstream chip isn't enough. You are really stepping into HEDT territory after that. I'd rather have them tune the cores further and make them faster with better boost clocks. My 5900X consistently hitting 5.1 GHz and above in lightly threaded workloads and games (on a 120 mm tower air cooler, mind you) is already pretty good, and a massive upgrade from my 3900X, which only saw 4.35 GHz at best in games.

Looks like AMD went from "moar cores" to "moar cache" and that's good. 16 MB L3 on the 11900K is a laughing stock.
Posted on Edit | Reply
#53
dragontamer5788
EmilyI love how people think no moar cores = stagnation. As if 16 cores in the top-end mainstream chip isn't enough. You are really stepping into HEDT territory after that. I'd rather have them tune the cores further and make them faster with better boost clocks. My 5900X consistently hitting 5.1 GHz and above in lightly threaded workloads and games (on a 120 mm tower air cooler, mind you) is already pretty good, and a massive upgrade from my 3900X, which only saw 4.35 GHz at best in games.

Looks like AMD went from "moar cores" to "moar cache" and that's good. 16 MB L3 on the 11900K is a laughing stock.
It should be noted that the Apple M1 has just four big cores, yet it wins a fair number of benchmarks.

With 128 kB of L1 cache and double the execution width / reorder buffer size (a ~700-instruction reorder window vs. ~300ish on AMD Zen / Intel Skylake), the Apple M1 proves there's still a market for "fewer, better cores". I reject the discussion points about x86 decoder width (ARM had a smaller decoder width than x86 for years; the reason no one made an 8-way decoder, IMO, is that no one thought there was a market for one. If Intel / AMD wanted to do it, I'm pretty sure they could).

Wider cores vs. more cores vs. SIMD width is an interesting problem. There are lots of different ways to configure a CPU core, and this competition is quite exciting. We're seeing different designs again, after years of stagnation.
Posted on Reply
#54
Unregistered
sam_86314AMD pulling an Intel here?

Sounds like they're satisfied with their place in the market and have switched to stagnation mode.
Because not adding more cores means they are unable to optimize the cores themselves further, right? We have the first heated competition in ages; stagnation from either party would be damning, and they know it. Well, I'm not sure about Intel, whose 11900K manages to perform worse than the 10900K in some cases. G E N E R A T I O N A L P E R F O R M A N C E U P L I F T
Posted on Edit | Reply
#55
dragontamer5788
EmilyBecause not adding more cores means they are unable to optimize the cores themselves further, right? We have the first heated competition in ages; stagnation from either party would be damning, and they know it. Well, I'm not sure about Intel, whose 11900K manages to perform worse than the 10900K in some cases. G E N E R A T I O N A L P E R F O R M A N C E U P L I F T
You probably know this, but... note that a lot of "IPC" performance comes from out-of-core factors.

Growing the L3 cache (the "stacked" SRAM) to 96 MB per chiplet will likely improve instructions-per-clock even if the cores are unchanged. In fact, Apple's M1 is said to have some of the best "uncore" features (features / benchmarks from outside the core), such as ARM's relaxed memory model PLUS support for total store ordering for the x86 emulator (Rosetta). Apple's chip also seems to have among the best latency to/from its DRAM modules.

So even if "cores" are stagnating, there are many ways to improve a chip. AMD's I/O die is clearly a bottleneck (one that solved other bottlenecks). I'm sure future advancements in that I/O die will bring dramatic improvements to Ryzen / Threadripper / EPYC, even if the cores themselves remain mostly unchanged. And even then, I expect AMD will also be working on improving those cores. The march of progress never stops in the tech world.
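
A back-of-the-envelope sketch of why a bigger L3 can lift IPC with the cores untouched: if the larger cache catches more of the accesses that would otherwise go to DRAM, the average stall per miss drops. The hit rates and latencies below are assumed round numbers purely for illustration, not measured values.

```python
# Back-of-the-envelope: why a bigger L3 can raise IPC even with unchanged cores.
# All latencies and hit rates below are illustrative assumptions, not measured values.

def avg_miss_latency(l3_hit_rate: float,
                     l3_latency_cyc: float = 45.0,
                     dram_latency_cyc: float = 300.0) -> float:
    """Average cycles to service a load that already missed L1/L2."""
    return l3_hit_rate * l3_latency_cyc + (1.0 - l3_hit_rate) * dram_latency_cyc

small_l3 = avg_miss_latency(0.60)   # assumed hit rate with a 32 MB L3
big_l3 = avg_miss_latency(0.80)     # assumed hit rate with a 96 MB stacked L3

print(f"avg L2-miss cost: {small_l3:.0f} cycles vs {big_l3:.0f} cycles")
```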
Posted on Reply
#57
dragontamer5788
EmilyTime to throw away my 5900X and buy M1 lol
Lol well, I definitely think 4 big cores are not enough for a modern computer (and the 4 little cores are so low on performance that I basically ignore them). So the M1 is a good laptop chip, but definitely not desktop class.

I mostly use the M1 as proof that single-threaded performance can get better. Doubling the execution pipelines and decoder width is the "obvious" way to improve single-threaded performance... and could very well apply to Intel / AMD if they had the will to do it. I'm not sure the tradeoff is worth it, however: having 8 cores of Zen 3 size vs. 4 cores of M1 size... probably favors the 8-core side.

But I don't expect the consumer market to go beyond 16 cores yet. Having 32 cores of Zen 3 size vs. 16 cores of M1 size... well... that probably favors the M1 side, because not even x265 scales well above 16 threads.

There's a bit of Amdahl's law going on, and a bit of Gustafson's law going on.
Posted on Reply
#58
maxfly
What exactly do we know about the new architecture other than core count at this point? Other than assumptions and guesses of course...?

And the obvious.
Posted on Reply
#59
RoutedScripter
Bah, what a letdown... but it's kind of in line with the rumor of Raphael not being the top tier, so perhaps this isn't the high end on desktop: two chiplets and still no more than 16 cores. This is a classic sign of corporations getting comfortable and progress just stopping; hopefully Intel does more than 16 to get the competition going, because this is a disgrace. "You don't need that many threads on desktop"? Yes you do, if you want to stream, play, and do many things at once without hassle. Of course you can't prove right now that you'd need that many cores on a normal gaming/workstation PC, because that much power never existed before. New ways to use it would spring up, but in many cases the power first needs to be opened up before the market can figure out how to use it. It's really boring listening to the experts saying "that many threads isn't useful on desktop"; such a short-sighted statement.

On the other hand, I would indeed take much better single-core performance; I have use cases where it's more important!
Posted on Reply
#60
Dux
170 W TDP? Eeeeeeh. Seems to me like power consumption is getting out of control in the CPU market, and especially the GPU market.
Posted on Reply
#61
stimpy88
170 W TDP for a 5 nm 16-core CPU... Nah, I don't believe that TDP number is for a 16-core SKU at all.
Posted on Reply
#62
Makaveli
TheinsanegamerNWere the most recent BF games benchmarked in 2010? I don't think so. Unless you came back from the future, nobody was doing so. For the games of 2010, the 970 was no faster, and oftentimes was a bit slower due to lower clock speeds than the consumer quad cores. Most consumer software of the time could not use more than 4 cores, or even more than 2 cores.

And while your 970 can still play games, I'd bet good money that a current-gen i3 quad core would absolutely smoke it. Single-core performance has come a long way in the last decade, as has multi-core performance. One can look up benchmarks for an i3-10320 and see it is ranked slightly higher than an i7-980.

You suggested that the fundamental change for Intel, performance-wise, was due to DRAM speed increases instead of core changes. That you yourself have proven incorrect, as both Haswell and the Core 2 Quad can use DDR3. You said it, not me.

Well, when 8 cores finally isn't enough, AMD already has 12 and 16 cores available for you! And by that time, AMD will likely have higher-than-16-core parts available.

Funny how when the 2000 series came out and still featured 8 cores, nobody was losing their mind over AMD "stagnating", yet after the same amount of time at 16 cores, suddenly AMD is worse than Intel. This would be like people complaining that Intel was stagnating right after Sandy Bridge came out because it was still 4 cores, when the vast majority of games were still single-threaded. By the time it's relevant, this AM4/AM5 platform will be ancient history.
Dude, I was playing BFV on an i7-970 with my current RX 580 in 2019 before I upgraded, and it was very playable.

My CPU was clocked at 4 GHz. I also have an i7-920 that I swapped in for testing, and with the same setup I had much lower frame rates and drops; you clearly felt the two missing cores. That is from my first-hand experience, and you're going to tell me I'm wrong?
Posted on Reply
#63
AusWolf
170 W is a lot of power for a 5 nm chip. Is AMD going to supply a 360 mm AIO with every unit?
Posted on Reply
#64
mtcn77
AusWolf170 W is a lot of power for a 5 nm chip. Is AMD going to supply a 360 mm AIO with every unit?
It doesn't have to run badly if they don't detune it at the factory like they did before. Previous chips were already maxing out their TDP at stock. Not very smart.
Posted on Reply
#65
Richards
TumbleGeorge
Did you study physics? It's not enough to only count numbers to have the right opinion in this case, I think :rolleyes:
But wait, there's more: to make 7 nm and 5 nm, TSMC uses different machine models from ASML. LoL :)
It's all marketing; that's why their 7 nm hardly beats Intel's ancient node.
AusWolf170 W is a lot of power for a 5 nm chip. Is AMD going to supply a 360 mm AIO with every unit?
It shows it's still 7 nm rebranded; there's no 20% lower power.
Posted on Reply
#66
Mussels
Freshwater Moderator
maze100the difference between a 1800X (Zen 1, 8 cores) and a 5800X (Zen 3, 8 cores, without the optional V-Cache the arch supports) is night and day; AMD improved ST performance by more than 50% and gaming perf by 2x-3x

Intel's Sandy Bridge to Skylake (3 newer architectures) is nowhere near that
That's what I was trying to show when I was half asleep with the images a few posts back: AMD made a bigger % change in 2 gens than Intel did in 5 gens.

AMD got us from 4 cores average to 8 cores average pretty fast, and then focused on making the cores faster. That works for me (and most people) because the majority of software has yet to catch up and use all those extra threads (which is why 6-core chips like the 5600X are still amazing for gaming).
Posted on Reply
#67
Minus Infinity
OMG, I'll only be able to have 16 cores. Dammit, I wanted 32 cores at 4-core prices.

Honestly, what normal person, even a power user, would want more than 16 cores? They make Threadripper for a reason; you pay to play if you need a workstation-class CPU.

Even as someone who runs fluid sims and such, my next CPU will be a 12-core 6900X.
Posted on Reply
#68
bencrutz
Minus InfinityEven as someone who runs fluid sims and such, my next CPU will be a 12-core 6900X.
you should aim for a 6999X :D
Posted on Reply
#69
TumbleGeorge
RichardsIt's all marketing; that's why their 7 nm hardly beats Intel's ancient node.
Why does one AMD Threadripper with 64 cores beat an Intel chip with 56 cores in multithreaded workloads by far more than the difference in core count would suggest? Questions, questions, questions. The answer is very complex. Yes, it's easy for us to ignore some of the factors and point out only the most prominent ones: different core architectures, different frequencies, different TDPs, each more suited to some purposes than others.
Posted on Reply
#70
Ellothere
Seems more like a limit of two CCXs, and they have increased the number of cores per CCX in the past. Just tossing that thought into the mix.
Posted on Reply
#71
Octopuss
sam_86314AMD pulling an Intel here?

Sounds like they're satisfied with their place in the market and have switched to stagnation mode.
What the hell do you need more than 16 cores on desktop for? Heck, more than 8 actually...
Posted on Reply
#72
AusWolf
RichardsIt's all marketing; that's why their 7 nm hardly beats Intel's ancient node.

It shows it's still 7 nm rebranded; there's no 20% lower power.
It's not just that. The smaller you shrink your die(s) while letting them draw the same (or more) power, the harder they become to cool (it's called power density). Unfortunately, I learned this the hard way with Zen 2: they're pretty much unsuitable for SFF builds, even though they're great products on their own merits. That's why I'm rocking 11th-gen Intel. I would have stayed with AMD otherwise.
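
To put a rough number on the power-density point, here is a minimal sketch; every die area and wattage in it is an assumed round figure for illustration only, not a confirmed spec for any SKU.

```python
# Rough power-density (W/mm^2) comparison.
# Die areas and package powers are assumed round numbers for illustration,
# not confirmed figures for any specific product.

def power_density(package_power_w: float, die_area_mm2: float) -> float:
    """Average watts dissipated per square millimetre of silicon."""
    return package_power_w / die_area_mm2

# Hypothetical: ~105 W over two ~74 mm^2 7 nm chiplets plus a ~125 mm^2 I/O die,
# versus ~170 W over two smaller (assumed ~70 mm^2) 5 nm chiplets plus the same I/O die.
today = power_density(105, 2 * 74 + 125)
rumored = power_density(170, 2 * 70 + 125)

print(f"today: {today:.2f} W/mm^2, rumored: {rumored:.2f} W/mm^2")
```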
Posted on Reply
#73
milewski1015
RaijuStronger single-core performance is more versatile than having a load of weaker cores that only perform well in compute scenarios.
How do you figure? If your CPU is excellent at single-core performance but has very few cores, its usefulness in multi-core workloads is extremely hindered. Whereas if you had a CPU with only moderate-to-good single-core performance but more cores, you'd perform adequately in both single- and multi-core workloads. Versatile is defined as "able to adapt or be adapted to many different functions or activities". A CPU that is only good at single-core workloads isn't versatile by definition.
Posted on Reply
#74
Punkenjoy
dragontamer5788There's a 2nd law: en.wikipedia.org/wiki/Gustafson's_law

When you can't scale any higher, you make the work harder. Back in the 00s, why would you ever go above 768p? But today we are doing 1080p regularly and enthusiasts are all on 4K. With roughly 10x the pixels, you suddenly have roughly 10x more work, and computers can scale to that size.

4K seems to be the limit for what's reasonable (at least for now). But there's also raytracing coming up: instead of making more pixels, we're simply making "each pixel" much harder to compute. I expect Gustafson's law to continue to apply until people are satisfied with computer-generated graphics (and considering how much money Disney spends on its rendering farms... you might be surprised how much compute power is needed to get a "good cartoony" look like Wreck-It Ralph, let alone CGI Thanos in the movies).

Video game graphics are still very far from what people want. The amount of work done per frame can continue to increase according to Gustafson's law for the foreseeable future. More realistic shadows, even in non-realistic games (e.g. Minecraft or Quake with ray tracing), wow people.

--------------

For those who aren't in the know: Amdahl's law roughly states that a given task can only be parallelized to a certain degree. For example: if you're looking one move ahead in chess, the maximum parallelization you can get is roughly 20x (there are only ~20 moves available in any given position).

Therefore, when the "chess problem" is described as "improve the speed of looking one move ahead", you run into Amdahl's law. You literally can't get more than a 20x speedup (one core looking at each of the 20 moves available).

But... Gustafson's law is the opposite, and is the secret to scaling. If the "chess problem" is described as "improve the number of positions you look at within 30 seconds", that's Gustafson's law. If you have 400 cores, you have those 400 cores analyze two moves ahead (20x for the 1st move, times 20 for the 2nd move). All 400 cores will have work to do for the time period. When you get 8,000 cores, you look three moves ahead (20x20x20 = 8,000 positions), and now you can look at 8,000 positions in the given timeframe. Etc. As you can see: "increasing the work done" leads to successful scaling.

----------

Some problems are Amdahl-style (render this 768p image as fast as possible). Other problems are Gustafson-style (render as many pixels as you can within 1/60th of a second). The truth for any given problem lies somewhere between the two laws.
The main problem with Gustafson's Law is that Amdahl's Law still applies.

And that also proves my point: you're talking about resolution and graphics. Like I said, things that can be easily parallelized are being run on accelerators like GPUs.

It's true that you can increase the difficulty of the work in each section that can run independently of the others, but you will still be limited by how fast you can run the single-threaded portion of the code. For example, in your chess scenario, you still have to determine which move you want to go with. You also still have to distribute all the moves across the cores so they don't do duplicate work.

In the end, the workloads that CPUs will continue to run effectively in the future are code that tends to be branch-dependent, or N+1 problems. Problems that are easily parallelized will be accelerated either with accelerators like GPUs, fixed-function hardware like Quick Sync, or wide SIMD like AVX-512 (which could be widened even further).
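
For anyone who wants the two laws being argued about side by side, here is a minimal sketch; the 95% parallel fraction is an arbitrary assumption. With a fixed problem size the speedup flattens out (Amdahl), while letting the work grow with the core count keeps scaling (Gustafson).

```python
# Minimal sketch of the two scaling laws discussed above.
# p is the assumed parallelizable fraction of the work, n is the core count.

def amdahl_speedup(p: float, n: int) -> float:
    """Fixed problem size: speedup is capped by the serial fraction (1 - p)."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p: float, n: int) -> float:
    """Scaled problem size: the parallel work grows with the core count."""
    return (1.0 - p) + p * n

for n in (4, 16, 64, 400):
    print(f"{n:3d} cores: Amdahl {amdahl_speedup(0.95, n):6.1f}x, "
          f"Gustafson {gustafson_speedup(0.95, n):6.1f}x")
```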
Posted on Reply
#75
dragontamer5788
PunkenjoyAnd that also proves my point: you're talking about resolution and graphics. Like I said, things that can be easily parallelized are being run on accelerators like GPUs.
Trying to be run on a GPU.

Most movie renderers remain CPU renderers today, because the gigabytes of texture + vertex data do not fit in a measly 20 GB of VRAM. Ex: the Moana scene is about 93 GB base + 130 GB for animations (www.disneyanimation.com/resources/moana-island-scene/). We all know raytracing is best on raytracing-accelerated GPUs, but all that special hardware doesn't matter if the scene literally doesn't even fit in its RAM.

And Moana was rendered over 5 years ago. Today's movies are bigger and more detailed.

And before you say it: yeah, I know about the NVIDIA DGX + NVSwitch. But that'd still require "remote access" to RAM if you were to distribute the scenes across that architecture. It'd probably work, but I don't think any such renderer exists yet. There are some fun blog posts about people trying to make the CPU+GPU team work across movie-scale datasets like the Moana scene (the CPU acts as a glorified RAM box, passing the needed data to the GPU; the GPU renders the scene in small pieces that fit inside 8 GB or 16 GB chunks). But that's the stuff of research, not practice.
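
A toy sketch of that "CPU as a RAM box" scheme, purely for illustration; the sizes and the slicing function are made-up assumptions, not taken from any real renderer.

```python
# Toy sketch of out-of-core rendering: the full scene lives in system RAM and is
# streamed to the GPU in slices small enough to fit its memory budget.
# All sizes here are illustrative assumptions (~223 GB scene, 16 GB of VRAM).

SCENE_SIZE_GB = 223.0     # e.g. ~93 GB base geometry + ~130 GB animation data
VRAM_BUDGET_GB = 16.0     # hypothetical per-slice budget

def scene_slices(total_gb: float, budget_gb: float):
    """Yield (start_gb, end_gb) ranges, each small enough to upload at once."""
    start = 0.0
    while start < total_gb:
        end = min(start + budget_gb, total_gb)
        yield start, end
        start = end

for start, end in scene_slices(SCENE_SIZE_GB, VRAM_BUDGET_GB):
    # A real renderer would upload this slice, trace the rays that touch it,
    # accumulate the partial results, then evict the slice.
    print(f"upload [{start:6.1f}, {end:6.1f}) GB -> render -> evict")
```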
Posted on Reply