Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain

BoboOOZ · May 23, 2020

HenrySomeone said:
Only up to a point. For instance, 3800x might and I repeat might offer slightly better experience than let's say 10600k at the very end of both chips' usability, like around 5 years from now, but both will be struggling by then, since single thread advancements will continue to be important despite what leagues of AMD fan(boy)s would tell you. 3900x (or 3950x for that matter) will never get you better (while still objectively good enough) framerates than 9900k / 10700k though, of that I am completely certain. A fine example are old 16 core Opterons compared even with 8 core FX chips (that clocked much better), not to mention something like a 2600k.

The "only up to a point" part I definitely agree with.

The rest is blurrier, and it depends entirely on what direction the gaming industry takes from now on. At present, we are GPU bound in most games at high settings. Developers could shift more load towards the CPU and massively parallelize their code, or they could do the minimum necessary to see performance improvements. Since I don't have a crystal ball, I have no way to predict which will occur.
But technically, it is perfectly possible to code an application so that it runs faster on a 16-core at 85% clock speed than on a 10-core at 100% clock speed. It has already been done for many applications, and it can be done for games too. When exactly this will happen is hard to predict, but it has to happen if we are to play open-world games at 400 fps in the future. Your example of Opterons vs FX is flawed because it is based on insufficiently parallelized applications.
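As a rough illustration of the 16-core versus 10-core claim, here is a minimal sketch using Amdahl's law. The law is an assumption brought in for the example, not something measured on real games; the function names are made up, and the model ignores synchronization cost, memory bandwidth and boost behaviour.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup over one core when `parallel_fraction` of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def relative_throughput(parallel_fraction, cores, clock):
    """Throughput relative to a single core running at clock = 1.0."""
    return clock * amdahl_speedup(parallel_fraction, cores)


def break_even_fraction(cores_a, clock_a, cores_b, clock_b, steps=60):
    """Bisect for the parallel fraction at which chip A catches up with chip B.

    Assumes A has more cores but a lower clock than B, so A loses when
    nothing is parallel (0.0) and wins when everything is parallel (1.0).
    """
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        slower = relative_throughput(mid, cores_a, clock_a) < relative_throughput(
            mid, cores_b, clock_b
        )
        if slower:
            lo = mid  # A still behind: needs a larger parallel fraction
        else:
            hi = mid  # A already ahead: crossover is at or below mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    p = break_even_fraction(16, 0.85, 10, 1.0)
    print(f"break-even parallel fraction: {p:.3f}")
```

Under these simplified assumptions, the 16-core chip only pulls ahead once roughly 87% of the work scales with core count, so the claim holds for well-parallelized code and fails for anything much below that.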

The last part I am pretty sure I completely disagree with. I am quite sure future improvements in processor performance will rely much more on core counts than on IPC, and clock frequencies will stagnate at best. As both Intel and AMD continue to shrink their nodes, we will have 64 cores in home computers in a few years, but we will never reach 6GHz.

efikkan · May 24, 2020

BoboOOZ said:
Any large workload can be parallelized, if it's big enough, it means it can be broken into pieces that can be dealt with separately.

Everything has a cost. If you divide the work into chunks that are too small, you'll end up with too much overhead. There are also dependencies which require things to be executed in sequence (like a pipeline), limiting how large the work chunks can be before you have to synchronize. Another implication of this is mutation of shared data, and those thinking that throwing in mutexes everywhere will solve it would be ignorant; you'll quickly end up with stalled threads and even deadlocks. Additionally, there is OS scheduling overhead, which can easily add latency of 1 ms or more. That becomes significant when your entire workload lives within a window of a few ms and you're trying to sync up hundreds or thousands of times per frame, but not when you have giant work chunks taking several seconds or even minutes.
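The chunk-size trade-off described above can be sketched with a toy cost model. Everything here is an assumption made for illustration: the function name, the fixed per-chunk overhead, and the idea that chunks run in neat waves across the cores. Real schedulers and real dependencies are messier.

```python
import math


def frame_time_ms(total_work_ms, chunks, cores, overhead_per_chunk_ms):
    """Toy model: chunks run in waves of `cores` at a time, and every chunk
    pays a fixed dispatch/synchronization cost on top of its share of work."""
    waves = math.ceil(chunks / cores)
    return waves * (total_work_ms / chunks + overhead_per_chunk_ms)


if __name__ == "__main__":
    # 8 ms of work per frame, 8 cores, assumed 0.05 ms overhead per chunk.
    for chunks in (1, 8, 80, 800, 8000):
        t = frame_time_ms(8.0, chunks, cores=8, overhead_per_chunk_ms=0.05)
        print(f"{chunks:5d} chunks -> {t:6.2f} ms")
```

With these made-up numbers, 8 chunks come out at about 1.05 ms, while 8000 tiny chunks come out at about 51 ms, far worse than the 8.05 ms of not splitting the work at all.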

BoboOOZ said:
Are you really trying to say that the PS5 will get by with doing most of the work on one 2GHz core?

What? I've never said anything like that.
I pointed out that we've had 8-core consoles for nearly 7 years now. Many predicted this would cause "fully multithreaded games" within "a couple of years" back when Xbox One and PS4 launched, just like you are doing now. But it didn't happen, because rendering doesn't work that way.

BoboOOZ said:
I think what you do not understand is the fact that having all the calls to the graphic API coming from a single thread doesn't equate at all to the fact that that thread is doing all the computing.

I understand fully, and I never claimed so either.
As I said, having multiple threads building a single queue makes no sense. That is why the number of threads interfacing with the GPU is limited to the distinct tasks (rendering passes or compute workloads) which can be separated and potentially parallelized.
Additionally, the driver uses up to several threads on its side, but that's out of our control.

BoboOOZ said:
The rest is more blurry, and it depends entirely on what direction the gaming industry will take from now on. At present, we are GPU bound in most games with high settings.

Well, that's the goal.
Ideally, all games should be GPU bound. That way you get the graphics performance you paid for.

BoboOOZ said:
Developers could shift more load towards the CPU and massively parallelize their code, or they could do the minimum necessary to see performance improvements. Since I don't have a crystal ball, I have no way to predict which will occur.

This clearly illustrates that you don't understand how games work.
GPUs have specialized hardware to do all kinds of rendering tasks, like creating vertices, tessellation, rasterization, texture mapping, etc. While all of these can technically be emulated in software, the performance would be terrible.

Offloading the GPU to the CPU would be to go backwards. And what would be the purpose? GPUs are massively powerful at dense math, while CPUs are better at logic. Games typically do a lot of the logic parts on the CPU while building a queue, then send batches of data to the GPU, which does what it does best.

I wouldn't advise anyone to base anything on their crystal balls, but rather to understand the field of expertise before attempting to make qualified predictions. But I can list some of the current trends:
* GPUs are becoming more flexible, and the rendering pipeline is increasingly programmable. Some games will take advantage of this, and thus become less CPU bottlenecked.
* Most games are using a few generalized game engines. These engines are increasingly bloated, and will to some extent counteract the improvements on the GPU side. Some of these may spawn a lot of threads, most of which do fairly little work at all. The ever-increasing bloat of these engines is also responsible for the lack of benefits from lower API overhead in DirectX 12.
* Resource streaming will be more utilized, but slowly.
* GPU accelerated audio will probably become common eventually.

BoboOOZ said:
But technically, it is perfectly possible to code an application so that it runs faster on a 16-core at 85% clock speed than on a 10-core at 100% clock speed. It has already been done for many applications, and it can be done for games too.

I'm sorry, but this is where you go wrong. An arbitrary piece of code can't be parallelized any way you want, see my first paragraph.
A workload which mostly consists of large work chunks that can be processed mostly asynchronously can scale to almost any core count. The need for synchronization increases the overhead of each additional thread, so the more synchronization a workload needs, the faster its returns diminish as the thread count grows. Games are one of the workloads on the "worst" end of this scale.

BoboOOZ said:
I am quite sure future improvements of processor performance will rely much more on core counts than on IPC. And clock frequencies will stagnate at best, as both Intel and AMD continue to shrink their nodes, we will have 64 cores in a few years for home computers, but we will never reach 6GHz.

You're right about clock speeds stagnating (at least until we have different types of semiconductors), but the way forward is a balanced approach between more cores, more superscalar execution and more SIMD. Many are forgetting that the performance per core is the base scaling factor for multithreaded performance. Since most non-server workloads do not scale linearly, having faster cores actually helps you scale to more cores and suffer less from OS overhead. The key here is to strike the right balance between core count and core speed for your workload.

BoboOOZ · Sep 14, 2020

Revival of an old thread, but here's a new game that requires 8 cores as recommended settings. We'll see more and more of these, especially in open-world games:

You'll need some serious hardware to play Serious Sam 4

The Serious Sam series has never been known for its graphics, and while the next installment does look nice, it's certainly not on par visually with some...

www.techspot.com

Vayra86 · Sep 14, 2020

BoboOOZ said:
Revival of an old thread, but here's a new game that requires 8 cores as recommended settings. We'll see more and more of these, especially in open-world games:

If a simple shooter requires 8 cores at 3.3 GHz, I seriously question this dev's sanity, capability and overall product. They say it's needed for thousands of actors. Hello, Total War wants a word?

It's also a nice way to identify shit console ports.

BoboOOZ · Sep 14, 2020

Vayra86 said:
If a simple shooter requires 8 cores at 3.3 GHz, I seriously question this dev's sanity, capability and overall product. They say it's needed for thousands of actors. Hello, Total War wants a word?

SS is a great single player shooter, can't wait to see the new one.
Total War is a great game too, but all those soldiers do not have individual AI; the AI is at unit level. The complexity is left only to the GPU for rendering. In SS, all these monsters are trying to find you and come at you.

Vayra86 said:
It's also a nice way to identify shit console ports.

I'm not that sure; all the previous versions were PC first. You can play on a 4-core too, you'll just play at 720p/30fps.

Vayra86 · Sep 14, 2020

BoboOOZ said:
SS is a great single player shooter, can't wait to see the new one.
Total War is a great game too, but all those soldiers do not have individual AI; the AI is at unit level. The complexity is left only to the GPU for rendering. In SS, all these monsters are trying to find you and come at you.

I'm not that sure; all the previous versions were PC first. You can play on a 4-core too, you'll just play at 720p/30fps.

I'll give you another one

Enter the Matrix

Of course it's obviously not fully dynamic. But I strongly doubt Serious Sam will do that with a supposedly infinite number of actors. That is why I'm saying... this does not have to be done on 8 core machines. And from the point of view of being a capable game on most mainstream systems... I'd say it's optimistic.

I did play the early SS's, they weren't bad at all, but very straightforward. That is why, again, this req seems so questionable. This game ran on a toaster CPU.

BoboOOZ · Sep 14, 2020

Vayra86 said:
I'll give you another one

Enter the Matrix

Of course it's obviously not fully dynamic. But I strongly doubt Serious Sam will do that with a supposedly infinite number of actors. That is why I'm saying... this does not have to be done on 8 core machines. And from the point of view of being a capable game on most mainstream systems... I'd say it's optimistic.

I did play the early SS's, they weren't bad at all, but very straightforward. That is why, again, this req seems so questionable. This game ran on a toaster CPU.

I don't know Matrix, so I can't comment on it, but I would imagine those agents are pretty dumb, otherwise Neo dies. But if you have next-gen games and they can run on equipment 10 years old, isn't that an indication that the developers aren't trying to give you more, with higher requirements? What's the point of having good, new equipment if it remains unused?

efikkan · Sep 14, 2020

BoboOOZ said:
Revival of an old thread, but here's a new game that requires 8 cores as recommended settings. We'll see more and more of these, especially in open-world games:

Considering the requirements don't even bother listing what class of CPU, just the ambiguous "8 cores" and "3.3 GHz", they probably didn't put much thought into it. I'm pretty sure a 4/6 core Comet Lake would outperform a good old 8-core Bulldozer in this game.

BoboOOZ · Sep 14, 2020

efikkan said:
Considering the requirements don't even bother listing what class of CPU, just the ambiguous "8 cores" and "3.3 GHz", they probably didn't put much thought into it. I'm pretty sure a 4/6 core Comet Lake would outperform a good old 8-core Bulldozer in this game.

Bulldozer was not an 8 core, that was settled by a class action a while ago... And given that the required GPUs are at most 2 generations old, that gives a decent ballpark for what 8 core means.
Anyway, I'm looking forward to seeing if the required oomph will also translate to better graphics and gameplay, or if it's just a lack of optimization.

Vayra86 · Sep 14, 2020

BoboOOZ said:
I don't know Matrix, so I can't comment on it, but I would imagine those agents are pretty dumb, otherwise Neo dies. But if you have next-gen games and they can run on equipment 10 years old, isn't that an indication that the developers aren't trying to give you more, with higher requirements? What's the point of having good, new equipment if it remains unused?

The point is that 9 times out of 10 you never really needed the good new equipment. It's just a cost- or quality-cutting measure that you pay for. Optimization and writing great software is an art form. Not everyone is talented, and lots of software is being written. It's close to being a factory product that rolls off the line and into a box.

In many cases a lack of talent, time and/or optimization is solved by iterative development. You get a game, and a day one patch to make it work. You get a patch every other week. Etc.

Make no mistake, everything you see up to and including specs like these is just cold hard business, nothing else. New technology? Man, we had accurate reflections as early as Unreal 1, and given enough work on a rasterized approach we can already create scenes that rival ray traced content. Or that are just ray traced content, baked in. It's 2020 and we're now thinking of automation. Why? Apparently there is an economic reality where it generates profit, or is likely to do so.

NPCs and AI are of a similar nature. The groundwork is decades old and still being iterated on. If they just took that and made it 'a lot bigger', then it's easy to arrive at an 8 core requirement like this. You said it right, Total War found a trick around it. Enter the Matrix does something similar: every time, the game picks the 4-5 actors surrounding Neo and makes them 'active', while the rest dance around them, creating an illusion of density. Yes, you see through it. And I guarantee you... even in SS4 with its fabulous system you will see through it. None of this is new. Dying Light for example... how many zombies exactly? Exactly. And again... that game is not CPU intensive.
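The "few active actors, the rest are set dressing" trick can be sketched roughly like this. It is only an illustration of the idea as described above, not how Enter the Matrix or any other game actually implements it; the function names and the `full_ai` / `cheap_idle` labels are made up.

```python
import heapq
import math


def pick_active_actors(player_pos, actor_positions, max_active=5):
    """Return the indices of the actors closest to the player.
    Only these would run the expensive per-actor AI this frame."""
    return heapq.nsmallest(
        max_active,
        range(len(actor_positions)),
        key=lambda i: math.dist(player_pos, actor_positions[i]),
    )


def update_frame(player_pos, actor_positions, max_active=5):
    """Assign each actor a hypothetical update mode for this frame."""
    active = set(pick_active_actors(player_pos, actor_positions, max_active))
    return [
        "full_ai" if i in active else "cheap_idle"
        for i in range(len(actor_positions))
    ]


if __name__ == "__main__":
    positions = [(10, 0), (1, 0), (5, 5), (0, 2), (50, 50), (3, 0)]
    print(update_frame((0, 0), positions, max_active=3))
```

The per-frame AI cost then depends on `max_active` rather than on the total number of actors on screen, which is why such crowds do not need many cores.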

Another example... that Vulkan / Mantle demo, what was it called? It did NOT melt CPUs. With tens of thousands of actors.

efikkan · Sep 14, 2020

BoboOOZ said:
Bulldozer was not an 8 core, that was settled by a class-action a while ago...

That's not how lawsuits work. AMD settled because a settlement is cheaper than the alternative, not because the claim was correct.
Bulldozer was an 8-core design, there is no doubt about that, albeit with major "shortcomings" in the design.

BoboOOZ said:
And given that the required GPUs are at most 2 generations old, that gives a decent ballpark for what 8 core means.
Anyway, I'm looking forward to seeing if the required oomph will also translate to better graphics and gameplay, or if it's just a lack of optimization.

There is usually no relation between the age of the CPU and the GPU in the recommendations. A 7 year old Haswell, or even older CPUs can still be more relevant than GPUs of a similar age, not to mention being able to compete with brand new AMD CPUs.

People are generally putting way too much thought into these recommendations. Usually they are derived from the test systems which were primarily used during development, and can sometimes be very optimistic or conservative. Look at reviews if you want to see the reality, or gamble and buy the game yourself.

BoboOOZ · Sep 14, 2020

efikkan said:
Bulldozer was an 8-core design, there is no doubt about that, albeit with major "shortcomings" in the design.

The shortcoming was that there were only 4 fp units, so for many workflows it became effectively a 4 core. It is accurate to describe it as a 4 core with 2 integer units and one fp unit per core, because I'm pretty sure they didn't make cores with 1 integer unit and half an fp unit.

efikkan said:
There is usually no relation between the age of the CPU and the GPU in the recommendations. A 7 year old Haswell, or even older CPUs can still be more relevant than GPUs of a similar age, not to mention being able to compete with brand new AMD CPUs.

I would say it is quite the contrary, and what you are quoting is the exception: the epic Intel stagnation of the last decade.

efikkan said:
People are generally putting way too much thought into these recommendations. Usually they are derived from the test systems which were primarily used during development, and can sometimes be very optimistic or conservative. Look at reviews if you want to see the reality, or gamble and buy the game yourself.

I'm not a game developer, but judging by the way you and a couple of other guys around the forum describe them, they must be a bunch of lazy ignoramuses who aren't even capable of monitoring their thread/core utilization in the Windows Task Manager during their testing sessions...
I will probably buy it though, we'll see what that legion mode is all about.

seronx · Sep 14, 2020

BoboOOZ said:
The shortcoming was that there were only 4 fp units, so for many workflows it became effectively a 4 core. It is accurate to describe it as a 4 core with 2 integer units and one fp unit per core, because I'm pretty sure they didn't make cores with 1 integer unit and half an fp unit.

FPUs aren't part of the core.

K7 doesn't have an FPU in the core.
K8 doesn't have an FPU in the core.
Greyhound doesn't have an FPU in the core.
Husky doesn't have an FPU in the core.
Bobcat doesn't have an FPU in the core.
Jaguar doesn't have an FPU in the core.
Zen doesn't have an FPU in the core.

The only modern design from AMD to have an FPU inside the core is this one:

Single control unit, single instruction bus, single data bus, single superscalar datapath => one core.

AMD's Orochi design is more accurately described as four processors with two cores each, per the architectural definition in use since before the 90s.

Retire unit (C0) & Retire unit (C1) => Two control units
Scheduler (C0) & Scheduler (C1) => Two instruction buses
Datapath (C0) & Datapath (C1) => Two datapaths
Load/Store (C0) & Load/Store (C1) => Two data buses
A Bulldozer processor is a dual-core design.

The general marketing consensus is that with one core in a processor, you just call the processor a core. In this case, AMD had two cores in a processor, and thus it is a dual-core unit.

Imagine reading a "technical" document... where ___ core contains core. When previous documents have... where ___ processor contains/builds on processor core/core.
