Tuesday, May 19th 2020
Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain
The bulk of AMD's 4th generation Ryzen desktop processors will comprise "Vermeer," a high core-count socket AM4 processor and successor to the current-generation "Matisse." These chips combine up to two "Zen 3" CCDs with a cIOD (client I/O controller die). While the maximum core count of each chiplet isn't known, they will implement the "Zen 3" microarchitecture, which reportedly does away with the CCX so that all cores on the CCD share a single large L3 cache. This is expected to improve inter-core latencies. AMD's generational IPC uplift efforts could also include improving bandwidth between the various on-die components (something we saw signs of in the "Zen 2" based "Renoir"). The company is also expected to leverage a newer 7 nm-class silicon fabrication node at TSMC (either N7P or N7+), to increase clock speeds - or so we thought.
An Igor's Lab report points to the possibility of AMD gunning for efficiency, by letting the IPC gains, not clock speeds, handle the bulk of Vermeer's competitiveness against Intel's offerings. The report decodes OPNs (ordering part numbers) of two upcoming Vermeer parts, one 8-core and the other 16-core. While the 8-core part has some generational clock speed increases (by around 200 MHz on the base clock), the 16-core part has lower max boost clock speeds than the 3950X. Then again, the OPNs reference the A0 revision, which could mean that these are engineering samples that will help AMD's ecosystem partners (think motherboard or memory vendors) build their products around these processors, and that the retail product could come with higher clock speeds after all. We'll find out in September, when AMD is expected to debut its 4th generation Ryzen desktop processor family, around the same time NVIDIA launches GeForce "Ampere."
Sources:
Igor's Lab, VideoCardz
37 Comments on Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain
I pointed out that we've had 8-core consoles for nearly 7 years now. Many predicted this would cause "fully multithreaded games" within "a couple of years" back when Xbox One and PS4 launched, just like you are doing now. But it didn't happen, because rendering doesn't work that way. I understand fully, and I never claimed so either.
As I said, having multiple threads building a single queue makes no sense. Which is why the number of threads interfacing with the GPU is limited to the distinct tasks (rendering passes or compute workloads) which can be separated and potentially parallelized.
Additionally, the driver uses up to several threads on its side, but that's out of our control. Well, that's the goal.
Ideally, all games should be GPU bound. That way you get the graphics performance you paid for. This clearly illustrates that you don't understand how games work.
GPUs have specialized hardware to do all kinds of rendering tasks, like creating vertices, tessellation, rasterization, texture mapping, etc. While all of these can technically be emulated in software, the performance would be terrible.
Offloading GPU work to the CPU would be a step backwards. And what would be the purpose? GPUs are massively powerful at dense math, while CPUs are better at logic. Games typically do a lot of the logic parts on the CPU while building a queue, then send batches of data to the GPU, which does what it does best.
I wouldn't advise anyone to base anything on their crystal balls, but rather to understand the field of expertise before attempting to make qualified predictions. But I can list some of the current trends:
* GPUs are becoming more flexible, and the rendering pipeline is increasingly programmable. Some games will take advantage of this, and thus become less CPU bottlenecked.
* Most games are using a few generalized game engines. These engines are increasingly bloated, and will to some extent counteract the improvements on the GPU side. Some of these may spawn a lot of threads, most of which do fairly little work at all. The ever-increasing bloat of these engines is also responsible for the lack of benefits from lower API overhead in DirectX 12.
* Resource streaming will be more utilized, but slowly.
* GPU accelerated audio will probably become common eventually. I'm sorry, but this is where you go wrong. An arbitrary piece of code can't be parallelized any way you want, see my first paragraph.
A workload which mostly consists of large work chunks which can be processed mostly asynchronously can scale to almost any core count. The need for synchronization increases the overhead of each additional thread, so the more synchronization a workload needs, the more its returns diminish with increasing thread count. Games are one of the workloads on the "worst" end of this scale. You're right about clock speeds stagnating (at least until we have different types of semiconductors), but the way forward is a balanced approach between more cores, wider superscalar execution and more SIMD. Many are forgetting that the performance per core is the base scaling factor for multithreaded performance. Since most non-server workloads do not scale linearly, having faster cores actually helps you scale to more cores and suffer less from OS overhead. The key here is to strike the right balance between core count and core speed for your workload.
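The diminishing return described above can be sketched with Amdahl's law, a standard simplified model that the comment itself does not name. The function names and the example numbers (a 60% parallel fraction, a 0.8 relative per-core speed) are illustrative assumptions, not measurements of any game or CPU.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(per_core_perf: float, parallel_fraction: float, cores: int) -> float:
    """Per-core performance acts as the base factor that the speedup multiplies."""
    return per_core_perf * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Assumed workload: 60% of the frame time parallelizes, 40% is serial/synchronized.
    fraction = 0.6
    for cores in (2, 4, 8, 16, 32):
        print(cores, round(amdahl_speedup(fraction, cores), 3))

    # Assumed trade-off: 8 cores at full speed vs. 16 cores at 80% speed.
    print(relative_throughput(1.0, fraction, 8))
    print(relative_throughput(0.8, fraction, 16))
```

With these assumed numbers, eight faster cores come out ahead of sixteen slower ones, which is the balance between core count and core speed the comment argues for.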
www.techspot.com/news/86731-youll-need-serious-hardware-play-serious-sam-4.html
It's also a nice way to identify shit console ports.
Total War is a great game too, but all those soldiers do not have individual AI, the AI is at unit level. The complexity is left only to the GPU for rendering. In SS all these monsters are trying to find you and come at you. I'm not that sure, all the previous versions were PC first. You can play on a 4 core, too, you'll just play at 720p/30fps :D
Enter the Matrix ;) Of course it's obviously not fully dynamic. But I strongly doubt Serious Sam will do that with a supposedly infinite number of actors. That is why I'm saying... this does not have to be done on 8 core machines. And from the pov of being a capable game on most mainstream systems... I'd say it's optimistic.
I did play the early SS's, they weren't bad at all, but very straightforward. That is why, again, this req seems so questionable. This game ran on a toaster CPU.
Anyway, I'm looking forward to seeing if the required oomph will also translate to better graphics and gameplay, or if it's just a lack of optimization.
In many cases a lack of talent, time and/or optimization is solved by iterative development. You get a game, and a day one patch to make it work. You get a patch every other week. Etc.
Make no mistake, everything you see up to and including specs like these is just cold hard business, nothing else. New technology? Man, we had accurate reflections as early as Unreal 1, and given enough work on a rasterized approach we can already create scenes that rival ray traced content. Or are just ray traced content, baked in. It's 2020 and we're now thinking of automation. Why? Apparently there is an economic reality where it generates profit, or is likely to do so.
NPCs and AI are of a similar nature. The groundwork is decades old and still being iterated on. If they just took that and made it 'a lot bigger' then it's easy to arrive at an 8 core requirement like this. You said it right, Total War found a trick around it. Enter the Matrix does something similar - the way that works is that every time the game picks 4-5 actors that are surrounding Neo, and makes them 'active', the rest is dancing around it creating an illusion of density. Yes, you see through it. And I guarantee you... even in SS4 with its fabulous system you will see through it. None of this is new. Dying Light for example... how many zombies exactly? Exactly. And again... that game is not CPU intensive.
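The "pick a handful of nearby actors and make them active" trick described above can be sketched roughly as follows. This is a hypothetical illustration only: the `Actor` class, the `activate_nearest` function and the 2D distance check are assumptions for the sketch, not how Enter the Matrix or any other game actually implements it.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Actor:
    # Hypothetical actor record; a real engine would carry far more state.
    name: str
    x: float
    y: float
    active: bool = False


def activate_nearest(actors, player_pos, k=5):
    """Mark the k actors closest to the player as active, all others as inactive."""
    px, py = player_pos
    nearest = heapq.nsmallest(
        k, actors, key=lambda a: math.hypot(a.x - px, a.y - py)
    )
    nearest_ids = {id(a) for a in nearest}
    for actor in actors:
        # Only active actors would run full AI; the rest play cheap filler behavior.
        actor.active = id(actor) in nearest_ids
    return nearest


if __name__ == "__main__":
    crowd = [Actor(f"npc{i}", float(i * 3), 0.0) for i in range(100)]
    chosen = activate_nearest(crowd, (0.0, 0.0), k=5)
    print([a.name for a in chosen])
```

The per-frame AI cost then depends on `k` rather than on the size of the crowd, which is why such a scheme does not need many cores.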
Another example... that Vulkan / Mantle demo, what was it called? It did NOT melt CPUs. With tens of thousands of actors.
Bulldozer was an 8-core design, there is no doubt about that, albeit with major "shortcomings" in the design. There is usually no relation between the age of the CPU and the GPU in the recommendations. A 7 year old Haswell, or even older CPUs can still be more relevant than GPUs of a similar age, not to mention being able to compete with brand new AMD CPUs.
People are generally putting way too much thought into these recommendations. Usually they are derived from the test systems which were primarily used during development, and can sometimes be very optimistic or conservative. Look at reviews if you want to see the reality, or gamble and buy the game yourself.
I will probably buy it though, we'll see what that legion mode is all about.
K7 doesn't have a FPU in the core.
K8 doesn't have a FPU in the core.
Greyhound doesn't have a FPU in the core.
Husky doesn't have a FPU in the core.
Bobcat doesn't have a FPU in the core.
Jaguar doesn't have a FPU in the core.
Zen doesn't have a FPU in the core.
The only modern design from AMD to have a FPU inside the core is this one:
Single control unit, single instruction bus, single data bus, single superscalar datapath => one core.
AMD's Orochi design is more accurately described as four processors with two cores each. That has been the architects' definition since before the 90s.
Retire unit (C0) & Retire unit (C1) => Two control units
Scheduler (C0) & Scheduler (C1) => Two instruction buses
Datapath (C0) & Datapath (C1) => Two datapaths
Load/Store (C0) & Load/Store (C1) => Two data buses
A Bulldozer processor is a dual-core design.
The general consensus in marketing is that when there is one core in a processor, you just call the processor a core. In this case, AMD had two cores in a processor, and thus it is a dual-core unit.
Imagine reading a "technical" document... where ___ core contains core. When previous documents have... where ___ processor contains/builds on processor core/core.