Tuesday, May 19th 2020
Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain
The bulk of AMD's 4th generation Ryzen desktop processors will be made up of "Vermeer," a high core-count socket AM4 processor and successor to the current-generation "Matisse." These chips combine up to two "Zen 3" CCDs with a cIOD (client I/O die). While the maximum core count of each chiplet isn't known, they will implement the "Zen 3" microarchitecture, which reportedly does away with the CCX arrangement so that all cores on a CCD share a single large L3 cache, a change expected to improve inter-core latencies. AMD's generational IPC-uplift efforts could also include improving bandwidth between the various on-die components (something we saw signs of in the "Zen 2" based "Renoir"). The company is also expected to leverage a newer 7 nm-class silicon fabrication node at TSMC (either N7P or N7+) to increase clock speeds - or so we thought.
An Igor's Lab report points to the possibility of AMD gunning for efficiency, letting IPC gains rather than clock speeds carry the bulk of "Vermeer's" competitiveness against Intel's offerings. The report decodes the OPNs (ordering part numbers) of two upcoming "Vermeer" parts, one 8-core and the other 16-core. While the 8-core part shows some generational clock speed increases (around 200 MHz on the base clock), the 16-core part has lower maximum boost clock speeds than the 3950X. Then again, the OPNs reference the A0 revision, which could mean these are engineering samples meant to help AMD's ecosystem partners (think motherboard or memory vendors) build their products around these processors, and that the retail products could come with higher clock speeds after all. We'll find out in September, when AMD is expected to debut its 4th generation Ryzen desktop processor family, around the same time NVIDIA launches GeForce "Ampere."
Sources:
Igor's Lab, VideoCardz
37 Comments on Possible AMD "Vermeer" Clock Speeds Hint at IPC Gain
It's always been low current > high clocks OR high current > low clocks. But an FX could easily consume up to 220 W, and even 300 W in extreme conditions.
However, I would rather eagerly wait for AMD's next-gen Radeon GPU offerings.
The RTX 2080 Ti bottlenecks the fastest Skylake model, while the fastest Zen 2 model clearly bottlenecks the RTX 2080 Ti. You know Ampere is going to make Zen 2 look even worse, right?
Nevertheless, IPC is the area to focus on going forward, and any IPC improvement is appreciated. Intel's rated boost clocks are too optimistic, and their CPUs usually throttle quite a bit once the power limit kicks in. So in reality, under high sustained load on multiple cores, AMD often matches or exceeds Intel in actual clock speeds.
I don't think AMD should be pushing too hard on unstable boost speeds; what we need is good sustained performance. AMD needs to work on the areas where it falls behind Intel, primarily the CPU front-end and memory-controller latency. The front-end is one of the largest areas of improvement in Sunny Cove over Skylake, so AMD needs to step up here.
"Indeed!"
1) 1080p gaming on $1.3k makes no sense. We do it to figure out "how CPUs will behave in the future". It's an arguable theory, one that assumes future performance can be deduced by testing at unrealistically low resolutions.
2) New games show completely different behavior. Games that actually use CPU power (there's a lot of AI going on in that game) get us to that picture, which is outright embarrassing for Intel.
Mkay?
Low resolution tests of archaic games are only good for easing the pain of the blue fans.
That's on top of low-resolution tests using the fastest card money can buy being questionable on their own.
Other than that, I agree with you that more threads will probably age better in gaming, given the architecture of next-gen consoles.
While we will probably continue to see games use a few more threads in general, this is mostly for non-rendering tasks: audio, networking, video encoding, etc. It doesn't make sense to do rendering (which consists of building queues for the GPU) over more than 1-3 threads, with each thread doing its own separate task like a render pass, viewport, particle simulation or resource loading. While it is technically possible to have multiple threads build a single GPU queue, the synchronization overhead would certainly kill any perceived "performance advantage".
In the coming years single-thread performance will continue to be important, but only up to the point where the GPU is fully saturated. So with the next generations from AMD and Intel, we should expect Intel's gaming advantage to shrink a bit.
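A minimal sketch of that kind of task split, assuming hypothetical audio/AI/network workers and a single render thread (nothing here comes from a real engine; the loop bodies are placeholders):

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> running{true};

// Non-rendering work gets its own threads (bodies are placeholders).
void audio_thread()   { while (running) { /* mix audio      */ std::this_thread::sleep_for(std::chrono::milliseconds(1)); } }
void ai_thread()      { while (running) { /* update AI      */ std::this_thread::sleep_for(std::chrono::milliseconds(1)); } }
void network_thread() { while (running) { /* handle packets */ std::this_thread::sleep_for(std::chrono::milliseconds(1)); } }

int main() {
    std::thread audio(audio_thread), ai(ai_thread), net(network_thread);

    // ONE thread (here: main) owns the GPU: it alone records and submits the
    // command queue each frame, because graphics API calls must stay in order.
    for (int frame = 0; frame < 1000; ++frame) {
        /* build this frame's draw calls and submit them to the GPU */
    }

    running = false;
    audio.join(); ai.join(); net.join();
}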
Since then, due to the plateau in frequency, software in many domains has been written differently so that it can take advantage of massive parallelization. Of course, parallelization requires a shift in paradigm, along with software, firmware and hardware advances. It is much more difficult to write fully parallel software than monolithic software, but it is feasible and it has already been done in many applications.
In gaming there hasn't been a strong drive in this direction until now, because the average consumer computer's thread count wasn't that high. But that is over with the PS5 & co.: these consoles have 8 cores clocked rather low. If the games being written right now used only 2-3 cores, they would suck big time. So I'm pretty sure that next-gen games will be quite good at using multiple threads, and we will start feeling this in PC gaming in less than two years' time.
Family 19h = Zen 3, and speculatively Zen 4 as well.
It is very likely going to be an architectural overhaul at least on the scale of the Bobcat-to-Jaguar overhaul;
www.extremetech.com/gaming/142163-amds-next-gen-bobcat-apu-could-win-big-in-notebooks-and-tablets-if-it-launches-on-time
www.techpowerup.com/180394/amd-jaguar-micro-architecture-takes-the-fight-to-atom-with-avx-sse4-quad-core
Ex:
Bobcat dual-core (14h) => two separate L2s
Jaguar dual-core (16h) => one unified L2
::
Zen2 octo-core (17h) => two separate CCXs
Zen3 octo-core (19h) => one unified CCX
That depends on your definition of being "fully parallel". If you have a workload of independent work chunks that can be processed without synchronization, you can scale almost linearly until you reach a bottleneck in hardware or software. This mostly applies to large workloads of independent chunks, and the overhead of thread communication is negligible because of the chunk size vs. time scale. Examples include large encoding jobs, web servers, software rendering etc.
On the opposite end of the spectrum are highly synchronized workloads, which quickly reach the point of diminishing returns as thread count grows, because threading isn't free.
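As a rough illustration of the "independent chunks" case (my own sketch, not from the thread; the squared-sum is just a stand-in for real per-item work), the only synchronization is collecting the partial results at the end:

#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <thread>
#include <vector>

// Each chunk is processed with no communication between threads.
double process_chunk(const std::vector<double>& data, std::size_t begin, std::size_t end) {
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i)
        sum += data[i] * data[i];
    return sum;
}

int main() {
    std::vector<double> data(1 << 24, 1.5);
    const unsigned threads = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = data.size() / threads;

    std::vector<std::future<double>> parts;
    for (unsigned t = 0; t < threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end   = (t + 1 == threads) ? data.size() : begin + chunk;
        parts.push_back(std::async(std::launch::async,
                                   process_chunk, std::cref(data), begin, end));
    }

    double total = 0.0;
    for (auto& p : parts) total += p.get();   // the only synchronization point
    (void)total;
}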
There is also instruction-level parallelism, but that's a topic of its own. These are common misconceptions, even among programmers. While games have been using more than one thread for a long time, using many threads for rendering hasn't happened, despite the Xbox One and PS4 launching nearly 7 years ago with 8 cores.
Firstly, games work on a very small time scale, e.g. 8.3 ms if you want 120 Hz, so there is very little room for overhead before you encounter serious stutter. Rendering with DirectX, OpenGL or Vulkan works by using API calls to build a queue for the GPU pipeline. The GPU pipeline itself isn't fully controlled by the programmer, but at certain points it executes programmable pieces of code called "shader programs" (the name is misleading, as they do much more than shading). While it is technically possible to have multiple GPU queues (doing different things), or even to have multiple threads cooperate on building a single queue, it wouldn't make sense to do so: the API calls need to be executed in order, so you need synchronization, and the overhead of that synchronization is much more substantial than building the entire queue from a single thread. This is why, even after all these years of multi-core CPUs, games use one thread per GPU workload. Having a pool of worker threads build a single queue makes no sense today, or several years from now. If you need to offload something, offload the non-rendering stuff, and even then keep synchronization limited, as the individual steps in rendering live within <1 ms, which leaves very little time for constantly syncing threads to do a tiny bit of work.
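To make the ordering problem concrete, here's a toy sketch (a generic stand-in, not a real graphics API; the CommandQueue type and the command strings are made up for illustration):

#include <mutex>
#include <string>
#include <vector>

// Stand-in for a graphics command list; not a real API type.
struct CommandQueue {
    std::vector<std::string> cmds;
    void record(std::string cmd) { cmds.push_back(std::move(cmd)); }
};

// Single-threaded: no locks needed, and the calls are trivially in order.
void build_frame(CommandQueue& q) {
    q.record("bind pipeline: shadow pass");
    q.record("draw scene (shadow)");
    q.record("bind pipeline: main pass");
    q.record("draw scene (color)");
    q.record("draw UI");
}

// Multi-threaded on ONE queue: every record needs the same mutex just to stay
// safe, and you would still need extra coordination to keep the required order,
// which is exactly the synchronization overhead described above.
void record_locked(CommandQueue& q, std::mutex& m, std::string cmd) {
    std::lock_guard<std::mutex> lock(m);
    q.record(std::move(cmd));
}

int main() {
    CommandQueue q;
    build_frame(q);   // how games actually do it: one thread per GPU workload
}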
As someone who has been using OpenGL and DirectX since the early 2000s, I've seen the transition from a fixed-function pipeline to a gradually more programmable one. The long-term trend (10+ years) is to keep offloading rendering logic to the GPU, hopefully one day achieving a completely programmable pipeline on the GPU. As we continue taking steps in that direction, the CPU will become less of a bottleneck. The need for more threads in games will be dictated by whatever non-rendering work the game needs.
- Just because an app doesn't use 100% of an 8-core chip doesn't mean the 2-4 extra cores over a 4- or 6-core part aren't helpful. The goal is always to run a specific list of tasks in the shortest timeframe. That may mean running something that won't utilize 100% of a core for the whole frame on a different core. Overall latency is reduced, even though that core isn't fully loaded.
- Having more threads in a program adds overhead that takes extra work to overcome (see the timing sketch after this list). A faster CPU will be able to absorb that overhead better than a slower one.
- Latency is still king in games. Core-to-core latency still needs to be taken into consideration, and depending on the workload that core-to-core latency can turn into core-to-L3 or core-to-memory latency, slowing things down quite a bit.
- AMD FX CPUs had a lot of cores/threads and still do reasonably well in some titles in terms of frame-time consistency (meaning no big FPS drops), but that doesn't mean they run these games faster - just smoother, at lower FPS. They had a hard time against an Intel 2500K back then. A 2500K can have some difficulty with frame-time consistency in modern titles, but it still delivers better average FPS in many of them.
- On that subject, a 2600K, which is very similar to a 2500K, does far better in many games these days than it did at launch. Remove the minor MHz difference and the main distinction is going from 4 cores/4 threads to 4 cores/8 threads.
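The timing sketch mentioned above (the numbers are machine-dependent and tiny_task is just a placeholder for a small per-frame job, not anything from the thread): spawning a thread for sub-millisecond work can cost more than the work itself.

#include <chrono>
#include <cstdio>
#include <thread>

volatile long sink = 0;

// Placeholder workload standing in for a small per-frame job.
void tiny_task() { for (long i = 0; i < 10000; ++i) sink = sink + i; }

int main() {
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    tiny_task();                          // done inline on the calling thread
    auto inline_us = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - t0).count();

    auto t1 = clock::now();
    std::thread t(tiny_task);             // pay thread creation + join on top
    t.join();
    auto spawn_us = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - t1).count();

    std::printf("inline: %lld us, spawned thread: %lld us\n",
                (long long)inline_us, (long long)spawn_us);
}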
So to recap, in my opinion: a 3950X right now might do better in the future, once game developers are used to the 8 cores / 16 threads of the next-gen consoles (right now they are on an 8-core/8-thread slow Jaguar CPU). But a newer CPU with fewer threads and better IPC and frequency could also do a much better job running those games.
This is why CPUs like the 3300X and the 3600 make so much sense right now. I don't think, except in very specific cases, that these super-high-end parts are really worth it. If the CPU race is restarted, spending 300 bucks every 1.5 years will give better results than spending 600 bucks for 3+ years.
But, as usual, there's a Gaussian distribution, so it's rather normal for the highest end of the market to make up only 5-10% of the total.
The rest is more blurry, and it depends entirely on what direction the gaming industry takes from here. At present, we are GPU-bound in most games at high settings. Developers could shift more load towards the CPU and massively parallelize their code, or they could do only the minimum necessary to see performance improvements. Since I don't have a crystal ball, I have no way to predict which will occur.
But technically, it is perfectly possible to code an application in such a way that it runs faster on 16 cores at 85% clock speed than on 10 cores at 100% clock speed. It has already been done for many applications, and it can be done for games too. When exactly this will happen is hard to predict, but it has to happen if we are to play open-world games at 400 FPS in the future. Your example with Opterons vs. FX is flawed, because it is based on insufficiently parallelized applications.
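To put rough numbers on that claim, here is Amdahl's law with an assumed parallel fraction of p = 0.95 (the fraction is my assumption, not a figure from the thread): 16 cores at 85% clock still come out ahead of 10 cores at full clock.

\[
S(n) = \frac{1}{(1-p) + \frac{p}{n}}, \qquad p = 0.95
\]
\[
16 \text{ cores at } 0.85\times\text{clock:}\quad 0.85 \cdot S(16) = \frac{0.85}{0.05 + 0.95/16} \approx 7.8
\]
\[
10 \text{ cores at } 1.00\times\text{clock:}\quad 1.00 \cdot S(10) = \frac{1}{0.05 + 0.95/10} \approx 6.9
\]

Drop the parallel fraction to about 0.85 or below, though, and the 10 faster cores win again - so the claim hinges entirely on how well the software is parallelized.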
The last part I am pretty sure I completely disagree with. I am quite sure future processor performance improvements will rely much more on core counts than on IPC, and clock frequencies will stagnate at best. As both Intel and AMD continue to shrink their nodes, we will have 64 cores in home computers in a few years, but we will never reach 6 GHz.