AMD "Zen 2" IPC 29 Percent Higher than "Zen"

bug · Nov 12, 2018

Smartcom5 said:
Excuse me sir, but you misspelled IPS! When will people finally learn the difference ffs?!

There's the IPC, and then there's IPS.
IPC or I/c → Instructions per (Clock-) Cycle
IPS or I/s → Instructions per Second

The latter one, thus IPS, is often used synonymously with actual single-thread performance, where AMD no longer lags behind Intel by anything like the margin it did back when Bulldozer was the pinnacle of the ridge.

Rule of thumb:
IPC does not scale with frequency but is rather fixed (within margins; it depends on context and the kind of [code-] instructions¹, you get the idea).
IPS is the IPC put in relation to time, pretty much like the formula → IPC×f (with f being the clock frequency), simply put.

So your definition of IPC quoted above would rather be called „Instructions per Clock at the Wall“ like IPC@W.
So please, stop using right terms and definitions for wrong contexts, learn the difference between those two and get your shit together please!

¹ The value IPC is (depending on kind) absolute² and fixed, yes.
However, it depends crucially on the type and kind of instructions and can vary rather starkly when different kinds of instructions are used – since, per definition, the figure IPC only reflects how many instructions can be processed on average per (clock-) cycle.

On synthetic code like instructions with low logical depth or level and algorithmic complexity, which are suited to be processed rather shortly, the resulting value is obviously pretty high – whereas on instructions with a rather high complexity and long length, the IPC-value can only reach rather low figures. In this particular matter, even the contrary can be the case, so that it needs more than one or even a multitude of cycles to process a single given complex instruction. In this regard we're speaking of the reciprocal multiplicative, thus the inverse (-value).
… which is also standardised as being defined as (Clock-) Cycles per Instruction or C/I, short → CPI.
² In terms of non-varying, as opposed to relative.

Read:
Wikipedia • Instructions per cycle
Wikipedia • Instructions per second
Wikipedia • Cycles per instruction

Smartcom

No, I meant just what I said/wrote

HD64G · Nov 12, 2018

Valantar said:
While you're right that we don't know yet that the CCXes have grown to 8 cores (though IMO this seems likely given that every other Zen2 rumor has been spot on), that drawing is ... nonsense. First off, it proposes using IF to communicate between CCXes on the same die, which even Zen1 didn't do. The sketch directly contradicts what AMD said about their design, and doesn't at all account for the I/O die and its role in inter-chiplet communication. The layout sketched out there is incredibly complicated, and wouldn't even make sense for a theoretical Zen1-based 8-die layout. Remember, IF uses PCIe links, and even in Zen1 the PCIe links were common across two CCXes. The CCXes do thus not have separate IF links, but share a common connection (through the L3 cache, IIRC) to the PCIe/IF complex. Making these separate would be a giant step backwards in terms of design and efficiency. Remember, the uncore part of even a 2-die Threadripper consumes ~60W. And that's with two internal links, 64 lanes of PCIe and a quad-channel memory controller. The layout in the sketch above would likely consume >200W for IF alone.

Now, let's look at that sketch. In it, any given CCX is one hop away from 3-4 other CCXes, 2 hops from 3-5 CCXes, and 3 hops away from the remaining 7-10 CCXes. In comparison, with EPYC (non-Rome) and TR, all cores are 1 hop away from each other (though the inter-CCX hop is shorter/faster than the die-to-die IF hop). Even if this is "reduced latency IF" as they call it, that would be ridiculous. And again: what role does the I/O die play in this? The IF layout in that sketch makes no use of it whatsoever, other than linking the memory controller and PCIe lanes to eight seemingly random CCXes. This would make NUMA management an impossible flustercuck on the software side, and substrate manufacturing (seriously, there are six IF links in between each chiplet there! The chiplets are <100mm2! This is a PCB, not an interposer! You can't get that kind of trace density in a PCB.) impossible on the hardware side. Then there's the issue of this design requiring each CCX to have 4 IF links, but 1/4 of the CCXes only gets to use 3 links, wasting die area.

On the other hand, let's look at the layout that makes sense logically, hardware-wise and software-wise, and adds up with what AMD has said about EPYC: Each chiplet has a single IF interface, that connects to the I/O die. Only that, nothing more. The I/O die has a ring bus or similar interconnect that encompasses the 8 necessary IF links for the chiplets, an additional 8 for PCIe/external IF, and the memory controllers. This reduces the number of IF links running through the substrate from 30 in your sketch (6 per chiplet pair + 6 between them) to 8. It is blatantly obvious that the I/O die has been made specifically to make this possible. This would make every single core 1 hop (through the I/O die, but ultimately still 1 hop) away from any other core, while cutting the number of IF links to roughly a quarter. Why else would they design that massive die?

Red lines. The I/O die handles low-latency shuffling of data between IF links, while also giving each chiplet "direct" access to DRAM and PCIe. All over the same single connection per chiplet. The I/O die is (at least at this time) a black box, so we don't know whether it uses some sort of ring bus, mesh topology, or large L4 cache (or some other solution) to connect these various components. But we do know that a layout like this is the only one that would actually work. (And yes, I know that my lines don't add up in terms of where the IF link is physically located on the chiplets. This is an illustration, not a technical drawing.)

More on-topic, we need to remember that IPC is workload dependent. There might be a 29% increase in IPC in certain workloads, but generally, when we talk about IPC it is average IPC across a wide selection of workloads. This also applies when running test suites like SPEC or GeekBench, as they run a wide variety of tests stressing various parts of the core. What AMD has "presented" (it was in a footnote, it's not like they're using this for marketing) is from two specific workloads. This means that a) this can very likely be true, particularly if the workloads are FP-heavy, and b) this is very likely not representative of total average IPC across most end-user-relevant test suites. In other words, this can be both true (in the specific scenarios in question) and misleading (if read as "average IPC over a broad range of workloads").

Agreed. Interesting graph, and I also think it has mistakes. AMD put this central die in the middle of the chiplets to allow all of them to be as close as possible to it. And they put the memory controller there too, to remove the need for those chiplets to communicate with each other at all. The CPU will use as many cores as needed by the sw and use the IO chip to do the rest. And that is why imho this arch is brilliant and the only way to increase core count without increasing latency to the moon. We are watching a true revolution in computing here. My 5 cents.

bug · Nov 12, 2018

Vayra86 said:
Eh... IPS in my mind is In Plane Switching for displays.

He spelled it fine, you didn't read it right.

Happens to me too from time to time. Especially when I read or post in a hurry.

Markosz · Nov 12, 2018

Oh, investor meeting... then let's take half of what they said

Valantar · Nov 12, 2018

Smartcom5 said:
Excuse me sir, but you misspelled IPS! When will people finally learn the difference ffs?!

Vayra86 said:
Eh... IPS in my mind is In Plane Switching for displays.

He spelled it fine, you didn't read it right.

Agreed. There's nothing wrong with saying "Intel has a clock speed advantage, but AMD might beat them in actual performance through increasing IPC." There's nothing in that saying that clock speed affects IPC, only that clock speed is a factor in actual performance. Which it is. What @Smartcom5 is calling "IPS" is just actual performance (which occurs in the real world, and thus includes time as a factor, and thus also clock speed) and not the intentional abstraction that IPC is. This seems like a fundamental misunderstanding of why we use the term IPC in the first place (to separate performance from the previous misunderstood oversimplification that was "faster clocks=more performance").
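To put rough numbers on that distinction, here is a minimal Python sketch. It assumes the simplified model "throughput = IPC × clock frequency"; the two CPUs and their IPC and clock values are made up for illustration and are not measurements of any real chip.

```python
def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Simplified model: instructions per cycle times cycles per second."""
    return ipc * clock_hz


# Hypothetical values, chosen only to illustrate the point.
cpu_a = {"ipc": 2.0, "clock_hz": 5.0e9}  # lower IPC, higher clock
cpu_b = {"ipc": 2.5, "clock_hz": 4.4e9}  # higher IPC, lower clock

ips_a = instructions_per_second(**cpu_a)
ips_b = instructions_per_second(**cpu_b)

print(f"CPU A: {ips_a / 1e9:.1f} billion instructions/s")
print(f"CPU B: {ips_b / 1e9:.1f} billion instructions/s")
```

In this toy case the chip with the lower clock still comes out ahead, because clock speed is only one of the two factors.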

Dante Uchiha · Nov 12, 2018

btarunr said:
There are two ways AMD could build a 16-core AM4 processor:

Two 8-core chiplets with a smaller I/O die that has 2-channel memory, 32-lane PCIe gen 4.0 (with external redrivers), and the same I/O as current AM4 dies such as ZP or PiR.

A monolithic die with two 8-core CCX's, and fully integrated chipset like ZP or PiR. Such a die wouldn't be any bigger than today's PiR.

I think option two is more feasible for low-margin AM4 products.

That's not realistic. 16c is not feasible for consumers:

-16c with high clocks would have a high TDP, and current motherboards would have problems supporting them.
-16c would have to cost double the current price of the 2700x, and even then AMD would make a lower profit per CPU sold.
- 8c CPU is more than enough for gaming, even for future releases.

Would you buy a 3700x @ 16c at U$ 599~ ? Or would a 3700x with "just 8c", low latency, optimized for gaming at U$ 349~399 be better?

R0H1T · Nov 12, 2018

We're not getting 32 PCIe 4.0 lanes on AM4, I'd be (really) shocked if that were the case.

Valantar with the entire I/O & MC off the die it opens up a world of possibilities with Zen, having said that I'll go back again to the point I made in other threads. The 8 core CCX makes sense for servers & perhaps HEDT, however when it comes to APU (mainly notebooks) I don't see a market for 8 cores there. I also don't see AMD selling an APU with 6/4 cores disabled, even if it is high end desktop/notebooks.

The point I'm making is that either AMD makes two CCX, one with 8 cores & the other with 4, or they'll probably go with the same 4 core CCX. The image I posted is probably misconstrued, I also don't know for certain if the link shown inside the die is IF or just a logical connection (via L3?) between 2 CCX.

Valantar · Nov 12, 2018

HD64G said:
Interesting graph but I think it has mistakes. AMD put this central die in the middle of the chiplets to allow all of them to be as close as possible to it. And they put the memory controller there too, to remove the need for those chiplets to communicate with each other at all. The CPU will use as many cores as needed by the sw and use the IO chip to do the rest. And that is why imho this arch is brilliant and the only way to increase core count without increasing latency to the moon. We are watching a true revolution in computing here. My 5 cents.

You're phrasing this as if you're arguing against me, yet what you're saying is exactly what I'm saying. Sounds like you're replying to the wrong post. The image I co-opted came from the quoted post, I just sketched in how I believe they'll lay this out.

TheinsanegamerN · Nov 12, 2018

If AMD managed a 15% IPC increase over OG zen, I would be amazed. I was expecting around 10%.

There is no way they will hit 20-29%. That is just wishful thinking on AMD's part, most likely in specific scenarios.

Of course, I'd love to be proved wrong here.

Valantar · Nov 12, 2018

R0H1T said:
We're not getting 32 PCIe 4.0 lanes on AM4, I'd be (really) shocked if that were the case.

Valantar with the entire I/O & MC off the die it opens up a world of possibilities with Zen, having said that I'll go back again to the point I made in other threads. The 8 core CCX makes sense for servers & perhaps HEDT, however when it comes to APU (mainly notebooks) I don't see a market for 8 cores there. I also don't see AMD selling an APU with 6/4/2 cores disabled, even if it is high end desktop/notebooks.

The point I'm making is that either AMD makes two CCX, one with 8 cores & the other with 4, or they'll probably go with the same 4 core CCX. The image I posted is probably misconstrued, I also don't know for certain if the link shown inside the die is IF or just a logical connection between 2 CCX.

I partially agree with that - it's very likely they'll put out a low-power 4-ish-core chiplet for mobile. After all, the mobile market is bigger than the desktop market, so it makes more sense for this to get bespoke silicon. What I disagree with is the need for the 8-core to be based off the same CCX as the 4-core. If they can make an 8-core CCX, equalising latencies between cores on the same die, don't you think they'd do so? I do, as that IMO qualifies as "low-hanging fruit" in terms of increasing performance from Zen/Zen+. This would have performance benefits for every single SKU outside of the mobile market. And, generally, it makes sense to assume that scaling down core count per CCX is no problem, so having an 8-core version is no hindrance to also having a 4-core version.

How I envision AMD's Zen2 roadmap:

Ryzen Mobile:
15-25W: 4-core chiplet + small I/O die (<16 lanes PCIe, DC memory, 1-2 IF links), either integrated GPU on the chiplet or separate iGPU chiplet
35-65W: 8-core chiplet + small I/O die (<16 lanes PCIe, DC memory, 1-2 IF links), separate iGPU chiplet or no iGPU (unlikely, iGPU useful for power savings)

Ryzen Desktop:
Low-end: 4-core chiplet + medium I/O die (< 32 lanes PCIe, DC memory, 2 IF links), possible iGPU (either on-chiplet or separate)
Mid-range: 8-core chiplet + medium I/O die (< 32 lanes PCIe, DC memory, 2 IF links), possible iGPU on specialized SKUs
High-end: 2x 8-core chiplet + medium I/O die (< 32 lanes PCIe, DC memory, 2 IF links)

Threadripper:
(possible "entry TR3": 2x 8-core chiplet + large I/O die (64 lanes PCIe, QC memory, 4 IF links), though this would partially compete with high-end Ryzen just with more RAM B/W and PCIe and likely only have a single 16-core SKU, making it unlikely to exist)
Main: 4x 8-core chiplet + large I/O die (64 lanes PCIe, QC memory, 4 IF links)

EPYC:
Small: 4x 8-core chiplet + XL I/O die (128 lanes PCIe, 8C memory, 8 IF links)
Large: 8x 8-core chiplet + XL I/O die (128 lanes PCIe, 8C memory, 8 IF links)

Uncertainty:
-Mobile might go with an on-chiplet iGPU and only one IF link on the I/O die, but this would mean no iGPU on >4-core mobile SKUs (unless they make a third chiplet design), while Intel already has 6-cores with iGPUs. As such, I'm leaning towards 2 IF links and a separate iGPU chiplet for ease of scaling, even if the I/O die will be slightly bigger and IF power draw will increase.

Laying out the roadmap like this has a few benefits:
-Only two chiplet designs across all markets.
-Scaling happens through I/O dice, which are made on an older process, are much simpler than CPUs, and should thus be both quick and cheap to make various versions of.
-A separate iGPU chiplet connected through IF makes mobile SKUs easier to design, and the GPU die might be used in dGPUs also.
-Separate iGPU chiplets allow for multiple iGPU sizes - allowing more performance on the high end, or less power draw on the low end.
-Allows for up to 8-core chips with iGPUs in both mobile and desktop.

Of course, this is all pulled straight out of my rear end. Still, one is allowed to dream, no?

TheinsanegamerN said:
If AMD managed a 15% IPC increase over OG zen, I would be amazed. I was expecting around 10%.

There is no way they will hit 20-29%. That is just wishful thinking on AMD's part, most likely in specific scenarios.

Of course, I'd love to be proved wrong here.

Well, they claim to have measured a 29.4% increase. That's not wishful thinking at least. But as I pointed out in a previous post:

Valantar said:
We need to remember that IPC is workload dependent. There might be a 29% increase in IPC in certain workloads, but generally, when we talk about IPC it is average IPC across a wide selection of workloads. This also applies when running test suites like SPEC or GeekBench, as they run a wide variety of tests stressing various parts of the core. What AMD has "presented" (it was in a footnote, it's not like they're using this for marketing) is from two specific workloads. This means that a) this can very likely be true, particularly if the workloads are FP-heavy, and b) this is very likely not representative of total average IPC across most end-user-relevant test suites. In other words, this can be both true (in the specific scenarios in question) and misleading (if read as "average IPC over a broad range of workloads").

TheinsanegamerN · Nov 12, 2018

Valantar said:
Well, they claim to have measured a 29.4% increase. That's not wishful thinking at least. But as I pointed out in a previous post:

AMD also "claimed" to have dramatically faster CPUs with bulldozer, and "claimed" Vega would be dramatically faster than it ended up being. AMD here "claims" to have measured a 29.4% increase in IPC. But that might have been in a workload that uses AVX, and is heavily threaded, or somehow built to take full advantage of ryzen.

I'll wait for third party benchmarks. AMD has made way too many *technically true claims over the years.

*Technically true in one specific workload, overall the performance boost was less than half what AMD claimed, but it was true in one workload, so technically they didn't lie.

Vya Domus · Nov 12, 2018

randomUser said:
If your task requires 1000 instructions to be completed, then:
Zen1 will finish this task in 1000 clock cycles;
Zen2 will finish this task in 775 clock cycles.

That's not how this works, not all instructions see the same improvement.
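A quick sketch of why the 775-cycle figure only holds if the uplift is uniform. The 1000-instruction task and the 29% figure come from the posts above; the normalised baseline IPC, the instruction mix and the per-class uplifts are invented purely for illustration.

```python
def cycles_needed(instructions: int, ipc: float) -> float:
    """Cycles required to retire a number of instructions at a given IPC."""
    return instructions / ipc


BASE_IPC = 1.0  # normalised assumption: the older core retires 1 instruction per cycle

# Case 1: every instruction benefits from the full 29% uplift.
uniform = cycles_needed(1000, BASE_IPC * 1.29)

# Case 2: hypothetical mix of (instruction count, IPC uplift factor).
mix = [(300, 1.29), (700, 1.05)]
mixed = sum(cycles_needed(n, BASE_IPC * uplift) for n, uplift in mix)

print(f"uniform uplift: {uniform:.0f} cycles")
print(f"mixed uplift:   {mixed:.0f} cycles")
```

With this made-up mix the task takes roughly 899 cycles rather than 775, so the headline number only applies to the instructions that actually see the full gain.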

TheinsanegamerN said:
I'll wait for third party benchmarks. AMD, Intel, Nvidia have made way too many *technically true claims over the years.

Fixed it.

GlacierNine · Nov 12, 2018

Valantar said:
I partially agree with that - it's very likely they'll put out a low-power 4-ish-core chiplet for mobile. After all, the mobile market is bigger than the desktop market, so it makes more sense for this to get bespoke silicon. What I disagree with is the need for the 8-core to be based off the same CCX as the 4-core. If they can make an 8-core CCX, equalising latencies between cores on the same die, don't you think they'd do so? I do, as that IMO qualifies as "low-hanging fruit" in terms of increasing performance from Zen/Zen+. This would have performance benefits for every single SKU outside of the mobile market. And, generally, it makes sense to assume that scaling down core count per CCX is no problem, so having an 8-core version is no hindrance to also having a 4-core version.

I disagree, for one very simple reason - Tooling up production for 2 different physical products/dies would likely be more expensive than the material savings in not using as much silicon per product. This stuff is not cheap to do, and in CPU manufacture, volume savings are almost always much more dramatic than design/material savings.

Serving Mainstream, HEDT, and Server customers from a single die integrated into multiple packages, is one of the main reasons AMD are in such good shape right now - Intel has to produce their Mainstream, LCC, HCC, and XCC dies and then bin and disable cores on all 4 of them for each market segment. AMD only has to produce and bin one die, to throw onto a variety of packages at *every level* of their product stack.

It's not even worth producing a second die unless the move would bring in not only more profit, but enough extra profit to completely cover the cost of tooling up for that. Bear in mind here that I mean something very specific:

If AMD spends 1bn to produce a second die, and rakes in 1.5bn extra profit over last year, that doesn't necessarily mean tooling up for the extra die was worth it. What if their profits still would have gone up by 1bn anyway, using a single die in production? If that were the case, tooling up just cost AMD a cool $1,000,000,000 in order to make $500,000,000. Sure, they might have gained a bit more marketshare, but not only did it lose them money, it also ended up making their product design procedures more complex and caused additional overheads right the way up through every level of the company, keeping track of the two independent pieces of silicon. It also probably means having further stratification in motherboards and chipsets, whereas right now AMD are very flexible in what they can do to bring these packages to older chipsets or avoid bringing in new ones.
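The arithmetic in that hypothetical can be sketched as follows. All figures are the made-up ones from the paragraph above, and the sketch assumes the 1.5bn is the observed year-over-year profit increase while 1bn is what a single-die lineup would have gained anyway.

```python
tooling_cost = 1_000_000_000            # spent on the second die
observed_profit_growth = 1_500_000_000  # profit growth with two dies
baseline_profit_growth = 1_000_000_000  # assumed growth with a single die

# Only the growth beyond the baseline can be credited to the second die.
incremental_gain = observed_profit_growth - baseline_profit_growth
return_per_dollar = incremental_gain / tooling_cost

print(f"incremental gain: ${incremental_gain:,}")
print(f"return per tooling dollar: {return_per_dollar:.2f}")
```

The point of comparing against the baseline rather than against last year is that the second die has to pay for itself out of the extra growth alone.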

Edit: Not to mention, that using a single, much higher capability die, has other benefits - Like for example being able to provide customers with a *much* longer support period for upgrades - something that has already won them sales with their "AM4 until 2020" approach bringing in consumers who are sick of Intel's socket and chipset-hopping.

Or simply being able to unlock CCXs on new products as and when the market demands that - After all, why would you intentionally design a product that reduces your ability to respond to competition, when your competition is Intel, who you *know* are scrambling to use their higher R&D budget to smack you down again before you get too far ahead?

dirtyferret · Nov 12, 2018

I could "potentially" be making 29% more money next year if the company owner doesn't get in the way.

B-Real · Nov 12, 2018

Prima.Vera said:
Bulldozer, Excavator, ... no thank you. No more hyping until the community benches are out.

You will see this after they are released.

Even if there will be only a ~10% increase from Zen+, they will be on par with Intel in FHD games tested with a 2080Ti.

GlacierNine · Nov 12, 2018

bug said:
95W+ or scarcity are not new to the mainstream market
Even the price is not that out of this world, but at $500 it won't gain 10% market share, so yeah, not that mainstream after all.

"95W+" is a bit misleading. Nobody should be looking at the 9900K and pretending it's simply a return to the hotter chips of yore. The fact is, it's actually a dramatically hotter chip than almost anything that has come before it, and the only reason we're able to tame it is because the coolers we use these days are so much more capable. At the time we were dealing with Intel Prescott chips, one of the best coolers you could buy was the Zalman CNPS9500. Noctua were only just about to release the *first* NH-U12. The undisputed king of the hill for air cooling was the Tuniq Tower 120, soon to be displaced by the original Thermalright Ultra 120.

The NH-D15 didn't exist. There were no AIOs of any kind, and that's why back then, we all struggled to cool Prescott Cores and first Gen i7s.

For example, The i7 975 was a 130W part. The fastest Pentium 4 chips were officially 115W. Intel's Datasheets of that time don't specify how TDP was calculated, but if we assume that they were doing what they do now, which is quote TDP at base clocks under a "close to worst case" workload, then we're probably in good shape.

The i7-975, then, had a 3.333GHz base clock, a 3.467GHz all-core boost, and a 3.6GHz single-core boost. Not a lot of boost happening here, only an extra 133MHz on all cores. You'd expect no real increase in temperatures under your cooler from such a mild overclock, unless you were OC'ing something like an old P3, so we can probably assume that means the Intel TDP from then, if measured according to today's standards, was probably pretty close to "correct" - You could expect your i7 975 to stick pretty close to that 130W TDP figure in a real world scenario. And this was legitimately a hard to cool chip! Even the best air coolers sometimes struggled.

Compare that to the 9900K, which is breaking 150W power consumption all over the internet, and you suddenly realise - The only reason these chips are surviving in the wild is because:

1 - Intel's current Arch will maintain its maximum clocks way up into the 90+ Celsius range
2 - People are putting them under NH-D15s - and even then we're seeing temperature numbers that, back in the P4 days, would have been considered "Uncomfortable" and "dangerous".

The 9900K is, as far as I can tell, simply the most power hungry and hard to cool processor that Intel has ever released on a mainstream platform. It runs at the *ragged edge* of acceptability. You can't just brush this sort of thing off with "The market has seen 95W chips before". That's not what the 9900K actually is. It's something much, much more obscene.

Smartcom5 · Nov 12, 2018

bug said:
No, I meant just what I said/wrote

Gosh, I'm really sorry, was my bad!

Picked the wrong quote, was meant to quote @WikiFM …

WikiFM said:
… I thought Zen was still way behind Intel in single threaded performance or IPC.

Smartcom

bug · Nov 12, 2018

GlacierNine said:
"95W+" is a bit misleading. Nobody should be looking at the 9900K and pretending it's simply a return to the hotter chips of yore. The fact is, it's actually a dramatically hotter chip than almost anything that has come before it, and the only reason we're able to tame it is because the coolers we use these days are so much more capable. At the time we were dealing with Intel Prescott chips, one of the best coolers you could buy was the Zalman CNPS9500. Noctua were only just about to release the *first* NH-U12. The undisputed king of the hill for air cooling was the Tuniq Tower 120, soon to be displaced by the original Thermalright Ultra 120.

That is completely wrong. 9900k is a 95W chip and will work within a 95W power envelope. It has potential to work faster when unconstrained, but it will work with a 95W heat sink. Old Pentium Ds were 130W chips and back then, Intel's guidance was only for average power draw, not maximal (kind of like what those 95W mean today, though not exactly the same).
That said, there's no denying what Intel has now is a redesign trying to fit more tricks into the current process node which should be long behind us. Thus, it's an architecture stretched past its intended lifetime.

Smartcom5 · Nov 12, 2018

Valantar said:
What @Smartcom5 is calling "IPS" is just actual performance (which occurs in the real world, and thus includes time as a factor, and thus also clock speed) and not the intentional abstraction that IPC is.

I'm sorry but I'm not just 'calling' it as such, I just pointed out how things are actually standardised. IPC, IPS and CPI in fact are known and common figures, hence the wiki-links. But as you can see, the whole thing isn't as nearly as trivial as it might look to be.

That's why actual performance is usually measured by default in the absolute, fixed units FLOPS (Floating Point Operations Per Second) or MIPS (Million Instructions Per Second) – hence the performance of instructions per (clock-) cycle while processing an equally pre-defined kind of instruction (in this case, floating-point numbers).

Smartcom

HD64G · Nov 12, 2018

Valantar said:
You're phrasing this as if you're arguing against me, yet what you're saying is exactly what I'm saying. Sounds like you're replying to the wrong post. The image I co-opted came from the quoted post, I just sketched in how I believe they'll lay this out.

My mistake indeed and I edited my post to correct the misunderstanding. Cheers! :toast:

GlacierNine · Nov 12, 2018

bug said:
That is completely wrong. 9900k is a 95W chip and will work within a 95W power envelope. It has potential to work faster when unconstrained, but it will work with a 95W heat sink. Old Pentium Ds were 130W chips and back then, Intel's guidance was only for average power draw, not maximal (kind of like what those 95W mean today, though not exactly the same).
That said, there's no denying what Intel has now is a redesign trying to fit more tricks into the current process node which should be long behind us. Thus, it's an architecture stretched past its intended lifetime.

Oh please, stop the apologism. The 9900K will work within a 95W power envelope, yes. At 3.6GHz base clock, with occasional jumps to higher speeds where the cooling solution's "thermal capacitance" can be leveraged.

But these chips and this silicon aren't designed to be 3.6GHz parts in daily use. They are ~4.7GHz parts that Intel reduced the base clocks on, in order to be able to claim a 95W TDP. If you had the choice between running a 7700K and a 9900K at base clocks, the 7700K would actually get you the better gaming performance in most games. Would you say that's Intel's intention? To create a market where a CPU 2 generations old, with half the cores, outperforms their current flagship in exactly the task Intel advertise the 9900K to perform?

Or would you say that actually, Intel has transitioned from using boost clock as "This is extra performance if you can cool it", to using boost clock as the figure expected to sell the CPU, and therefore the figure most users expect to see in use?

You can clearly see this in the progression of the flagships, each generation.

6700K - 4.0GHz Base, 4 Cores, 95W TDP
7700K - 4.2GHz Base, 4 Cores, 95W TDP
8700K - 3.7GHz Base, 6 Cores, 95W TDP
9900K - 3.6GHz Base, 8 Cores, 95W TDP.

Oh well would you look at that - As soon as Intel started adding cores, they dropped the base clocks dramatically in order to keep their "95W TDP at base clocks" claim technically true. But look at the all core boost clocks:

4.0GHz, 4.4GHz, 4.3GHz, 4.7GHz

They dipped by 100MHz on the 8700K, to prevent a problem similar to the 7700K, which was known to spike in temperature even under adequate cooling, only to come back up on the 9900K, but this time with Solder TIM to prevent that from happening.

Single core is the same story - 4.2, 4.5, 4.7, 5.0. A constant increase in clockspeed each generation.

Like I said - Boost is no longer a boost. Boost has become the expected performance standard of Intel chips. Once you judge the chips on that basis, the 9900K reveals itself to be a power hungry monster that makes the hottest Prescott P4 chips look mild in comparison.
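For reference, the gap between base and all-core boost can be tabulated directly from the figures listed in this post. The sketch below only restates those numbers in MHz (integers avoid floating-point noise); it does not add any independently sourced clock data.

```python
# (model, base MHz, all-core boost MHz, single-core boost MHz), as listed in the post
flagships = [
    ("6700K", 4000, 4000, 4200),
    ("7700K", 4200, 4400, 4500),
    ("8700K", 3700, 4300, 4700),
    ("9900K", 3600, 4700, 5000),
]

# How far the all-core boost sits above the advertised base clock
gaps = {model: all_core - base for model, base, all_core, _ in flagships}

for model, gap in gaps.items():
    print(f"{model}: all-core boost is {gap} MHz above base")
```

By these numbers the gap grows from 0 MHz on the 6700K to 1100 MHz on the 9900K, which is the trend the post is pointing at.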

bug · Nov 12, 2018

GlacierNine said:
The 9900K will work within a 95W power envelope, yes. At 3.6GHz base clock, with occasional jumps to higher speeds where the cooling solution's "thermal capacitance" can be leveraged.
Oh please, stop the apologism. These chips aren't 95W, 3.6GHz parts that Intel have magically made capable of overclocking themselves by 1.1GHz on all cores. They are ~4.7GHz parts that Intel reduced the base clocks on, in order to be able to claim a 95W TDP. If you could go back in time and cast a magic spell that

You can clearly see this in the progression of the flagships, each generation.

6700K - 4.0GHz Base, 4 Cores, 95W TDP
7700K - 4.2GHz Base, 4 Cores, 95W TDP
8700K - 3.7GHz Base, 6 Cores, 95W TDP
9900K - 3.6GHz Base, 8 Cores, 95W TDP.

Oh well would you look at that - As soon as Intel started adding cores, they dropped the base clocks dramatically in order to keep their "95W TDP at base clocks" claim technically true. But look at the all core boost clocks:

4.0GHz, 4.4GHz, 4.3GHz, 4.7GHz

They dipped by 100MHz on the 8700K, to prevent a problem similar to the 7700K, which was known to spike in temperature even under adequate cooling, only to come back up on the 9900K, but this time with Solder TIM to prevent that from happening.

Single core is the same story - 4.2, 4.5, 4.7, 5.0. A constant increase in clockspeed each generation.

Intel's game here has been to transition from "Intel Turbo Boost Technology allows your processor to go beyond base specs" as their marketing angle, to a standpoint of "If your cooling can't allow the chip to boost constantly, then you're wasting the potential of your CPU". It's not even wrong - you *are* wasting the potential of your extremely expensive CPU if you don't manage to run it *WELL* into boost clock these days.

I'm not sure where you and I disagree. All these CPUs will work at 95W at their designated baseline clocks. With beefier heat sinks you can extract more performance. Nothing has changed, except the boost algorithms that have become smarter. Would you prefer a hard 95W limitation instead or what's your beef here?

efikkan · Nov 12, 2018

It seems to me like this article is based on a bad translation referring to a 29% performance uplift (partially due to increased FPU width). For starters, to estimate IPC the clock speed would have to be completely fixed (no boost). Secondly, in reality performance is not quite as simple as clock times "IPC", due to memory latency becoming a larger bottleneck with higher clocks.
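To make the fixed-clock point concrete, here is a minimal sketch assuming the idealised model where throughput is simply IPC times clock. The function names and the sample numbers are hypothetical, not measurements, and the model deliberately ignores the memory latency effect mentioned above.

```python
def ipc(instructions_retired, cycles):
    """Instructions per cycle over a measured interval."""
    return instructions_retired / cycles

def ips(ipc_value, clock_hz):
    """Idealised instructions per second: IPC x clock (ignores memory latency)."""
    return ipc_value * clock_hz

def ipc_uplift(score_old, clock_old_hz, score_new, clock_new_hz):
    """IPC ratio implied by two benchmark scores, with clock speed normalised out."""
    return (score_new / clock_new_hz) / (score_old / clock_old_hz)

# Hypothetical example: the score rises 29% in both cases,
# but in the second case the new chip also ran at a 10% higher clock.
same_clock = ipc_uplift(100.0, 3.0e9, 129.0, 3.0e9)
boosted = ipc_uplift(100.0, 3.0e9, 129.0, 3.3e9)
print(f"fixed clock: {same_clock:.2f}x, with 10% higher clock: {boosted:.2f}x")
```

A score ratio only equals an IPC ratio when both runs are held at the same clock; if boost is left on, part of the uplift is just frequency.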

A 29% IPC uplift would certainly be welcome, but keep in mind this is about twice the accumulated improvements from Sandy Bridge -> Skylake. I wonder how this thread would turn out if someone claimed Ice Lake would offer 29% IPC gains? :rolleyes:

Let's not have another Vega Victory Dance™. We need to calm down this extreme hype and be realistic. Zen 2 is an evolved Zen, it will probably do tweaks and small improvements across the design, but it will not be a major improvement over Zen.

londiste · Nov 12, 2018

efikkan said:
It seems to me like this article is based on a bad translation referring to a 29% performance uplift (partially due to increased FPU width).

It actually isn't, the bad translation part I mean. This is from AMD themselves (see note 1):
http://ir.amd.com/news-releases/new...performance-datacenter-computing-next-horizon

R0H1T · Nov 12, 2018

GlacierNine said:
I disagree, for one very simple reason - Tooling up production for 2 different physical products/dies would likely be more expensive than the material savings in not using as much silicon per product. This stuff is not cheap to do, and in CPU manufacture, volume savings are almost always much more dramatic than design/material savings.

Serving Mainstream, HEDT, and Server customers from a single die integrated into multiple packages, is one of the main reasons AMD are in such good shape right now - Intel has to produce their Mainstream, LCC, HCC, and XCC dies and then bin and disable cores on all 4 of them for each market segment. AMD only has to produce and bin one die, to throw onto a variety of packages at *every level* of their product stack.

It's not even worth producing a second die unless the move would bring in not only more profit, but enough extra profit to completely cover the cost of tooling up for that. Bear in mind here that I mean something very specific:

If AMD spends 1bn to produce a second die, and rakes in 1.5bn extra profit over last year, that doesn't necessarily mean tooling up for the extra die was worth it. What if their profits still would have gone up by 1bn anyway, using a single die in production? If that were the case, tooling up just cost AMD a cool $1,000,000,000 in order to make $500,000,000. Sure, they might have gained a bit more marketshare, but not only did it lose them money, it also ended up making their product design procedures more complex and caused additional overheads right the way up through every level of the company, keeping track of the two independent pieces of silicon. It also probably means having further stratification in motherboards and chipsets, whereas right now AMD are very flexible in what they can do to bring these packages to older chipsets or avoid bringing in new ones.
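The arithmetic in that example can be sketched as follows. This assumes the profit figures are counted before the tooling cost is subtracted, which is how the example reads; the function and parameter names are made up for illustration.

```python
def net_gain_from_second_die(tooling_cost, profit_increase_with, profit_increase_without):
    """Extra profit attributable to the second die, minus what it cost to tool up.

    Assumes both profit figures exclude the tooling cost itself.
    """
    incremental_profit = profit_increase_with - profit_increase_without
    return incremental_profit - tooling_cost

# Figures from the example above, in dollars.
result = net_gain_from_second_die(
    tooling_cost=1_000_000_000,
    profit_increase_with=1_500_000_000,
    profit_increase_without=1_000_000_000,
)
print(f"Net effect of the second die: {result:,} dollars")
```

The comparison that matters is against what the single die would have earned anyway, not against last year's profit.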

Edit: Not to mention, that using a single, much higher capability die, has other benefits - Like for example being able to provide customers with a *much* longer support period for upgrades - something that has already won them sales with their "AM4 until 2020" approach bringing in consumers who are sick of Intel's socket and chipset-hopping.

Or simply being able to unlock CCXs on new products as and when the market demands that - After all, why would you intentionally design a product that reduces your ability to respond to competition, when your competition is Intel, who you *know* are scrambling to use their higher R&D budget to smack you down again before you get too far ahead?

The market (retail?) you're talking about is also huge, in fact bigger than enterprise even for Intel.
If the (extra) power savings materialize for ULP & ULV products then it makes sense to deploy a 4 core CCX over there, however an 8 core CCX will have better latencies & probably higher clocks as well.

bug said:
I'm not sure where you and I disagree. All these CPUs will work at 95W at their designated baseline clocks. With beefier heat sinks you can extract more performance. Nothing has changed, except the boost algorithms that have become smarter. Would you prefer a hard 95W limitation instead or what's your beef here?

Fake 95W TDP?
