Tuesday, November 6th 2018
AMD Zen 2 "Rome" MCM Pictured Up Close
Here is the clearest picture yet of AMD "Rome," codename for the company's next-generation EPYC socket SP3r2 processor: a multi-chip module of nine chiplets (up from four). While first-generation EPYC MCMs (and Ryzen Threadripper) were essentially "4P-on-a-stick," the new "Rome" MCM takes the concept further by introducing a new centralized uncore component called the I/O die. Up to eight 7 nm "Zen 2" CPU dies surround this large 14 nm die and connect to it through the substrate, using InfinityFabric, without needing a silicon interposer. Each CPU chiplet features 8 cores, for 64 cores in total.
The CPU dies themselves are significantly smaller than current-generation "Zeppelin" dies, although judging by their size, we're not sure whether they still pack disabled integrated memory controllers or PCIe roots. While the transition to 7 nm can be expected to significantly reduce die size, two dies together appear to make up roughly the die area of a single "Zeppelin." It's possible that the CPU chiplets in "Rome" physically lack an integrated northbridge and southbridge, and only feature a broad InfinityFabric interface. The I/O die handles memory, PCIe, and southbridge functions, featuring an 8-channel DDR4 memory interface that's as monolithic as Intel's implementations, a PCI-Express gen 4.0 root complex, and other I/O.
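As a rough illustration, the package layout described above can be modeled in a few lines of Python. The class names and attributes are our own invention and reflect only the figures reported here (eight 7 nm chiplets of 8 cores each, surrounding one 14 nm I/O die), not AMD's actual internal topology:

```python
# Hypothetical sketch of the "Rome" MCM layout as described in the article.
from dataclasses import dataclass, field

@dataclass
class Chiplet:
    cores: int = 8          # each 7 nm "Zen 2" CPU die carries 8 cores
    process_nm: int = 7

@dataclass
class IODie:
    process_nm: int = 14
    ddr4_channels: int = 8  # monolithic 8-channel DDR4 interface
    pcie_gen: int = 4       # PCI-Express gen 4.0 root complex

@dataclass
class RomeMCM:
    io_die: IODie = field(default_factory=IODie)
    chiplets: list = field(default_factory=lambda: [Chiplet() for _ in range(8)])

    @property
    def total_cores(self) -> int:
        # every chiplet talks to the I/O die over InfinityFabric,
        # so core count is simply chiplets x cores-per-chiplet
        return sum(c.cores for c in self.chiplets)

rome = RomeMCM()
print(rome.total_cores)          # 64 cores across 8 chiplets
print(len(rome.chiplets) + 1)    # 9 dies on the package in total
```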
Source:
Tom's Hardware
71 Comments on AMD Zen 2 "Rome" MCM Pictured Up Close
Also, will CPUs with fewer than 64 cores use dummy or faulty silicon, as before?
Also, will the two dies next to each other communicate directly?
I don't think Ryzen 3000 will use the I/O die, since it's 14 nm, huge, and perhaps very expensive. That means Rome's chiplets either have the IMC and PCIe disabled, or, more likely, Ryzen will use a different die.
So the performance of a 2950X in the MSDT segment could hurt Intel plenty.
First the bad news: motherboards are going to get even more expensive. With PCIe 4.0, boards will need some kind of "re-driver" for the PCIe 4.0 signals; apparently at least one is required, and if you want dual x16 slots, supposedly two are needed. These are expensive parts and will increase board costs.
From my understanding, AMD is going full-on PCIe 4.0, so not only will the lanes from the CPU to the chipset be PCIe 4.0, but also the lanes to all peripherals. This means AMD will be the first company to offer full PCIe 4.0 support on a consumer board, unless Intel can get something out before the Ryzen 3xxx series launches. Expect a vastly improved chipset, but I can't reveal too much yet, as I don't want to get people in trouble for leaking information that isn't even remotely public. All I can say is that I think everyone will be a lot happier with AMD's high-end chipset for the Ryzen 3xxx series, as it doesn't have any of the weird limitations that the current chipsets have. There won't be any bandwidth-starved peripherals this time around.
Not bad; however, software lags ridiculously far behind when it comes to multi-core-capable applications, and I'm not talking only about game engines.
intel did nothing in 10 yrs
that cpu looks so beautiful
because of the past 10 yrs and intel's "good consumer policies", i just hate them, truly hate them
PS: let's not forget how and what Intel did to AMD to reach its now-shaky #1 position
Even though AMD didn't clarify its FPU architecture, THIS IS F**KING AWESOME. Threadripper 3000s will be perfect for HPC.
The I/O hub will inevitably increase memory latency, as no CPU chiplet will have direct access to memory. AMD's wording was quite clever; they said something along the lines of "no more variable RAM access latency" :). This does not look right for a desktop/gaming CPU, especially if the competition keeps going the current way. Does desktop even need more than 8 cores at this point?
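To illustrate the trade-off this commenter describes, here is a toy sketch contrasting first-generation EPYC's variable die-to-die latency with a uniform hop through a central I/O die. The nanosecond figures are invented purely for illustration, not measurements:

```python
# Toy model: variable vs. uniform memory latency. Numbers are made up.
local_ns, remote_ns = 90, 140   # hypothetical first-gen local vs remote access
naples_latencies = [local_ns, remote_ns, remote_ns, remote_ns]  # 1 local, 3 remote dies

io_die_hop_ns = 110             # hypothetical uniform hop through the I/O die
rome_latencies = [io_die_hop_ns] * 4

# First-gen: which die owns the DRAM channel determines latency.
print(max(naples_latencies) - min(naples_latencies))  # 50 ns spread
# Central I/O die: every access takes the same path.
print(max(rome_latencies) - min(rome_latencies))      # 0 ns spread
```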
As for consumer-facing chips, this looks absolutely brilliant. Why? Because of the innate scalability and flexibility of this design. They still have the "one die to rule them all" design, only now it's one 8-core CCX. No more inter-core IF at 8 cores or less. No more inter-CCX latencies or other issues at or below 8 cores, which is plenty for >99% of consumers.
As for the I/O die, they can make however many designs they want, and they'll be relatively cheap Lego-like designs. X number of IF ports, Y DRAM channels, various other I/O blocks, cut, paste, done, fab. On a proven and well-known process node from a supplier where they have plentiful capacity and favorable pricing. This will let AMD diversify their product portfolio while maintaining fab/die portfolio simplicity for the complex logic dice.
Ryzen gets a small I/O die with dual-channel memory, <32 PCIe lanes, and support for up to two active dice. This could be very small, given the dramatic reduction in I/O needs from EPYC. TR gets four active dice, quad channel, 64 PCIe lanes, at half the size of the full-fat EPYC I/O chip (or less, given there's no need for IF for multi-socket platforms). APUs could either re-use the Ryzen one, just replacing one die with a GPU over IF, or get a bespoke solution with different I/O. Mobile could get a tiny I/O die with just the basics (a couple of SATA, 4-ish USB, video output, 8-16 PCIe).
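This speculative "Lego-like" lineup can be written down as data. Every figure below is the commenter's guess (with the Ryzen lane count picked arbitrarily under their "<32 lanes" cap, and mobile numbers from their "just the basics" sketch), not an announced spec:

```python
# The commenter's speculative I/O-die lineup, expressed as data.
# These configurations are guesses, not announced products.
from typing import NamedTuple

class IODieConfig(NamedTuple):
    dram_channels: int
    pcie_lanes: int
    max_cpu_dice: int

speculative_lineup = {
    "EPYC":         IODieConfig(dram_channels=8, pcie_lanes=128, max_cpu_dice=8),
    "Threadripper": IODieConfig(dram_channels=4, pcie_lanes=64,  max_cpu_dice=4),
    "Ryzen":        IODieConfig(dram_channels=2, pcie_lanes=24,  max_cpu_dice=2),
    "Mobile":       IODieConfig(dram_channels=2, pcie_lanes=12,  max_cpu_dice=1),
}

for name, cfg in speculative_lineup.items():
    # 8 cores per "Zen 2" chiplet, per the article
    print(f"{name}: {cfg.dram_channels}-ch DRAM, {cfg.pcie_lanes} lanes, "
          f"up to {cfg.max_cpu_dice * 8} cores")
```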
My only real concern here is the cost and complexity of implementing an MCM design on smaller packages, given how thick and massive TR4 packages and substrates are. But then again, linking 2-3 dice with short and straight runs of IF should be far simpler than the 4-way crosswise layout of Threadripper. At least outside of mobile, this should be entirely doable (and now that we see they can do 9 dice on a single substrate in TR4).
As for memory latency, it'll obviously increase compared to having the DRAM controller on-die, but the increase ought to be smaller than current TR/EPYC die-to-die hops given that the die with the DRAM controllers now only does I/O, and should have a far more optimized layout for this. This isn't ideal, but likely not a performance killer either. Given that they say their memory controllers are quite improved, it'll likely be a wash for consumer use cases.
Add to this what looks like a significant IPC increase across a wide swath of applications, plus clock speed increases (1.25x at the same power according to the slides, so 8 cores at 4.5 GHz (3.6 GHz x 1.25) at 95 W if we're going off the 1800X). This sounds quite optimistic, but given how badly GloFo 14 nm scaled above 4 GHz, it might be possible. If we get 3-4-core turbo above 4.5 GHz, that'd be an amazing gaming chip.
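Spelling out that back-of-the-envelope arithmetic: the 1.25x figure comes from AMD's slides, and the 3.6 GHz / 95 W baseline is the Ryzen 7 1800X's base clock and TDP. Everything else here is just multiplication:

```python
# Back-of-the-envelope clock projection from the commenter's numbers.
base_clock_ghz = 3.6           # Ryzen 7 1800X base clock at 95 W
iso_power_scaling = 1.25       # claimed 7 nm frequency gain at the same power

projected_ghz = base_clock_ghz * iso_power_scaling
print(f"{projected_ghz:.1f} GHz")  # 4.5 GHz at the same 95 W TDP
```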
My main question now is whether AMD can attach HBM2 to an APU in this same way, or if that still requires an interposer/EMIB. I'm skeptical that this is possible, but if it is, hot damn, next-gen APUs could be amazing.
Why am I afraid? If AMD does end up ahead of Intel, what's to stop AMD from pricing their chips like Intel does?