Sunday, March 5th 2023
AMD's Zen 4 I/O Die Detailed Courtesy of ISSCC Presentation
Although we've known most of the details of AMD's I/O die in its Zen 4 processors for some time, AMD hadn't shared a die shot of the cIOD until now. Thanks to its ISSCC 2023 presentation, we not only have a die shot of the cIOD, but some friendly people on the internet have also annotated it for us mere mortals. There are no big secrets here, but based on the annotations by @Locuza_, we now know for certain that the current I/O die can't be used with three CCDs, as it only has two GMI3 interfaces to which the CCDs connect.
If you're wondering about the 2x 40-bit memory interface, it's there for ECC memory support, separate from the on-die ECC that DDR5 memory provides by itself. Note that a DDR5 DIMM operates as two 32-bit subchannels in non-ECC mode. That said, it's up to the motherboard makers to implement support for ECC memory, but it would appear all Zen 4 CPUs support it. The addition of a GPU, even a basic one like this, takes up a fair bit of space inside the cIOD, especially once you add things like video decoders/encoders. In fact, the GPU and video decode/encode blocks appear to take up at least a third of the I/O die, yet thanks to a significant die shrink from the Zen 3 era cIOD, the Zen 4 part is physically smaller while packing an estimated 58 percent more transistors.
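The channel-width arithmetic behind those figures can be sketched in a few lines. This is a back-of-envelope illustration, assuming the extra 8 bits per 32-bit subchannel are the usual DDR5 sideband-ECC width, which is consistent with the 2x 40-bit figure above:

```python
# Back-of-envelope: DDR5 channel widths with and without sideband ECC.
# Assumption: each 32-bit subchannel gains 8 ECC bits in ECC mode,
# matching the 2x 40-bit interface seen in the die-shot annotations.
DATA_BITS_PER_SUBCHANNEL = 32
ECC_BITS_PER_SUBCHANNEL = 8
SUBCHANNELS_PER_DIMM = 2

non_ecc_width = SUBCHANNELS_PER_DIMM * DATA_BITS_PER_SUBCHANNEL  # 2 x 32
ecc_width = SUBCHANNELS_PER_DIMM * (DATA_BITS_PER_SUBCHANNEL
                                    + ECC_BITS_PER_SUBCHANNEL)   # 2 x 40

print(non_ecc_width, ecc_width)  # 64 80
```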
Sources:
@Locuza_ (on Twitter), @lixnjen (on Twitter)
15 Comments on AMD's Zen 4 I/O Die Detailed Courtesy of ISSCC Presentation
Ya got that right
wow
Zen 4c, I think, may have more cores per CCD, but they're more like Intel's E cores.
This is presuming AMD figures out how to clock CCDs with stacked cache higher.
Besides, 3D cache indirectly limits max clock speed, and that's probably not something AMD wants for each and every SKU, although who knows, they might be able to get rid of that limit in the future somehow. The Zen 4 CCDs are 13% smaller, they have 8 MB more cache, etc.
If they care about OEM and "economy enthusiast" market, which means Ryzen 5 and Ryzen 3, making 6-core or even 4-core processors by throwing half (or more) of the CCD away looks like a horrible waste of silicon. But maybe they don't care, or intend to supply that part of the market with monolithic APUs primarily.
On the other hand, an Epyc with 12 CCDs of 8 cores each seems overly complex, with a huge 12-way exchange for the IFOP network and performance lost to excessive inter-CCD communication. Wouldn't 8 CCDs of 12 cores each work better?
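Both layouts land on the same core budget; the difference is how many chiplet links the IOD has to serve. A quick comparison, with the one-IFOP-link-per-CCD figure taken as an assumption for illustration:

```python
# Same 96-core budget, different chiplet fan-out (illustrative only).
layouts = {
    "12 CCDs x 8 cores": (12, 8),
    "8 CCDs x 12 cores": (8, 12),
}
for name, (ccds, cores_per_ccd) in layouts.items():
    # Assumption: one IFOP link per CCD, so IOD fan-out equals CCD count.
    total_cores = ccds * cores_per_ccd
    print(f"{name}: {total_cores} cores, {ccds} IFOP links at the IOD")
```

Either way you get 96 cores; the 8-CCD layout trades a simpler IFOP network for larger (and lower-yielding) compute dies.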
As for stacking - no, it's just impossible we could ever see a cache chiplet on top of every compute chiplet. Think of it, there's about 50% additional silicon area, the process of thinning and bonding, less than 100% packaging yield, and there are thermal issues that can be minimised but never removed. All that for a benefit that certainly exists but isn't universal, neither in desktop nor in server and HPC applications.
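That cost argument can be put into rough numbers. All figures below are assumptions picked to illustrate the point, not AMD data: 50% extra silicon area from the comment above, plus a hypothetical 95% bond/packaging yield:

```python
# Illustrative cost model for stacking a cache chiplet on every CCD.
# Assumptions (not AMD figures): cache die adds 50% silicon area,
# and 95% of stacks survive thinning, bonding and packaging.
EXTRA_SILICON = 0.50   # cache chiplet area relative to the base CCD
BOND_YIELD = 0.95      # fraction of stacked parts that come out good

base_cost = 1.0        # normalised cost of one plain CCD
stacked_cost = (base_cost + EXTRA_SILICON) / BOND_YIELD

print(round(stacked_cost, 2))  # ~1.58x the silicon cost per good part
```

Even with generous yield assumptions, every stacked part costs well over 1.5x a plain CCD, which is why making it universal is a hard sell.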
Stacked cache will continue to be an option, possibly with additional twists (two stories high? extending L2 too? usable on APUs too? usable in the IOD too, haha?). Because AMD just has to do something ingenious and totally unexpected from time to time.

Of course an E core is far weaker. But it's very small and can't be dragged down by running two threads at the same time!
An E core is about 1/3 the area of a P core; some say 1/4, but looking at die shots, I measured a four-core E cluster with its L2 cache to be about 4/3 of a P core, L3 slices not included. A fair MT comparison would be running a fixed number of threads (4, 8, 12 or 16) on E cores versus P cores, with each P core struggling with two threads.
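The per-core figure follows from that cluster measurement with simple arithmetic. The 4/3 ratio and the four-core cluster are taken from the comment above; everything else is derived:

```python
# Rough per-core area estimate from the die-shot measurement above:
# a four-core E cluster (with its shared L2) spans ~4/3 of a P core.
P_CORE_AREA = 1.0            # normalised P-core area (L3 slice excluded)
E_CLUSTER_AREA = 4.0 / 3.0   # measured cluster: 4 E cores + shared L2
E_CORES_PER_CLUSTER = 4

e_core_area = E_CLUSTER_AREA / E_CORES_PER_CLUSTER
print(round(e_core_area, 2))  # ~0.33, i.e. roughly 1/3 of a P core
```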
Something I'd really love to see in the future would be more PCIe lanes, especially as regular Threadripper seems to be done for; there's a gap in the lineup for anyone who needs more expansion but can't really step up and pay Threadripper Pro money. Even regular Threadripper had this problem. An AM5 Threadripper could be really cool. 5 GHz all-core not enough for you? ;)
What I'd want is for them to not limit the 7800X3D so much; from the listed specs, the clock reduction versus the 7950X3D could be pretty brutal just for the sake of segmentation :shadedshu:
If so, your argument is pretty weak.
7800X3D max boost clock: 5 GHz
7950X3D max boost clock (for the X3D die): ~5.25 GHz
So you're only giving up 200-300 MHz tops.
I want my Task manager processor view to be readable ! :D
But seriously, for most people, faster cores are better than more cores. Otherwise Intel would just load a CPU with 40 E cores and call it a day.
Also, AMD stated that one of the benefits of chiplets was that they could do the I/O die once and afterwards focus fully on the CCD, saving time. The I/O and memory controllers are supposedly harder and take longer to develop than logic dies.
The downside is AMD might be stuck at DDR5-6000 for a few generations if they don't improve their I/O die. But they could always take the hit with more cache.
I want moar speed! (says the person that won't be buying either of them)