AMD's "Zen 3" CCD (core complex die), the physical building block of both its client and enterprise processors, may have a core-count limitation owing to the way its bandwidth-heavy on-die components are interconnected, according to an AnandTech report. The report cites what are possibly the first insights AMD has provided into the CCD's switching fabric, which confirm the presence of a Ring Bus topology. More specifically, the "Zen 3" CCD uses a bi-directional Ring Bus to connect the eight CPU cores with the 32 MB of shared L3 cache and other key components of the CCD, such as the IFOP interface that lets the CCD talk to the I/O die (IOD).
Imagine a literal bus driving around a city block, picking up and dropping off people between four buildings. The "bus" here resembles a strobe, the buildings resemble components (cores, uncore, etc.), while the bus-stops are ring-stops. Each component has its own ring-stops. To disable a component (e.g. for product-stack segmentation), SKU designers simply disable its ring-stops, making the component inaccessible. A bi-directional Ring Bus would see two "vehicles" driving in opposite directions around the city block. The Ring Bus topology comes with limitations of scale, mainly because every additional ring-stop adds latency. This is precisely why coaxial ring topologies faded out in networking.
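To put rough numbers on the scaling problem, here is a minimal Python sketch that counts hops between ring-stops. It assumes one ring-stop per component and treats hop count as a stand-in for latency; the stop counts (8 and 16) and the function names are illustrative and not taken from AMD's or AnandTech's material.

```python
def ring_hops(src, dst, stops, bidirectional=True):
    """Hops from ring-stop `src` to `dst` on a ring with `stops` ring-stops."""
    clockwise = (dst - src) % stops
    if not bidirectional:
        # Only one "vehicle": traffic has to travel clockwise all the way.
        return clockwise
    # Two "vehicles" in opposite directions: take the shorter way around.
    return min(clockwise, stops - clockwise)


def worst_and_average_hops(stops, bidirectional=True):
    """Worst-case and mean hop count from one stop to every other stop.

    A ring is symmetric, so measuring from stop 0 is representative.
    """
    counts = [ring_hops(0, dst, stops, bidirectional) for dst in range(1, stops)]
    return max(counts), sum(counts) / len(counts)


if __name__ == "__main__":
    # Hypothetical stop counts, chosen only to show the trend.
    for stops in (8, 16):
        for bidirectional in (False, True):
            worst, mean = worst_and_average_hops(stops, bidirectional)
            kind = "bi-directional" if bidirectional else "uni-directional"
            print(f"{stops} stops, {kind}: worst {worst} hops, mean {mean:.2f}")
```

Under these assumptions, going bi-directional roughly halves the worst case (7 hops down to 4 with eight stops), but the worst case still grows linearly as stops are added (8 hops with sixteen stops).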
Intel realized in the early 2010s that it could not scale up CPU core counts on its monolithic processor dies beyond a point using a Ring Bus, and had to develop the Mesh topology. The Mesh is essentially a more advanced Ring Bus with additional points of connectivity between components, making it a halfway point between a Ring Bus and full interconnectivity (in which every component is directly connected to every other, an impractical solution at scale). AMD's recipe for extreme core-count processors, such as the 64-core EPYC, is to use 8-core CCDs (each with an internal bi-directional Ring Bus) that are networked at the sIOD (server I/O die).
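The trade-off between the three topologies can be sketched by counting links (a proxy for wiring cost) and worst-case hops (a proxy for latency). The Python sketch below models each as an idealized graph. The plain 2D grid is an assumption made for illustration, not a description of Intel's actual Mesh implementation, and the helper names are hypothetical.

```python
from collections import deque


def ring(n):
    """Bi-directional ring: each node links to its two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def mesh(rows, cols):
    """Idealized 2D grid: each node links to its up/down/left/right neighbours."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            adj[node] = set()
            if r > 0:
                adj[node].add(node - cols)
            if r < rows - 1:
                adj[node].add(node + cols)
            if c > 0:
                adj[node].add(node - 1)
            if c < cols - 1:
                adj[node].add(node + 1)
    return adj


def full(n):
    """Full interconnectivity: every node links directly to every other."""
    return {i: set(range(n)) - {i} for i in range(n)}


def link_count(adj):
    """Number of point-to-point links (each link is stored at both ends)."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2


def worst_hops(adj):
    """Largest shortest-path hop count between any two nodes (BFS from each)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    # 64 nodes is a hypothetical size, not a claim about any real die.
    for name, adj in (("ring", ring(64)), ("mesh 8x8", mesh(8, 8)), ("full", full(64))):
        print(f"{name}: {link_count(adj)} links, worst case {worst_hops(adj)} hops")
```

With 64 nodes, this model gives the ring 64 links and a 32-hop worst case, the 8x8 grid 112 links and 14 hops, and full interconnectivity 2,016 links and a single hop, which is why the Mesh sits between the other two.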
It's interesting to note that AMD didn't always use a Ring Bus on its CCDs. Older "Zen 2" chiplets with the 4-core CCX (CPU complex) used full interconnectivity between four components (i.e. four CPU cores and their slices of the shared L3 cache). This is illustrated in AMD's slide, which mentions "same latency" for a core accessing every other L3 slice (something that wouldn't quite be possible even with a bi-directional Ring Bus). This begins to explain AMD's rationale behind the 4-core CCX. Eventually the performance benefit of a monolithic 8-core CCX interconnected with a bi-directional Ring Bus won out, so AMD went with this approach for "Zen 3."
In the future, AMD might need to let go of the Ring Bus to scale beyond a certain number of CPU cores per CCD, AnandTech postulates. This is for the same reason Intel ditched the Ring Bus for high core-count processors: latency. The CCD of the future could be made up of three distinct dies stacked up: the topmost die could be made up of cache, the middle die of the CPU cores, and the bottom die of a Mesh interconnect. The next logical step would be to scale this interconnect layer into a silicon interposer with several CPU+cache dies stacked on top.
View at TechPowerUp Main Site