AMD built its Ryzen Threadripper HEDT (high-end desktop) processor as a multi-chip module (MCM) of two 8-core "Summit Ridge" dies, each with its own dual-channel memory controller and PCI-Express interface. This is unlike the competing Core "Skylake-X" from Intel, which is a monolithic 18-core die with a quad-channel DDR4 interface and 44 PCIe lanes on one die. AMD has devised some innovative methods of overcoming the latency issues inherent to an MCM arrangement like the Ryzen Threadripper, by tapping into its NUMA (non-uniform memory access) technology.
To the hardware, four 8 GB DDR4 memory modules populating the four memory channels of a Ryzen Threadripper chip are seen as two blocks of 16 GB, each controlled by one of the two "Summit Ridge" dies. To the software, it is a seamless block of 32 GB. Blindly interleaving the four 8 GB modules for four times the bandwidth of a single module isn't as straightforward as it is on the Core X, and is fraught with latency issues: a thread running on a core on die-A, with half of its memory allocation controlled by the other die, pays a latency penalty on every access to that remote half. AMD overcomes this by treating memory on a Ryzen Threadripper machine like that of a 2-socket machine, in which each socket has its own memory.
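When the platform exposes the two dies as separate NUMA nodes, that split is visible to software through ordinary OS interfaces. As a rough illustration only (not AMD's tooling; it assumes a Linux system with libnuma installed), the sketch below queries how many nodes the OS reports and how much memory each one controls:

```c
/* Illustrative sketch: query NUMA topology with libnuma (Linux).
 * Build with: gcc numa_query.c -lnuma */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        puts("NUMA not available on this system");
        return 1;
    }

    int max_node = numa_max_node();  /* highest node index, e.g. 1 for two dies */
    for (int node = 0; node <= max_node; node++) {
        long long free_bytes = 0;
        long long total = numa_node_size64(node, &free_bytes);
        printf("node %d: %lld MB total, %lld MB free\n",
               node, total >> 20, free_bytes >> 20);
    }
    return 0;
}
```

On a Threadripper box configured as two NUMA nodes, a query like this would report roughly 16 GB per node for the 4x 8 GB configuration described above.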
Software needs to be optimized to see Threadripper as featuring two memory-allocation modes: Distributed Mode and Local Mode. In Distributed Mode, all four memory channels are interleaved, with priority given to providing the app with the highest bandwidth. In Local Mode, an app fills memory controlled by a particular die first, and only then begins to use memory controlled by the neighboring die; the priority here is latency. In AMD's internal tests, Distributed Mode yields higher memory bandwidth at the expense of latency (though not by much), while Local Mode does the opposite, providing the lowest latency at the expense of bandwidth.
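The two modes map onto standard NUMA allocation policies. As a hedged sketch (again assuming Linux with libnuma, not anything Threadripper-specific), the snippet below contrasts a bandwidth-oriented interleaved allocation with a latency-oriented node-local one:

```c
/* Illustrative sketch: bandwidth-oriented vs latency-oriented allocation
 * with libnuma (Linux). Build with: gcc numa_alloc.c -lnuma */
#include <numa.h>
#include <stdio.h>

#define BUF_SIZE (256UL << 20)  /* 256 MB test buffer */

int main(void)
{
    if (numa_available() < 0)
        return 1;

    /* "Distributed Mode"-like: stripe pages across all nodes for bandwidth. */
    void *interleaved = numa_alloc_interleaved(BUF_SIZE);

    /* "Local Mode"-like: keep pages on the node the calling thread runs on,
     * minimizing latency for that thread. */
    void *local = numa_alloc_local(BUF_SIZE);

    printf("interleaved=%p local=%p\n", interleaved, local);

    numa_free(interleaved, BUF_SIZE);
    numa_free(local, BUF_SIZE);
    return 0;
}
```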
AMD extensively marketed the Ryzen Threadripper as featuring 64 PCI-Express gen 3.0 lanes. It wasn't counting the general-purpose lanes from the chipset, because those are gen 2.0. AMD arrived at the number 64 by adding up 32 PCIe gen 3.0 lanes from each of the two "Summit Ridge" dies, including the 4 lanes typically reserved as the chipset-bus (the interconnect between the processor and the AMD X399 chipset). On a typical Threadripper-powered machine, 4 of the 64 lanes are permanently allocated as that chipset-bus, and 32 lanes are wired out as PEG (PCI-Express Graphics) lanes, driving either two graphics cards at full x16 bandwidth, or four cards at x8 bandwidth each. That still leaves 28 lanes, which can be used either to wire out a third set of PEG slots (one x16 or two x8), or up to three M.2 slots with x4 bandwidth each, leaving the remaining lanes for other onboard controllers.
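To make the budget concrete, here is one hypothetical board layout tallied from only the figures quoted above; actual X399 motherboard designs split the last 28 lanes differently:

```c
/* Back-of-the-envelope lane budget for a hypothetical X399 board layout,
 * based only on the figures quoted above. */
#include <stdio.h>

int main(void)
{
    int total   = 2 * 32;   /* 32 gen 3.0 lanes per "Summit Ridge" die */
    int chipset = 4;        /* reserved as the chipset-bus to X399     */
    int peg     = 2 * 16;   /* two x16 PEG slots (or four x8)          */
    int m2      = 3 * 4;    /* example: three x4 M.2 slots             */

    printf("after chipset+PEG: %d lanes left\n", total - chipset - peg);       /* 28 */
    printf("after three M.2:   %d lanes for onboard controllers\n",
           total - chipset - peg - m2);                                        /* 16 */
    return 0;
}
```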
Holding it all together is AMD Infinity Fabric, a high-performance interconnect which connects the two quad-core CCX units within each "Summit Ridge" die, and the two "Summit Ridge" dies themselves on the Threadripper MCM. The interconnect keeps memory latency under 133 ns for a core addressing the "farthest" memory (DIMMs controlled by the neighboring die), and it is energy-efficient, consuming about 2 picojoules per bit pushed. Threadripper features an inter-die, bi-directional bandwidth of 102.22 GB/s.
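Taking the quoted figures at face value, the energy cost of that link is small. A rough sanity check (my arithmetic, not an AMD figure) puts the inter-die link at well under 2 W even when fully saturated:

```c
/* Rough sanity check of the inter-die link's power draw, using only the
 * figures quoted above (2 pJ/bit, 102.22 GB/s); not an official AMD number. */
#include <stdio.h>

int main(void)
{
    double energy_per_bit = 2e-12;    /* joules per bit      */
    double bandwidth_gbs  = 102.22;   /* inter-die GB/s      */
    double bits_per_sec   = bandwidth_gbs * 1e9 * 8.0;

    /* ~1.6 W if the inter-die link were fully saturated */
    printf("link power at full load: %.2f W\n", bits_per_sec * energy_per_bit);
    return 0;
}
```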
View at TechPowerUp Main Site