Hello everyone, I'm making this post because after the RTX 5090 launch I was confused about how exactly GDDR7 operates, especially in the PAM3 encoding mode that Blackwell uses. I've done some reading, and I believe I'm prepared to explain for those of you who are interested, but part of the reason I'm doing this is that I'm hoping someone will correct me if I'm wrong! I of course encourage you to read the spec yourself, as it's freely available from JEDEC's website (you just have to make an account). I know it's 340 pages long, but only a few pages matter for this post.
Before we begin, we need to define some terms. First is the difference between bit-rate and baud, which becomes an important distinction with GDDR7. Bit-rate is simply the number of bits transferred per second. Baud, on the other hand, is the number of symbols per second. For GDDR6, which uses NRZ encoding (Non-Return to Zero), the bit-rate and the baud are equal, because each NRZ symbol can only communicate one bit's worth of information. However, with GDDR7's PAM3 encoding (Pulse-Amplitude Modulation with 3 levels), each symbol can communicate one of three distinct values, or one ternary digit. This is awkward because a PAM3 symbol carries log2(3) ≈ 1.58 bits of information, which is between 1 and 2 bits; there isn't a perfect correspondence.
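To put a number on "between 1 and 2 bits", here's a quick sketch of the information content per symbol. The function name is just something I made up for illustration; it isn't from the spec.

```python
import math

def bits_per_symbol(levels: int) -> float:
    """Upper bound on the information (in bits) one symbol can carry
    when it takes one of `levels` distinct values."""
    return math.log2(levels)

nrz_bits = bits_per_symbol(2)    # NRZ: two levels per symbol
pam3_bits = bits_per_symbol(3)   # PAM3: three levels per symbol

print(f"NRZ:  {nrz_bits:.3f} bits/symbol")
print(f"PAM3: {pam3_bits:.3f} bits/symbol")
```

So a PAM3 symbol can carry at most about 1.585 bits, which is well short of the 2 bits you'd need to double NRZ on the same wires.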
Secondly, the clock. The clock we normally report for memory speeds is called CK4. CK4 is a clock internal to the memory chip that runs at a quarter the speed of the bus clock. In the case of the 5090, CK4 = 1750 MHz. As a point of comparison, I'll also include the 5700 XT, which uses GDDR6 also at CK4 = 1750 MHz. Because CK4 is equal in both cases, the bus speed of both the 5090 and the 5700 XT is 14 gigabaud (1.75 GHz * 4 * 2). We multiply by four because CK4 is a quarter the bus speed, then again by two because of DDR (Double Data Rate).
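The clock arithmetic is simple enough to write out as a sketch. The helper name and the MHz-in, Gbaud-out convention are my own choices, nothing official.

```python
def bus_gbaud(ck4_mhz: float) -> float:
    """Symbol rate per data line in gigabaud, given CK4 in MHz.

    x4 because CK4 runs at a quarter of the bus clock,
    x2 because data is transferred on both clock edges (DDR).
    """
    return ck4_mhz * 4 * 2 / 1000

rate_5090 = bus_gbaud(1750)      # GDDR7 on the 5090
rate_5700xt = bus_gbaud(1750)    # GDDR6 on the 5700 XT

print(f"5090:    {rate_5090} Gbaud")
print(f"5700 XT: {rate_5700xt} Gbaud")
```

Both cards come out to the same 14 Gbaud, which is exactly why the bit-rate difference needs explaining.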
This brings me to the fundamental question I was asking myself. The 5700 XT and the 5090 both run at 14 Gbaud, yet the 5700 XT is 14 Gbps and the 5090 is 28 Gbps. How can it be possible that the 5090's bit-rate is double that of the 5700 XT's, when PAM3 encodes less than twice as much data as NRZ? Clearly there's more to PAM3 mode than meets the eye.
This figure from the spec illustrates exactly what I'm talking about:
Each memory chip is thought of as being 32 bits wide. That's how a 5700 XT with a 256-bit memory bus can have eight 8 Gb GDDR6 chips for a total of 8 GB, or how a 5090 with a 512-bit bus can have sixteen 16 Gb GDDR7 chips for a total of 32 GB.
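Here's the same chip-count arithmetic as a small sketch. The function and constant names are mine, and it assumes one chip per 32 bits of bus width as described above (no clamshell configurations).

```python
CHIP_WIDTH_BITS = 32  # each chip is counted as 32 bits wide

def memory_config(bus_width_bits: int, chip_density_gbit: int) -> tuple[int, float]:
    """Return (number of chips, total capacity in GB) for a given bus width
    and per-chip density, assuming one chip per 32 bits of bus."""
    chips = bus_width_bits // CHIP_WIDTH_BITS
    capacity_gb = chips * chip_density_gbit / 8  # 8 gigabits per gigabyte
    return chips, capacity_gb

print(memory_config(256, 8))    # 5700 XT: 8 Gb GDDR6 chips
print(memory_config(512, 16))   # 5090: 16 Gb GDDR7 chips
```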
In the case of GDDR6, each memory chip is split into two independent 16-bit sub-channels. When you send a read request to a GDDR6 channel, you get back a burst of 256 bits of data. That's a burst length of 16 across 16 data lines (16 * 16 = 256). GDDR7 chips are upgraded in this regard; they're each split into four 8-bit sub-channels. In order to maintain a burst size of 256 bits when operating in NRZ mode, the burst length is doubled from 16 to 32 (8 * 32 = 256). GDDR7 can operate in NRZ mode, but to my knowledge it doesn't in the 50 series.
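As a sanity check on the burst sizes, here's a sketch that multiplies data lines by burst length for both NRZ cases. The names are illustrative only.

```python
def nrz_burst_bits(data_lines: int, burst_length: int) -> int:
    """Bits delivered per burst in NRZ mode, where each symbol is one bit."""
    return data_lines * burst_length

gddr6_burst = nrz_burst_bits(data_lines=16, burst_length=16)
gddr7_nrz_burst = nrz_burst_bits(data_lines=8, burst_length=32)

print(f"GDDR6 burst:       {gddr6_burst} bits")
print(f"GDDR7 (NRZ) burst: {gddr7_nrz_burst} bits")
```

Halving the number of lines per sub-channel while doubling the burst length keeps the burst at 256 bits.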
Section 2.9.2 of the GDDR7 spec explains how a burst is PAM3 encoded before being transmitted, and this is where the answer lies. The 256 bits of data are encoded as a set of 176 PAM3 symbols; the details of how that happens are in there. The crucial piece is this: "In PAM3 mode GDDR7 SGRAMs transfer a total of 176 symbols per burst access over 11 data lines (burst length 16 * 11 DQs = 176 symbols)". So, to make up for the fact that PAM3 can't quite encode twice as much data as NRZ, they just add three more data lines (11 instead of 8)! This brings the burst length back down to 16, so the same 256 bits arrive in half as many symbol periods as in NRZ mode, which is where the doubled speed comes from.
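To check that the numbers hang together, here's a rough sketch using only the figures quoted above (256 bits, 11 lines, burst length 16). It doesn't model the actual encoding tables from the spec; it only checks that 176 ternary symbols have enough room for 256 bits, and works out the effective rate. The variable names are my own.

```python
import math

DATA_BITS = 256      # bits per burst
LINES = 11           # data lines in PAM3 mode, per the quoted sentence
BURST_LENGTH = 16    # symbols per line per burst
NOMINAL_LINES = 8    # width each sub-channel is counted as

symbols = LINES * BURST_LENGTH                 # 176 symbols per burst
capacity_bits = symbols * math.log2(3)         # theoretical room in those symbols
fits = 3 ** symbols >= 2 ** DATA_BITS          # exact integer check, no rounding

# Effective payload per transmitted symbol, and per *nominal* line per symbol period
bits_per_symbol = DATA_BITS / symbols
bits_per_nominal_line = DATA_BITS / (NOMINAL_LINES * BURST_LENGTH)

print(f"symbols per burst:       {symbols}")
print(f"theoretical capacity:    {capacity_bits:.1f} bits")
print(f"256 bits fit:            {fits}")
print(f"payload per symbol:      {bits_per_symbol:.3f} bits")
print(f"bits per nominal line:   {bits_per_nominal_line} per symbol period")
```

The 176 symbols have room for roughly 279 bits, so 256 bits fit with some to spare, and each symbol carries about 1.45 bits of payload. Counted against the nominal 8 lines, that works out to exactly 2 bits per line per symbol period, which is how 14 Gbaud gets reported as 28 Gbps.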
I suppose you could argue that the 5090 actually has a 512 * 11/8 = 704 bit bus at 14 Gbaud, and that saying it's 512-bit at 28 Gbps is a convenient way to maintain comparability with older GDDR standards. Either way it still ends up being 1,792 GB/s in total.
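Both ways of looking at it can be checked with a little arithmetic. This sketch uses exact fractions so there's no rounding noise; the variable names are just for illustration.

```python
from fractions import Fraction

# View 1: the marketing numbers, a 512-bit bus at 28 Gbps per pin
nominal_gb_per_s = Fraction(512 * 28, 8)

# View 2: the physical picture, 704 lines at 14 Gbaud,
# each symbol carrying 256/176 bits of payload
physical_lines = 512 * Fraction(11, 8)
payload_bits_per_symbol = Fraction(256, 176)
physical_gb_per_s = physical_lines * 14 * payload_bits_per_symbol / 8

print(f"physical lines:   {physical_lines}")
print(f"nominal view:     {nominal_gb_per_s} GB/s")
print(f"physical view:    {physical_gb_per_s} GB/s")
```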
I hope some of you found that interesting. And again, if I got anything wrong please feel free to correct me!