Conclusion
We always make it a point to test the PCIe scaling of a new graphics architecture here at TPU. PCI-Express Gen 4 has had a good run since its debut in 2019 with the Radeon RX 5000 series. NVIDIA implemented Gen 4 for its RTX 30-series Ampere and RTX 40-series Ada architectures, and now transitions to PCI-Express Gen 5 while retaining backwards compatibility with all older generations. PCIe Gen 5 has been widely available on the Intel platform, going back to 12th Gen Core "Alder Lake" from 2021; however, not all LGA1700 motherboards had Gen 5 x16 slots initially. With 13th Gen "Raptor Lake" and the 700-series chipset, Gen 5 became more widespread. This is roughly the time when AMD introduced Socket AM5 and the Ryzen 7000 series, and with them, its first Gen 5 platform. With the new GeForce RTX 5090 and Blackwell, we finally have our first high-end PCIe Gen 5 x16 GPU.
x16 4.0
Remember what we said about PCI-Express 5.0 x16 availability being spotty on mid-range platforms? This makes it crucial for us to see how the RTX 5090 performs on PCI-Express 4.0 x16. Since LGA1700 processors don't put out a dedicated Gen 5 NVMe connection, Intel's motherboard partners resorted to wiring out M.2 Gen 5 NVMe slots by subtracting lanes from the Gen 5 x16 PEG interface, reducing it to PCI-Express 5.0 x8. That is exactly the same bandwidth as PCI-Express 4.0 x16, which makes it an invaluable data point for the RTX 5090. We are happy to report that the performance loss in this mode is well contained: you lose about 1% performance across all three resolutions. There are barely any outliers to report from our set of game tests.
x16 3.0
The next data point is PCI-Express 3.0 x16, which has the same bandwidth as PCI-Express 4.0 x8 or PCI-Express 5.0 x4. This is where we begin to see the RTX 5090 strain the interface, losing 4% of performance on average, but there's a catch. The 4% delta is only observed at 1080p and 1440p, resolutions where the bottleneck tilts closer to the CPU (in our case an AMD Ryzen 7 9800X3D, which is the fastest gaming CPU). At 4K Ultra HD, the performance loss is closer to 2% on average. Our testing over the years has shown us that framerate is what drives PCIe bandwidth (in most games), and you're getting more frames at lower resolutions, which is why the impact of PCIe bandwidth is bigger there. We're not sure who is still using PCIe Gen 3; perhaps those still on "Skylake" machines, or using Ryzen Socket AM4 processors on older AMD 400-series motherboards. If you're pairing such a machine with a $2,000 RTX 5090, then congrats on the performance uplift.
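The framerate argument can be put into a toy model. This is only a sketch under a loud assumption: that each frame moves a fixed amount of data across the bus. The per-frame payload and the frame rates below are hypothetical placeholders, not measurements from our testing; only the approximate 15.75 GB/s figure for a Gen 3 x16 link is a published number.

```python
def link_utilization(per_frame_mb: float, fps: float, link_gb_per_s: float) -> float:
    """Fraction of link bandwidth used if every frame moves per_frame_mb over the bus."""
    traffic_gb_per_s = (per_frame_mb / 1000.0) * fps
    return traffic_gb_per_s / link_gb_per_s


GEN3_X16_GB_PER_S = 15.75   # approximate one-direction bandwidth of PCIe 3.0 x16
PER_FRAME_MB = 20.0         # hypothetical per-frame payload, for illustration only

# Hypothetical frame rates: lower resolution means more frames per second.
for label, fps in (("1080p", 300.0), ("4K", 120.0)):
    share = link_utilization(PER_FRAME_MB, fps, GEN3_X16_GB_PER_S)
    print(f"{label}: {fps:.0f} FPS -> {share:.0%} of the link")
```

Whatever the real per-frame payload is in a given game, the model scales linearly with frame rate, which is the reason the lower resolutions lean harder on the bus. Real traffic is bursty and game dependent, so treat this as intuition rather than a prediction.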
and slower
We now get into a few academically relevant PCIe configurations, which you'll likely only end up with if you happened to mess up your PC assembly, and installed the RTX 5090 on the wrong kind of PCIe x16 slot, such as the x4 slot that tends to be wired to the chipset. In modern platforms such as the AMD 600-series or Intel 600-series, chipset-attached slots that are x16 physically tend to run at Gen 4 x4. Here the RTX 5090 faces the axe, losing 11% at 1080p on average, 10% at 1440p, and 6% at 4K Ultra HD. In some game tests such as Cyberpunk 2077 at 1080p, it ends up slower than even an RTX 4090 in its native configuration.
This made us wonder how bad it can get for the RTX 5090 if we ran it at speeds comparable to PCI-Express 1.1 x16, 2.0 x8, 3.0 x4, 4.0 x2, or 5.0 x1; in other words, a single Gen 5 lane. Not as bad as we thought! The RTX 5090 loses roughly a quarter of its performance and falls behind even the RTX 4080 at 1080p and 1440p, clocking 74% and 76% of its full-bandwidth performance, respectively, averaged across all our games. This even pinches at 4K, where it loses 16% performance.
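The bandwidth equivalences quoted in the sections above follow directly from the per-lane signalling rates and line encodings of each PCIe generation. Here is a minimal sketch of that arithmetic; the function and table names are our own, and packet-level protocol overhead is ignored, so real-world throughput is somewhat lower.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # Gen 1.x uses 8b/10b encoding
    2: (5.0, 8 / 10),      # Gen 2 uses 8b/10b encoding
    3: (8.0, 128 / 130),   # Gen 3 and later use 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def link_bandwidth_gb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s for a given generation and lane count."""
    rate_gt_per_s, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt_per_s * efficiency * lanes / 8  # 8 bits per byte


# The configurations named in the text, grouped by (near-)equal bandwidth.
groups = [
    [(5, 8), (4, 16)],
    [(5, 4), (4, 8), (3, 16)],
    [(5, 1), (4, 2), (3, 4), (2, 8), (1, 16)],
]
for group in groups:
    print(", ".join(
        f"Gen {gen} x{lanes} = {link_bandwidth_gb_per_s(gen, lanes):.2f} GB/s"
        for gen, lanes in group
    ))
```

Gen 1.x and Gen 2 come out slightly ahead of their Gen 3-and-later counterparts in the last group because of the different encoding, which is why we call those speeds "comparable" rather than identical.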
Ray Tracing
Ray tracing workloads post similar performance losses to our classic raster 3D game tests, although we noticed that you lose absolutely no performance with PCI-Express 4.0 x16 or 5.0 x8, not even that 1%. This is probably because performance becomes more GPU limited with the addition of the ray tracing stack, so fewer frames are rendered, which moves less data across the bus, similar to conventional raster 3D at 4K Ultra HD.
DLSS Upscaling and Frame Generation
DLSS is a more complex workload, using AI machinery in the GPU, and possibly some CPU resources too. Every time the CPU gets involved, the relevant data has to be copied from the GPU to the system's main memory over the PCIe bus. That's why in this article we've specifically looked at the PCIe performance impact with DLSS 4, too. Our results confirm that DLSS Frame Generation is only lightly impacted by PCIe bandwidth. Upscaling, on the other hand, sees a stronger loss in performance, but it's not problematic in any way. We're talking about a few percentage points here. These results confirm that even when DLSS is used, you won't run into trouble with a slower PCIe setup.
All in all, if you're on a Gen 4 x16 platform, relax, you lose close to nothing with an RTX 5090, and any performance delta from our numbers is only because your processor is slower than a 9800X3D. This also applies to the Alder Lake and Raptor Lake crowd that plans to use an RTX 5090 on a Gen 5 x16 slot that's running at x8 because a Gen 5 NVMe SSD is eating away the remaining lanes. If you're on older Gen 3 x16 or Gen 4 x4 (i.e. you accidentally installed the card in the wrong slot), it would be hard to spot a noticeable performance loss, although there is one to be had.
TechPowerUp GPU-Z is a nice way to know if you've messed up your installation. As for the academically relevant PCIe settings, well, these are our numbers: your RTX 5090 won't crash, but about a quarter of its performance will be lost.
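If you'd rather check from a script (on Linux, for example), the same link information can be read through `nvidia-smi`. The sketch below is only an illustration: the helper names are our own, and it assumes `nvidia-smi` is on your PATH and that the card is GPU index 0. Note that GPUs often drop to a lower link generation at idle to save power, so run the check while the card is under load.

```python
import subprocess

# Standard nvidia-smi query fields for the current and maximum PCIe link.
QUERY_FIELDS = (
    "pcie.link.gen.current,pcie.link.gen.max,"
    "pcie.link.width.current,pcie.link.width.max"
)


def parse_link_report(csv_line: str) -> dict:
    """Parse one 'gen_now, gen_max, width_now, width_max' line of CSV output."""
    gen_now, gen_max, width_now, width_max = (
        int(value.strip()) for value in csv_line.split(",")
    )
    return {
        "gen_current": gen_now,
        "gen_max": gen_max,
        "width_current": width_now,
        "width_max": width_max,
        # Fewer lanes than the maximum usually points at the wrong slot or shared lanes.
        "width_reduced": width_now < width_max,
    }


def query_link(gpu_index: int = 0) -> dict:
    """Ask nvidia-smi for the PCIe link state of one GPU (assumes nvidia-smi is installed)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            f"--query-gpu={QUERY_FIELDS}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_link_report(result.stdout.strip().splitlines()[0])
```

Calling `query_link()` on a machine where the card sits in a slot that shares lanes with an M.2 drive would be expected to report `width_current` of 8 and `width_reduced` as true.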