
NVIDIA GeForce RTX 4090 PCI-Express Scaling


Conclusion

The GeForce RTX 4090 "Ada" is a monstrous graphics card that delivered massive generational performance uplifts. When we set out to do this feature article, our main curiosity was how the RTX 4090 performs with half the bandwidth of its native PCI-Express 4.0 x16 interface. Forcing the motherboard to limit the processor's PEG interface to Gen 3 (i.e., PCI-Express 3.0 x16) accomplishes this; Gen 3 x16 also offers exactly the same bandwidth as Gen 4 x8. This PCIe mode will be most relevant to those planning to build 13th Gen Intel Core "Raptor Lake" + RTX 4090 machines and also planning to use next-gen PCIe Gen 5 NVMe SSDs. Various motherboard manufacturers tell us that most of their premium Intel Z790 chipset products come with Gen 5 NVMe slots that subtract 8 PCIe Gen 5 lanes from the main PEG interface when the M.2 slot is in use. If no M.2 SSD is installed in the Gen 5 slot, the graphics card runs with the full x16 lane configuration; once you install an SSD there, the GPU slot is limited to PCI-Express 4.0 x8 (with bandwidth identical to PCI-Express 3.0 x16).

Just to clarify: if you have an Intel motherboard with both an M.2 Gen 5 slot and an M.2 Gen 4 slot, and you install your SSD into the M.2 Gen 4 slot, you will not lose any PCIe bandwidth for the graphics card.
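
To make these bandwidth equivalences concrete, here's a minimal back-of-the-envelope calculator. This is our own sketch (the function name is ours), using the published PCIe per-lane transfer rates and encoding overheads, not part of the review's test setup.

```python
# Sketch: usable PCIe link bandwidth by generation and lane count.
# Per-lane rates (GT/s) and encodings (8b/10b for Gen 1/2, 128b/130b
# for Gen 3+) are the standard published figures.
GT_PER_LANE = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0}

def link_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Usable bandwidth in GB/s for a PCIe link of a given gen and width."""
    encoding = 8 / 10 if gen <= 2 else 128 / 130
    return GT_PER_LANE[gen] * encoding / 8 * lanes  # bits -> bytes

for gen, lanes in [(4, 16), (4, 8), (3, 16), (2, 16), (1, 16)]:
    print(f"Gen {gen} x{lanes}: {link_bandwidth_gb_s(gen, lanes):.2f} GB/s")
# Gen 4 x8 and Gen 3 x16 both land at ~15.75 GB/s, which is why the
# Gen 5 M.2 lane-sharing scenario equals a Gen 3 x16 slot.
```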

"Raptor Lake" PC builders can breathe a huge sigh of relief—we're happy to report that the GeForce RTX 4090 loses a negligible, inconsequential amount of performance in PCI-Express 3.0 x16 (Gen 4 x8-comparable) mode. Averaged across all tests, at the 4K Ultra HD resolution, the RTX 4090 loses 2% performance with Gen 3 x16. Even in lower, CPU-limited resolutions, the performance loss is barely 2-3 percent. When looking at individual game tests, there is only one test that we can put our finger on, where the performance loss is significant, and that's "Metro: Exodus," which sees its framerate drop by a significant 15% at 4K UHD, with similar performance losses seen at lower resolutions.

While in the past the rule has always been "lower resolution = higher FPS = higher performance loss from lower PCIe bandwidth," in today's dataset we find a couple of games that behave differently. For example, Elden Ring, Far Cry 6 and Guardians of the Galaxy clearly show a bigger performance loss at higher resolutions. It seems that these games, which are all fairly new and use modern DX12 engines, do something differently. I suspect that while the other games transfer a relatively constant amount of data for each frame, and thus become more bus-limited at higher FPS, these titles transfer data that scales with the native render resolution. So if you go from 1080p (8 MB per frame) to 1440p (14 MB per frame) and 4K (32 MB per frame), the increase in per-frame traffic outgrows the traffic reduction from the lower FPS.
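
Here's a quick numerical sketch of that theory. The per-frame sizes are the figures quoted above; the frame rates are invented purely for illustration.

```python
# Sustained PCIe traffic = per-frame data x frame rate (illustrative only;
# the FPS numbers below are made up, the MB/frame figures are from the text).
def bus_traffic_gb_s(mb_per_frame: float, fps: float) -> float:
    return mb_per_frame * fps / 1000.0

for res, fps, mb in [("1080p", 200, 8), ("1440p", 150, 14), ("4K", 80, 32)]:
    constant = bus_traffic_gb_s(10, fps)  # "classic" game: fixed 10 MB/frame
    scaled = bus_traffic_gb_s(mb, fps)    # resolution-scaled per-frame data
    print(f"{res}: constant {constant:.2f} GB/s, scaled {scaled:.2f} GB/s")
# Constant-traffic games load the bus less as FPS falls (2.00 -> 0.80 GB/s),
# while resolution-scaled games load it more (1.60 -> 2.56 GB/s), matching
# the bigger 4K losses seen in Elden Ring, Far Cry 6 and Guardians.
```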

We also tested the RTX 4090 in PCI-Express 2.0 x16 mode. This is the kind of bandwidth you get if you run the card on older machines limited to Gen 3 x8 for whatever reason (think AMD Ryzen 3000G APUs), or if you accidentally plug the card into one of the electrical Gen 4 x4 PCIe slots of your motherboard. Doing so won't really saturate your chipset bus, as Intel has widened that link to DMI 4.0 x8 with the Z690 and Z790 chipsets. This is also comparable to the bandwidth of eGPU boxes that convert an 80 Gbps interface, such as USB4 Version 2.0, into a PCI-Express 4.0 x4 link. Here, the performance loss is a little more pronounced, averaging 8% at the 4K UHD resolution and going as high as 18% in "Metro: Exodus" at 4K.

We also tested the academically relevant PCI-Express 1.1 x16 bus, with bandwidth that was available to graphics cards some 16 years ago. This is comparable to the older eGPU boxes that wire out a 40 Gbps Thunderbolt 3 link to a PCI-Express 3.0 x4 slot. Though surprisingly well-contained in some games, the overall performance loss is pronounced, averaging 19% across all tests at the 4K UHD resolution. It can get as bad as 30%, nearly a third of your performance lost, leaving you at performance levels comparable to an RTX 3090 Ti at Gen 4 x16.

If you're happy with what Ryzen 7000 "Zen 4" offers, go ahead and build that machine. Regardless of the motherboard and chipset, you get a Gen 4 x16 slot for your graphics card and a Gen 5 x4 NVMe slot for your future SSD, which won't eat into the x16 slot's bandwidth. If, however, 13th Gen Core "Raptor Lake" impresses you and you only have a Gen 4 NVMe SSD, use an M.2 slot wired to the chipset, or the M.2 slot wired to the processor's Gen 4 x4 NVMe interface, and avoid the slot capable of "Gen 5" bandwidth, since that cuts into the x16 PEG slot's bandwidth; although the impact is inconsequential, you'd still lose 0-2 percent in frame rates (why lose even that much?). If you get your hands on a Gen 5 NVMe SSD a few months from now and want to use it with a 13th Gen Core platform, go ahead and use that Gen 5-capable slot: the 2% graphics performance loss is an acceptable tradeoff for storage performance in excess of 10 GB/s, which could come in handy for media production and similar workloads.
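
For clarity, here's that slot-choice advice condensed into a tiny decision helper. This is a hypothetical sketch; the slot labels are ours, so check your motherboard manual for the actual layout.

```python
# Hypothetical helper encoding the advice above for Raptor Lake builds.
# Slot names are illustrative, not actual motherboard labels.
def pick_m2_slot(ssd_gen: int) -> str:
    if ssd_gen >= 5:
        # ~2% GPU loss is a fair trade for >10 GB/s storage.
        return "CPU-attached Gen 5 M.2 slot (PEG drops to Gen 4 x8)"
    # A Gen 4 drive gains nothing from Gen 5 lanes, so don't steal PEG lanes.
    return "chipset M.2 slot, or the CPU's Gen 4 x4 M.2 slot"

print(pick_m2_slot(4))  # keep the full x16 link for the GPU
print(pick_m2_slot(5))  # accept the small hit for the fast SSD
```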

If you're into external GPU enclosures, pay attention to the kind of bandwidth those boxes offer. The ones with 40 Gbps of upstream bandwidth (Thunderbolt 3) offer downstream bandwidth to the GPU comparable to PCI-Express 1.1 x16, which means you stand to lose up to a third of the performance. Some of the newer eGPU enclosures in development are being designed for 80 Gbps interfaces such as USB4 Version 2.0, with bandwidth comparable to PCI-Express 2.0 x16 or 3.0 x8. Here, the performance loss drops to around 8% on average at 4K (up to 18% in the worst case), which is still somewhat acceptable.
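
To put those percentages in practical terms, here's a hypothetical estimator based only on the averages measured in this review; the link names and loss figures are our summary of the data above, not new measurements.

```python
# Average 4K performance loss vs. a desktop Gen 4 x16 slot, per this review.
AVG_4K_LOSS = {
    "thunderbolt3_40gbps": 0.19,  # ~PCIe 1.1 x16-class bandwidth, up to ~30%
    "usb4v2_80gbps": 0.08,        # ~PCIe 2.0 x16-class bandwidth, up to ~18%
}

def egpu_fps_estimate(desktop_fps: float, link: str) -> float:
    """Rough expected FPS for the same GPU inside an eGPU enclosure."""
    return desktop_fps * (1.0 - AVG_4K_LOSS[link])

# A game running at 100 FPS on a desktop Gen 4 x16 slot:
print(egpu_fps_estimate(100, "thunderbolt3_40gbps"))  # ~81 FPS
print(egpu_fps_estimate(100, "usb4v2_80gbps"))        # ~92 FPS
```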