
NVIDIA GeForce RTX 4090 PCI-Express Scaling with Core i9-13900K


Conclusion

This update is a refresh of the GeForce RTX 4090 PCI-Express Scaling article that we published in October 2022 using our Ryzen 7 5800X GPU test system. Since then we've upgraded to a Raptor Lake Core i9-13900K, which offers considerably higher CPU performance and ensures that the processor doesn't bottleneck even the fastest graphics cards available.

What makes this kind of PCIe bandwidth testing even more relevant today is that users of Gen 5 SSDs on Z790 motherboards have to sacrifice eight GPU lanes to reach the fastest SSD speeds. When an M.2 SSD is installed in the CPU-attached Gen 5 slot, the graphics card will run at x8 instead of x16, with the remaining eight lanes re-routed to the SSD. To clarify: if your Intel motherboard has both an M.2 Gen 5 and an M.2 Gen 4 slot and you install your SSD into the M.2 Gen 4 slot, you will not lose any PCIe bandwidth for the graphics card.

For this round of testing we've listened to your feedback and added minimum FPS (1% low) reporting. The benchmark results also include data for "ray tracing enabled," which is generally a little more demanding on both the CPU and the available PCI-Express bandwidth, because significant CPU work is involved in updating and syncing the BVH acceleration structure that the GPU uses during ray tracing.

Even with the fastest CPU Intel has to offer, the GeForce RTX 4090 loses only a negligible amount of performance in PCI-Express 4.0 x8 mode. Averaged across all tests at the 4K Ultra HD resolution, the RTX 4090 clocks in just 2% slower than with Gen 4 x16. Surprisingly, the average difference is fairly constant across all resolutions: 2% for Gen 4 x16 vs. Gen 4 x8, and 6-7% for Gen 4 x16 vs. Gen 3 x8. Only in the Gen 4 x16 vs. Gen 2 x8 matchup does the performance loss differ between 4K and 1440p/1080p (17% vs. 21%). The differences vary greatly between games, but no title is really problematic: at 4K the biggest delta was 8.3% (Far Cry 6), and at 1440p the biggest differences were seen in Elden Ring.

Horizon Zero Dawn is often cited as a game that PCI-Express scaling tests must include, but I disagree. The game is a poor console port whose developers failed to properly manage their resources, so a lot of data moves across the PCIe bus. While transfers between CPU and GPU memory are almost free on consoles, they are an expensive operation on PC, and virtually all game programmers are aware of that; otherwise we'd see much bigger outliers among the 25 games we tested.
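For reference, here's a minimal Python sketch (our illustration, not part of the benchmark suite) of the theoretical bandwidth behind the tested link configurations; real-world throughput is lower due to protocol overhead:

```python
# Back-of-the-envelope PCIe bandwidth for the tested link configurations.
# Illustrative only; actual throughput is lower due to protocol overhead.

# Per-lane raw bit rate (GT/s) and line encoding for each PCIe generation
GENS = {
    2: (5.0, 8 / 10),     # Gen 2: 5 GT/s, 8b/10b encoding
    3: (8.0, 128 / 130),  # Gen 3: 8 GT/s, 128b/130b encoding
    4: (16.0, 128 / 130), # Gen 4: 16 GT/s, 128b/130b encoding
}

def bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    gts, encoding = GENS[gen]
    return gts * encoding * lanes / 8  # GT/s -> GB/s after encoding

for gen, lanes in [(4, 16), (4, 8), (3, 16), (3, 8), (2, 16), (2, 8)]:
    print(f"PCIe {gen}.0 x{lanes}: {bandwidth_gbs(gen, lanes):.1f} GB/s")
```

The numbers make the equivalences obvious: Gen 4 x8 matches Gen 3 x16, and Gen 2 x16 lands at roughly the same 8 GB/s as Gen 3 x8.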

While in the past the rule has always been "lower resolution = higher FPS = higher performance loss from lower PCIe bandwidth," in today's dataset we find a couple of games that behave differently. For example, Elden Ring, Far Cry 6, Mount & Blade II: Bannerlord, Spider-Man Remastered, Guardians of the Galaxy and Watch Dogs Legion show a bigger loss in performance at higher resolutions. These games are all fairly new, and most use modern DX12 engines, so it seems they do something differently. I suspect that while the other games transfer a relatively constant amount of data per frame, and thus become more limited at higher FPS, these titles transfer data that scales with the native render resolution. So as you go from 1080p (8 MB per frame) to 1440p (14 MB per frame) and 4K (32 MB per frame), the increase in per-frame traffic outgrows the traffic reduction from the lower FPS rate.
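A toy model in Python illustrates this hypothesis. The FPS figures below are made up purely to show the trend, and the per-frame sizes assume a 32-bit (4 bytes per pixel) surface, which reproduces the 8/14/32 MB figures above:

```python
# Hypothetical model: per-frame PCIe traffic that scales with render
# resolution, multiplied by the achievable frame rate.

RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K":    (3840, 2160),
}

# Made-up FPS figures, only to show the trend: FPS drops with resolution,
# but per-frame traffic grows faster, so total bus traffic still rises.
FPS = {"1080p": 200, "1440p": 150, "4K": 80}

def per_frame_mb(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Size of one 32-bit surface at the given resolution, in MB (2^20 bytes)."""
    return width * height * bytes_per_pixel / 2**20

for name, (w, h) in RESOLUTIONS.items():
    mb = per_frame_mb(w, h)
    total = mb * FPS[name] / 1024  # GB/s of bus traffic in this model
    print(f"{name}: {mb:4.0f} MB/frame x {FPS[name]:3d} FPS = {total:.2f} GB/s")
```

Even though FPS drops by more than half from 1080p to 4K in this model, total per-second traffic still grows, which would explain the bigger losses at higher resolutions in these titles.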

We also tested the RTX 4090 in PCI-Express 2.0 x16 mode. This is the kind of bandwidth you get if you use the card on older machines that only offer Gen 3 x8 for whatever reason (think AMD Ryzen 3000G-series APUs), or if you accidentally plug the card into one of the electrical Gen 4 x4 PCIe slots on your motherboard. Doing so won't really saturate your chipset bus, as Intel has widened that link to DMI 4.0 x8 with the Z690 and Z790 chipsets. This is also comparable to the bandwidth of eGPU boxes (which convert an 80 Gbps Thunderbolt or USB4 interface to a PCI-Express 4.0 x4 slot). Here the performance loss is a little more pronounced, averaging 6%, which still isn't problematic.
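To put the eGPU comparison in numbers, a quick sketch (again our own illustration) shows why an 80 Gbps tunnel feeding a PCIe 4.0 x4 link lands in the same ballpark as PCIe 2.0 x16:

```python
# Rough bandwidth comparison for the eGPU scenario above. Real Thunderbolt/
# USB4 PCIe tunnels deliver less than the headline rate due to overhead.

def pcie_gbs(gts: float, encoding: float, lanes: int) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s."""
    return gts * encoding * lanes / 8

print(f"PCIe 4.0 x4 : {pcie_gbs(16.0, 128 / 130, 4):.1f} GB/s")
print(f"PCIe 2.0 x16: {pcie_gbs(5.0, 8 / 10, 16):.1f} GB/s")
print(f"80 Gbps link: {80 / 8:.1f} GB/s (raw, before tunneling overhead)")
```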

As expected, ray tracing shows that PCI-Express bandwidth matters "more" than with RT disabled, but the differences really are just "more": not a single game's performance falls off a cliff, even with PCIe bandwidth cut by 75%. The same is true for minimum FPS. The differences are bigger here, too, but they are roughly in line with what we've seen for average FPS.