After a long benchmarking session, we finally have a clear picture of what to expect when NVIDIA's GeForce GTX 980 is running in a PCIe bandwidth-limited situation. Adding the latest titles to our benchmarking suite has also revealed some interesting results, but let's not get ahead of ourselves.
For the majority of games, there is no significant performance difference between x16 3.0 and x8 3.0 (or x16 2.0, which offers the same bandwidth). The average difference is only 1%, which you'd never notice. Even such bandwidth-restricted scenarios as x16 1.1 or x8 2.0, the kind offered by seriously old motherboards, only show a small difference of around 5%. The same goes for x4 3.0, which is the bandwidth offered by the x4 slots on some recent motherboards. It's worth noting here that not all x4 slots are wired to the CPU. Some cheaper motherboards have their PCIe x16 (electrical x4) slots wired to the chipset instead of the CPU, which could severely clog the chipset bus (the DMI link between the CPU and the chipset, limited to a mere 2 GB/s on the Intel platform). Refer to the block diagram in your motherboard's manual to see how a given slot is wired.
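For quick reference, usable per-lane throughput roughly doubles with each PCIe generation: about 250 MB/s for 1.x (2.5 GT/s with 8b/10b encoding), 500 MB/s for 2.0 (5 GT/s, 8b/10b), and roughly 985 MB/s for 3.0 (8 GT/s, 128b/130b). A quick back-of-the-envelope calculation (a minimal Python sketch, not part of our test setup) shows why several of our test configurations collapse into the same performance tier:

```python
# Approximate usable PCIe bandwidth per lane, per direction, in MB/s.
# Gen 1.x and 2.0 use 8b/10b encoding; gen 3.0 uses 128b/130b.
PER_LANE_MBPS = {"1.1": 250, "2.0": 500, "3.0": 985}

def link_gbps(gen: str, lanes: int) -> float:
    """Effective one-way bandwidth of a PCIe link in GB/s."""
    return PER_LANE_MBPS[gen] * lanes / 1000

for gen, lanes in [("3.0", 16), ("3.0", 8), ("2.0", 16), ("3.0", 4),
                   ("2.0", 8), ("1.1", 16), ("2.0", 4), ("1.1", 8),
                   ("1.1", 4)]:
    print(f"x{lanes:<2} {gen}: {link_gbps(gen, lanes):5.2f} GB/s")
```

The pairings fall right out of the numbers: x8 3.0 lands within rounding distance of x16 2.0, x8 2.0 matches x16 1.1 at 4 GB/s, and x4 2.0 matches x8 1.1 at 2 GB/s, the same figure as the chipset link mentioned above.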
Real performance losses only become apparent at x8 1.1 and x4 2.0, where the drop grows to a noticeable 15% or so. We also tested x4 1.1, which is of more academic interest, and saw performance fall by up to 25%, an indicator that PCIe bandwidth can't be constrained indefinitely without a serious loss in performance.
Contrary to intuition, the driving factor for PCI-Express bus width and speed in most games is the framerate, not the resolution, and our benchmarks conclusively show that the performance difference between PCIe configurations shrinks at higher resolutions. This is because the bus transfers a fairly constant amount of scene and texture data for each frame, so the total moved per second scales with FPS rather than with pixel count. The final rendered image never moves across the bus, except in render engines that do post-processing on the CPU, which has become much more common since we last looked at PCIe scaling. Even then, the reduction in FPS at higher resolutions outweighs the increase in pixel data.
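A toy frame-time model makes the effect concrete. The numbers below are illustrative only: we assume a fixed 50 MB upload per frame and no overlap between transfers and rendering (which real drivers do exploit, so the absolute losses here are exaggerated); the trend is the point.

```python
# Toy model: frame time = GPU render time + time to move a fixed
# per-frame payload over the bus. Since 1 GB/s equals 1 MB/ms, the
# bus cost in milliseconds is simply payload / bandwidth.
PAYLOAD_MB = 50.0   # assumed per-frame scene/texture upload

def fps(gpu_ms: float, bus_gbps: float) -> float:
    return 1000.0 / (gpu_ms + PAYLOAD_MB / bus_gbps)

for label, gpu_ms in [("1080p", 8.0), ("4K", 28.0)]:
    fast, slow = fps(gpu_ms, 15.76), fps(gpu_ms, 2.0)  # x16 3.0 vs x4 2.0
    print(f"{label}: {fast:.0f} FPS -> {slow:.0f} FPS "
          f"({(1 - slow / fast) * 100:.0f}% loss)")
```

In this model the loss shrinks from about 66% at 1080p to about 41% at 4K, purely because the same fixed 25 ms bus cost is a much smaller share of the longer 4K frame.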
The most surprising find for me is the huge performance hit some of the latest games take when running on limited PCIe bandwidth. The real shocker here is certainly Ryse: Son of Rome, based on the latest iteration of Crytek's CryEngine. The game seems to constantly stream large amounts of data between the CPU and GPU, taking a sizable 10% performance hit just from switching to the second-fastest configuration, x8 3.0. At x4 1.1, the slowest setting we tested, performance is cut to less than a third, even at lower resolutions. Shocking!
Another noteworthy title with large performance drops is Wolfenstein: The New Order, based on id's idTech 5 engine. Virtual Textures certainly look great in-game, providing highly detailed, non-repeating textures, but they also put a significant load on the PCI-Express bus. One key challenge is getting texture data across the bus in time for display; when it arrives too late, the result is the dreaded texture pop-in some users have been reporting.
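The streaming pattern behind that pop-in is easy to sketch. The following is a simplified illustration with made-up names and budgets, not idTech 5's actual code: each frame, the engine uploads as many requested texture pages as the bus budget allows and falls back to blurry low-resolution mips for the rest.

```python
# Simplified virtual-texture streaming loop (illustrative, hypothetical
# names and budgets). Each frame, a GPU feedback pass reports which
# texture pages it needs; the engine uploads what fits in the PCIe
# budget and serves lower mips for the rest until they arrive.
PAGE_SIZE_MB = 0.128      # e.g. 128 KB per texture page
FRAME_BUDGET_MB = 8.0     # assumed upload budget per frame

def stream_pages(needed_pages: list[int], resident: set[int]) -> list[int]:
    """Upload missing pages until the per-frame budget runs out.

    Returns the pages that could NOT be uploaded this frame; the
    renderer samples a lower mip for those, which the player sees
    as blurry textures that 'pop' sharp a few frames later."""
    budget = FRAME_BUDGET_MB
    late = []
    for page in needed_pages:
        if page in resident:
            continue
        if budget >= PAGE_SIZE_MB:
            resident.add(page)       # upload over PCIe (not modeled here)
            budget -= PAGE_SIZE_MB
        else:
            late.append(page)        # missed this frame: pop-in
    return late
```

Halve the PCIe bandwidth and the per-frame budget halves with it, so the "late" list grows, which is consistent with the pop-in reports.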
Last but not least, World of Warcraft has received a new rendering engine for its latest expansion, Warlords of Draenor. While the game doesn't look much different visually, Blizzard made large changes under the hood, switching to a deferred renderer, which not only rules out MSAA in-game (multisampling a full G-buffer is prohibitively expensive) but also demands considerably more PCI-Express bandwidth.
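The MSAA restriction follows directly from the G-buffer's footprint, since multisampling multiplies it. A rough estimate, assuming a hypothetical layout of four 8-byte render targets rather than Blizzard's actual format:

```python
# Rough G-buffer size estimate (assumed layout, not WoW's actual format):
# four render targets at 8 bytes per pixel each.
BYTES_PER_PIXEL = 4 * 8

def gbuffer_mb(width: int, height: int, msaa: int = 1) -> float:
    """Total G-buffer size in MB; MSAA stores one full set per subsample."""
    return width * height * msaa * BYTES_PER_PIXEL / 1024**2

for samples in (1, 4, 8):
    print(f"{samples}x MSAA: {gbuffer_mb(2560, 1440, samples):6.0f} MB")
# At 2560x1440 this gives roughly 113 MB, 450 MB, and 900 MB. Every
# shading pass must read and write all of it, which is why deferred
# engines typically skip MSAA in favor of post-process anti-aliasing.
```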
Overall, I'm a bit worried that game developers currently enjoying the technical advancements of the new consoles will neglect to properly optimize their ports for the PC. The consoles pair their CPU and GPU with unified memory, so shuffling data between the two is nearly free; on the PC, that traffic has to cross the PCIe bus. Porting properly thus requires not only reworking controller-input configurations, but also solving more technical challenges, like optimizing PCI-Express bandwidth usage.