# GeForce GTX 980 PCI-Express Scaling



## W1zzard (Oct 30, 2014)

PCI-Express x16 3.0 is well established in the market, and the majority of gamers are using the interface. But what happens if you end up in a slot-bandwidth-constrained situation? We are testing NVIDIA's latest GeForce GTX 980 flagship in 17 games, at four resolutions, including 4K, to assess what performance to expect.

*Show full review*


----------



## Hilux SSRG (Oct 31, 2014)

Thanks for doing an update W1zzard.


----------



## kikicoco1334 (Oct 31, 2014)

very nicely done dude! it was a really good read


----------



## TheDeeGee (Oct 31, 2014)

Interesting to see that 3.0 8x is sometimes faster than 3.0 16x.


----------



## yogurt_21 (Oct 31, 2014)

idk, it seems extremely odd how dramatic the Ryse, WoW, and Wolfenstein differences are. It really seems like there is a frame limiter detecting the speed of the bus and adjusting the limit accordingly.


----------



## Delta6326 (Oct 31, 2014)

Nice review, still no real difference.

Wow, big boost from WoW MoP to WoW WoD: 161 FPS to 231 FPS at 1080p.


----------



## dj-electric (Oct 31, 2014)

It's funny that even today people will prefer a 5930K over a 5820K in a dual-GPU system for basically nothing.


----------



## BiggieShady (Oct 31, 2014)

I get the id Tech 5 engine with its constant MegaTexture streaming, but I don't get what Ryse has to move over the PCIe bus other than draw calls. All the resources that kind of game needs are usually preloaded into VRAM at level loading.


----------



## RealNeil (Oct 31, 2014)

Thanks for the post W1zzard, it was a good read.


----------



## W1zzard (Oct 31, 2014)

BiggieShady said:


> All the resources that kind of game needs are usually preloaded into VRAM at level loading.


Not anymore



yogurt_21 said:


> It really seems like there is a frame limiter detecting the speed of the bus and adjusting the limit accordingly.


I see no mechanism by which a game could do that (detecting PCIe bandwidth is not trivial, as I know from GPU-Z). Also, why would a game do that, and why would a game dev invest time in it?
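For what it's worth, on Linux the negotiated link speed and width are exposed through sysfs, which is one platform-specific way a program could at least query the link (the device address in the docstring is just an example); a minimal sketch:

```python
from pathlib import Path

def pcie_link_status(dev_dir: Path) -> dict:
    """Read the negotiated PCIe link attributes from one sysfs device
    directory, e.g. Path('/sys/bus/pci/devices/0000:01:00.0').

    Returns whichever of the standard link attributes exist, as strings
    like '8.0 GT/s PCIe' or '16'.
    """
    status = {}
    for attr in ("current_link_speed", "current_link_width",
                 "max_link_speed", "max_link_width"):
        f = dev_dir / attr
        if f.is_file():
            status[attr] = f.read_text().strip()
    return status
```

Note this only reports the negotiated link configuration, not achievable throughput; reliably measuring real transfer bandwidth is the non-trivial part.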


----------



## Avrion (Oct 31, 2014)

Great article. 
As someone with a dual-GPU card, a 7990, is there any chance you could benchmark just a few games with one?
The reason I'm asking is that I'm still running an X58 system with an overclocked i7 920 @ 4.2 GHz, but it's still PCIe 2.0.


----------



## Petey Plane (Oct 31, 2014)

Nice article, but it does prove that it's pointless to waste money on a socket 2011 platform for a gaming machine; put another way, 3-way x8 SLI on a Z97 platform is only going to show a 0-5% decrease in performance with a CPU, RAM, and mobo setup that costs at least 50% less.  In fact, that loss in performance would likely be mitigated by the faster clock speeds native to the Devil's Canyon chips.

Socket 2011 gaming rigs are for people with more money than sense.


----------



## Avrion (Oct 31, 2014)

Petey Plane said:


> Socket 2011 gaming rigs are for people with more money than sense



Strictly speaking about PCIe lanes, yes, but those 4 or 8 extra threads of a 2011-v3 chip might come in handy in the future; quite a few game engines multithread pretty well already.


----------



## General Lee (Oct 31, 2014)

I wonder if AMD cards would behave differently? I was thinking of buying the next high end card from them for my pci-e 2.0 board and was just thinking about this a few days ago. If new games like Ryse and Wolfenstein start showing a difference it might be finally time to start planning for an update for my 2500K.


----------



## HM_Actua1 (Oct 31, 2014)

Excellent article, guys! Thanks!


----------



## Nosada (Oct 31, 2014)

so much for "ZOMG YOU NEED 3.0 x16 ON NEW CARDS TO GET MAX PERFORMANCE, x8 IS ONLY HALF FPS!!!!!!!11111ONEELEVEN"


----------



## W1zzard (Oct 31, 2014)

General Lee said:


> I wonder if AMD cards would behave differently? I was thinking of buying the next high end card from them for my pci-e 2.0 board and was just thinking about this a few days ago. If new games like Ryse and Wolfenstein start showing a difference it might be finally time to start planning for an update for my 2500K.





Avrion said:


> Great article.
> As someone with a dual-GPU card, a 7990, is there any chance you could benchmark just a few games with one?
> The reason I'm asking is that I'm still running an X58 system with an overclocked i7 920 @ 4.2 GHz, but it's still PCIe 2.0.


I have no plans for any other PCIe scaling tests, not until new cards are released from AMD.


----------



## krimetal (Oct 31, 2014)

First page:
"While PCI-Express 1.0 pushes 250 MB/s per direction, PCI-Express 2.0 pushes 500 MB/s, and PCI-Express 3.0 doubles that to 1 GB/s. While the resulting absolute bandwidth of PCI-Express 3.0 x16, 32 GB/s, might seem like overkill, the ability to push that much data per lane could come to the rescue of configurations such as 8-lanes (x8) and 4-lanes (x4)."


PCI-Express 3.0 at 16x has a ~16GB/s bandwidth, not 32GB/s


----------



## newtekie1 (Oct 31, 2014)

W1z is right, it's 32 GB/s.

1 GB/s x 16 lanes x 2 directions = 32 GB/s.

PCIe is a full-duplex connection: each lane carries 1 GB/s in each direction, for 2 GB/s of total bandwidth per lane (1 GB/s each way). The total bandwidth for an x16 3.0 slot is therefore 32 GB/s.
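That arithmetic can be written out as a quick sketch (the rates are the approximate usable per-lane figures after encoding overhead, not the raw signaling rates):

```python
# Approximate usable per-lane, per-direction rate in GB/s for each PCIe
# generation (8b/10b encoding for gen 1/2, 128b/130b for gen 3).
PER_LANE_GBPS = {1: 0.25, 2: 0.5, 3: 1.0}

def pcie_bandwidth_gbps(gen: int, lanes: int, both_directions: bool = True) -> float:
    """Approximate usable PCIe bandwidth in GB/s for a generation and lane count."""
    per_direction = PER_LANE_GBPS[gen] * lanes
    # Full duplex: each lane moves data in both directions simultaneously.
    return per_direction * 2 if both_directions else per_direction
```

So x16 3.0 is 16 GB/s per direction (krimetal's figure) and 32 GB/s total (the review's figure); both are right, they just count different things.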


----------



## RealNeil (Oct 31, 2014)

xenocea said:


> Thank you for this article. It makes me feel more at ease that my i7 2700K, which only supports x16 2.0, is not going to restrict current high-end cards.



I don't get any restriction with my i7-2600 system. I have a pair of R9 280X OC cards in it.
I just bought a third 280X OC card, but I'll have to swap out my motherboard to run tri-CrossFire with it.


----------



## GhostRyder (Nov 1, 2014)

Wow, great article @W1zzard, it's nice to see some formal PCIe bandwidth testing updated with a recent card.  It's such an odd subject to get into, because not many places do testing to this extent to show people when they question it.


----------



## Sasqui (Nov 1, 2014)

∆ IIRC, the story was pretty much the same with AGP...


----------



## Serpent of Darkness (Nov 1, 2014)

@W1zzard,

> The most surprising find to me is the huge performance hit some of the latest games take when running on limited PCIe bandwidth. The real shocker here is certainly Ryse: Son of Rome, based on Crytek's latest CryEngine 4. The game seems to constantly stream large amounts of data between the CPU and GPU, taking a large 10% performance hit by switching to the second-fastest x16 3.0 configuration. At x4 1.1, the slowest setting we tested, performance is torn down to less than a third, while running lower resolutions! Shocking!
>
> Based on id's idTech5 engine, another noteworthy title with large drops in performance is Wolfenstein: The New Order. Virtual Textures certainly look great in-game, providing highly detailed, non-repeating textures, but they also put a significant load on the PCI-Express bus. One key challenge here is to have texture data ready for display in-time. Sometimes too late, it manifests as the dreaded texture pop-in some users have been reporting.
>
> Last but not least, World of Warcraft has received a new rendering engine for its latest expansion Warlords of Draenor. While the game doesn't look much different visually, Blizzard made large changes under the hood, changing to a deferred rendering engine which not only disallows MSAA in-game, but also requires much improved PCI-Express bandwidth.

All you've really proven is that there is a slight gain or loss in certain scenarios, but you don't go into depth on why PCIe x16 2.0 performs equal to or better than 3.0.  I doubt it really matters.  I agree with some points of your conclusion, but you aren't really showing more than a drop or gain of around 10% in any of the other scenarios.  It seems informative, but also a waste of your own time.  In addition, games that are MMOs or graphically advanced titles like Crysis 3, BF4, Wolfenstein, and others will make better use of 3.0 over 2.0.  One good example will probably be Star Citizen in the not too distant future.  I would highly suggest using PlanetSide 2 and EQN for upcoming benches. For the higher resolutions (above 1080p), you'll probably see higher use of 3.0 if you enable more AA at 4K.

Here's an idea.  Instead of sitting in the Shrine in World of Warcraft, why don't you conduct a test during a Garrosh 25-man fight at Ultra settings?  Tell us what the PCIe lane saturation results are after that.  I would think that would be more vital information than staring at a wall and the in-game FPS meter to see how high your FPS can get.  Also, why don't you measure the same games with 2-way, 3-way, and 4-way SLI?  It's not like NVIDIA has anything to hide, right...


----------



## FourtyTwo (Nov 1, 2014)

Great, thorough review! Will become a reference article.


----------



## arbiter (Nov 1, 2014)

Svarog said:


> Interesting to see that 3.0 8x is sometimes faster than 3.0 16x.



It's probably within the realm of error; you could say something slightly different happened in the video run, or in CPU usage, that caused the small difference.



General Lee said:


> I wonder if AMD cards would behave differently? I was thinking of buying the next high end card from them for my pci-e 2.0 board and was just thinking about this a few days ago. If new games like Ryse and Wolfenstein start showing a difference it might be finally time to start planning for an update for my 2500K.



I would probably expect AMD cards to yield pretty much the same kind of results.


----------



## birdie (Nov 1, 2014)

Sandy Bridge users have nothing to worry about: PCIe 2.0 @16x is more than enough for the GTX980.


----------



## Aquinus (Nov 1, 2014)

birdie said:


> Sandy Bridge users have nothing to worry about: PCIe 2.0 @16x is more than enough for the GTX980.


It seems to me that in most cases 2.0 @ 8 lanes is enough as well. I find that if you /really/ want more PCIe lanes, the real reason is not to drive more GPUs but to add other components such as a network card, RAID controller, or a PCIe SSD, which is a much more common need on workstations and servers than on general consumer machines.


----------



## Champ (Nov 1, 2014)

Some people believe and preach everything someone posts on a forum. I'm one of those people who need scientific backing.  This was a great test, and we need more myth busting.


----------



## msimax (Nov 1, 2014)

I would also like to see multi-GPU testing, since I've been pondering moving to X99 for my three 290Xs, to see if it will offer better performance than x8/x4/x4 in games when using max AA and moving to 4K.


----------



## stren (Nov 2, 2014)

Good work, but not unexpected, given that a single GPU is not expected to be really harsh on PCIe bandwidth.  Another +1 on SLI testing; that's where PCIe bandwidth suddenly becomes more limited.  The only problem is that the work suddenly gets out of control if you try to be as thorough.  Personally, I'd assume anyone running the latest GPUs is also running something recent, e.g. PCIe 3, and so only run:

2-way SLI - 4790K vs 4790K + PLX vs 5960X, e.g. 8x/8x vs pseudo 16x/16x vs real 16x/16x
4-way SLI - 4790K + PLX vs 5960X, e.g. pseudo 8x/8x/8x/8x vs real 8x/8x/8x/8x

Of course, particularly once you get to 4-way, you're going to need a good OC on the CPU and a really demanding monitor setup to avoid being CPU throttled.


----------



## Brusfantomet (Nov 3, 2014)

Great test, but it does make me wonder how a CrossFire setup with an XDMA engine (290 or 285) would handle the loss of bandwidth.

My expectation is that in titles where there is no difference between x16 gen 3 and x16 gen 2 for the 980, there will be no/minuscule difference for CF 290X, but in titles where the bandwidth mattered, it will have a bigger impact. A preliminary test with two games that were bandwidth sensitive and two that were not would give an interesting indication.


----------



## Wark0 (Nov 3, 2014)

Great review. Do you plan to do the same in SLI?


----------



## Aquinus (Nov 3, 2014)

Wark0 said:


> Great review. Do you plan to do the same in SLI?


I would expect the results to be the same for any given speed if both cards are running at that speed, but I'm curious as well.


----------



## jihadjoe (Nov 3, 2014)

It's always funny when people cite high resolutions like 3x1080p, 1440p, or 4K as a reason to get more PCIe lanes, when it's actually the lowest resolutions that see the most scaling.


----------



## Ubersonic (Nov 5, 2014)

Delta6326 said:


> Nice review, still no real difference.
> 
> Wow, big boost from WoW MoP to WoW WoD: 161 FPS to 231 FPS at 1080p.



It's not actually a good thing. The extra FPS at high settings is mostly due to Blizzard dropping MSAA in favour of FXAA or CMAA, so basically the game runs faster but looks a lot worse. The only way to get decent AA currently is to set the game to DX9 mode and then force MSAA through the drivers, which results in worse FPS than MoP, as it's using the DX9 path instead of the more efficient DX11 one. Hopefully they will end up adding SSAA, as a lot of people are raging about this on the forums.


----------



## Delta6326 (Nov 5, 2014)

Ubersonic said:


> It's not actually a good thing. The extra FPS at high settings is mostly due to Blizzard dropping MSAA in favour of FXAA or CMAA, so basically the game runs faster but looks a lot worse. The only way to get decent AA currently is to set the game to DX9 mode and then force MSAA through the drivers, which results in worse FPS than MoP, as it's using the DX9 path instead of the more efficient DX11 one. Hopefully they will end up adding SSAA, as a lot of people are raging about this on the forums.


Oh interesting, I haven't played since WoD.


----------



## Deleted member 138597 (Jan 3, 2015)

Dj-ElectriC said:


> It's funny that even today people will prefer a 5930K over a 5820K in a dual-GPU system for basically nothing.


Correct, even I set out for the 5930K and was like "OOO! It has 40 lanes, so it can run dual GPUs at x16. You know, x16 is da bestest for GPUs". But after seeing this review, I'm now even considering the 4790K (and will probably cross out the 5930K).


----------



## Deleted member 138597 (Jan 3, 2015)

@W1zzard

You didn't do any synthetic benchmarks though.

Is this result limited to Maxwell only, or does it apply to other architectures too, like Kepler, Fermi, and AMD's GPU architectures?

And as other people are asking, does SLI have an impact on the result? And why did NVIDIA choose to limit SLI to at least PCIe 3.0 x8, while AMD allows PCIe 3.0 x4?


----------



## msimax (Jan 4, 2015)

Shamonto Hasan Easha said:


> Correct, even I set out for the 5930K and was like "OOO! It has 40 lanes, so it can run dual GPUs at x16. You know, x16 is da bestest for GPUs". But after seeing this review, I'm now even considering the 4790K (and will probably cross out the 5930K).


The 5930K comes in handy if you have multiple add-on cards instead of a dead fourth slot.


----------



## CrAsHnBuRnXp (Jan 19, 2015)

Delta6326 said:


> Nice review, still no real difference.
> 
> Wow, big boost from WoW MoP to WoW WoD: 161 FPS to 231 FPS at 1080p.


And yet people still believe that WoW is CPU limited. But it's no longer the case, which is clearly proven here.


----------



## Zkanuck (Mar 20, 2015)

+1 on SLI & Crossfire testing.

I don't think you'd have to go to all the trouble of testing all the different modes - just the ones that matter between Z97 and X99: 2-way SLI or CrossFire at 8x/8x, 16x/8x, and 16x/16x, and 3-way or 4-way configs with Z97 (PLX) vs X99 28/40-lane CPUs.


----------



## cerealkeller (Oct 29, 2016)

It would be awesome if you could re-do this benchmark with a set of 1080s.  The word is that there is actually a measurable difference with the GTX 1080 on PCIe 3.0 x16 compared to slower configs.

This is a great thread though, I've kept it in my bookmarks for a long time and have referenced it on occasion.


----------



## W1zzard (Oct 29, 2016)

cerealkeller said:


> It would be awesome if you could re-do this benchmark with a set of 1080s.  The word is that there is actually a measurable difference with the GTX 1080s on PCI-E 3.0 x16 compared to slower configs
> 
> This is a great thread though, I've kept it in my bookmarks for a long time and have referenced it on occasion.


Thanks for the reminder. I'll try to find time; it won't be that soon though, I'm doing a full rebench right now with new games and drivers, but after that, maybe.


----------



## Aquinus (Oct 29, 2016)

W1zzard said:


> Thanks for the reminder. I'll try to find time; it won't be that soon though, I'm doing a full rebench right now with new games and drivers, but after that, maybe.


Definitely something to consider with modern mainstream platforms given the number of PCIe lanes, but even my 3820 will do PCIe 3.0 and 40 lanes, which is honestly a lot of I/O capacity (even more so considering how old my machine is now). I honestly don't think that PCIe 3.0 @ x16 is any better than x8, and if 16 is more than 5% faster than 8, I'll be surprised. Once things like textures have already been loaded, what GPUs crave from the CPU is low latency, until bandwidth becomes the limiting factor. Of course, the application itself and how it's implemented play a big part in this, but generally speaking, PCIe demand doesn't seem to scale at the same rate as performance itself.

Either way, I look forward to an updated PCI-E scaling review. You've always done a great job with this particular review, @W1zzard... not to set the bar high or anything.


----------



## stinzza (Apr 9, 2017)

Thanks for, as always, a lovely read.

Just wondering: now that AMD has finally entered the scene, will programs and games start to multithread more, and will the 8 GB/s per direction of an x16 2.0 bus no longer be enough to handle the extra data moved over the bus by the new CPUs and GPUs?

I'm sitting on an old 2500K on a Z68 chipset, with one x16 2.0 slot and one "electrical" x4 PCIe slot that goes through the chipset to the CPU instead of having x16 lanes straight to the CPU, hence lag. Rev 2.0 of my motherboard says that if one graphics card is installed with an i7 3770K CPU, it conforms to x16 3.0. Is that real x16 3.0, or will latency be way higher since the x4 lane goes through the chipset and then to the CPU?

Should I then buy a 2700K, which is higher binned and easier with heat and overclocking but stuck on 2.0, or opt for an Ivy 3770K with PCIe 3.0(?) and slightly better clock-for-clock performance? And, say, a Titan Xp...


----------

