Introduction
The main reason for writing this kind of article was the recent “exclamations” about the GTX 960’s 128-bit wide memory interface. The GPU offers 112GB/s of memory bandwidth, and many believe this narrow interface will not provide enough bandwidth for games. The card is primarily aimed at the midrange crowd wanting to run modern titles (both AAA and independent) at a native resolution of 1080p.
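For context, those headline numbers fall straight out of the bus-width arithmetic. Here’s a minimal sketch, assuming the reference 7.0Gbps effective GDDR5 data rate (that rate is my assumption; check your card’s actual memory clock):

```python
# Peak theoretical bandwidth = (bus width in bytes) x (effective data rate).
# The 7.0Gbps GDDR5 data rate is the reference spec, assumed here.

def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Return peak memory bandwidth in GB/s."""
    return (bus_width_bits / 8) * data_rate_gbps

print(peak_bandwidth_gbs(128, 7.0))  # GTX 960: 112.0 GB/s
print(peak_bandwidth_gbs(256, 7.0))  # GTX 970: 224.0 GB/s
```

That 2x difference in bus width is the whole story behind the 112GB/s vs 224GB/s comparison running through this article.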
Memory bandwidth usage is actually incredibly difficult to measure, but it’s the only way of settling once and for all what the real 1080p requirement for memory bandwidth is. Typically, using GPU-Z, what we have available to us is “Memory Controller Load”. This percentage figure does not directly measure the total GB/s of bandwidth being used; the easiest way to explain it is that it acts much like the CPU utilisation percentage Task Manager shows. Another example would be GPU Load, where various types of load can produce the same percentage figure yet very different power usage readings, leading us to conclude that one 97% load can be much more intensive than another. Something else that only NVIDIA cards allow measurement of is PCIe Bus usage; AMD has yet to allow such a measurement. Thanks to @W1zzard for throwing me a test build of GPU-Z, I could run some Bus usage benchmarks. I had a fair few expectations of the figures, but the results I got were a little less than I expected.
Something I need to make clear before you read on: my memory bandwidth usage figures (GB/s) are not 100% accurate. They have been estimated and extrapolated from performance percentages in the benchmark figures I’ve gathered, and as such most of this article relies largely on those estimations. Only a fool would take them as fact. NVIDIA has itself said that Bus usage readings are wholly inaccurate, and most of us are aware that Memory Controller Load (%) cannot represent the exact bandwidth usage (GB/s) with total precision. All loads are different.
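For transparency, this is the kind of extrapolation sitting behind every GB/s figure in this article: scale the logged Memory Controller Load against the card’s peak bandwidth. A rough sketch only; the sample loads below are made up, and the 100%-equals-peak assumption is exactly the imprecision I’m warning about:

```python
# Crude assumption: 100% Memory Controller Load = the full peak bandwidth
# in flight. This is the extrapolation used for all GB/s figures here.

PEAK_BANDWIDTH_GBS = 224.0  # GTX 970 peak

def estimated_usage_gbs(controller_load_pct: float) -> float:
    """Scale a Memory Controller Load reading (%) to an estimated GB/s."""
    return PEAK_BANDWIDTH_GBS * controller_load_pct / 100.0

sample_loads = [42.0, 55.0, 61.0, 48.0]  # hypothetical GPU-Z log samples (%)
print([estimated_usage_gbs(p) for p in sample_loads])
```

A 50% reading therefore maps to 112GB/s on the 970, which is why the 960 comparison works out so neatly, and also why it should be taken with a heap of salt.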
All of the following benchmarks were run four times for each game at each resolution for accuracy. Every preset was set to Very High, or High where Very High was unavailable. The only alterations to my video settings were turning off VSync and Motion Blur.
Choices of Games
I’ve chosen to run with four games which I felt represented a fair array of game types. For the CPU-oriented side, I’ve run with Insurgency. It’s Source-engine based and highly CPU intensive, and should cover most games with that sort of requirement. It has a reasonable VRAM requirement but is overall quite light on general GPU usage, so it should stress the memory somewhat.
To represent the independent games, while also holding a high VRAM requirement, I’ve run with Starpoint Gemini II. This game has massive VRAM requirements, and is quite a GPU heavy game.
I’ve chosen two other games for the AAA area: one very generalised game, and one that boasted a massive 4GB VRAM requirement for general high-res play. Far Cry 4 felt like a good AAA representative with a balance of general CPU and GPU performance and moderate VRAM requirements. Middle Earth: Shadow of Mordor was my AAA choice to slaughter my VRAM and hopefully put my GPU memory controller and VRAM to the test.
*****
1440p – Overall Correlations
I’ve started off with benchmarks at 1440p to clearly identify what kind of GPU power this resolution requires. I understand that the 112GB/s bandwidth we’re examining is designed to cope with 1080p, but hopefully you’ll see just how much bandwidth you need.
First off, we’ll take a look at all four games and the performance of the GPU Core (%), Memory Controller Load (%), and VRAM Usage (MB). (The following data has been sorted by largest to smallest PCIe Bus usage.)
What I expected to see was Memory Controller Load in direct correlation with VRAM usage. What we can clearly see instead is that Memory Controller Load correlates absolutely with GPU Load. VRAM usage seems to make little difference to how either performs, except in edge cases.
Next up, we’ll look directly at the correlation between PCIe-Bus Usage(%) and VRAM usage(MB).
Aside from the Insurgency graph, there appears to be no direct correlation between the PCIe Bus and VRAM. I had to run these benchmarks multiple times, as I was a little confused that the PCIe Bus usage was always so low, or in some cases idle.
Next, let’s look at the overall correlation between Memory Controller Load (%) and PCIe Bus usage (%).
You can see there’s essentially no change in PCIe Bus usage overall. When the Memory Controller Load peaks, the PCIe Bus data shows no reaction to the change.
Finally let’s take a look at the individual Memory Bandwidth Usage (GB/s) figures overall.
Note: these figures are not 100% accurate, and follow the 100% = 224GB/s rule (that is, 100% Memory Controller Load is treated as the 970’s full 224GB/s).
We can see in most cases the Memory Bandwidth usage (GB/s) is actually extremely erratic over the period. Shadow of Mordor showed the only real case where the usage was relatively persistent throughout the benchmark. You’ll also probably notice that it hits a rather high figure at peak load.
Let’s look at what these figures equate to overall. For this I’ve used the 95th percentile rule to remove freak results from both the low and high ends of the scale.
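The trimming step can be sketched like this. The log values below are hypothetical, and I’m approximating the percentile rule with a simple sort-and-trim of the outer 5% at each end:

```python
# Drop the bottom and top 5% of sorted samples so one freak spike or idle
# frame doesn't skew the averages. The log data below is made up.

def trim_outliers(samples, trim_pct=5):
    """Return the samples with the outer trim_pct at each end removed."""
    ordered = sorted(samples)
    k = max(1, int(len(ordered) * trim_pct / 100.0))  # samples dropped per end
    return ordered[k:len(ordered) - k]

log = [2, 110, 112, 114, 115, 116, 117, 118, 118, 119,
       119, 120, 120, 121, 122, 123, 125, 126, 130, 220]  # hypothetical GB/s
kept = trim_outliers(log)
print(min(kept), max(kept), sum(kept) / len(kept))
```

The idle-frame reading of 2GB/s and the 220GB/s spike both vanish, leaving a far more representative peak and average.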
Note: these figures indicate bandwidth with Maxwell’s compression methods (~30%) already factored in.
Most of these figures are relatively high, though none ever reach the 224GB/s bandwidth limit of my 970. The odd one out is Starpoint Gemini II, which despite eating VRAM when available, didn’t appear to put much load on the Memory Controller. If we took the Memory Controller Load figure as a good representation of actual bandwidth usage, the 970 is never really in danger of being overwhelmed. We can clearly see, however, that the peak figures would be too much for a 960’s 112GB/s of available bandwidth. Going by the average figures instead, the 960 could cope with a couple of the games, but it would still choke on the big titles during average gameplay. We can’t discount the peak figures though, so you’d certainly see issues at 1440p.
For the sake of estimation and sheer curiosity, here is what the estimated Memory Bandwidth Usage would be without the compression, assuming Maxwell’s compression is exactly 30% efficient.
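The arithmetic for undoing the compression is simple division. Sketched out below, with the ~30% saving treated as exact (which it certainly isn’t in practice):

```python
# If compression saves 30% of memory traffic, the controller only moves 70%
# of the raw data, so the uncompressed figure is the measured one / 0.7.
# The 30% saving is NVIDIA's rough ballpark, not a measured value.

COMPRESSION_SAVING = 0.30

def uncompressed_gbs(measured_gbs: float) -> float:
    """Estimate raw (pre-compression) bandwidth from a measured figure."""
    return measured_gbs / (1.0 - COMPRESSION_SAVING)

print(round(uncompressed_gbs(112.0), 1))  # a 960 at its full 112GB/s: ~160GB/s raw
```

This is why a card without Maxwell’s compression, like the 770, needs noticeably more raw bandwidth to push the same frames.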
The 970 would still cope, except in peak cases during Shadow of Mordor, where the required bandwidth exceeds that of the available 224GB/s.
Obviously all these figures are mere estimates, so the actual cases may vary in real world examples.
*****
1080p – Overall Correlations
These are the main benchmarks we’ll be looking at for the 960’s 112GB/s bandwidth limit. The card is aimed at this resolution, so hopefully we’ll see the post-compression figures landing in that area.
Let’s take a look at the overall figures and look for similarities with the 1440p correlations (or lack thereof). The previous charts showed Memory Controller Load linked with GPU Load, not VRAM usage.
This surprised me a little. If you look relatively closely at the peaks and drops, all three measurements appear to correlate rather well at this resolution. The drops in VRAM actually appear to coincide with the drops in Memory Controller Load as well as GPU usage. Certainly an interesting turn of events.
Next let’s take a look at the PCIe Bus usage and VRAM. There were no direct correlations in the 1440p benchmarks.
This time things look a little more interesting, if unexplained. Far Cry 4 shows no real correlation at all. The rest of the games, however, seem to show a drop in PCIe Bus usage every time there’s a drop in VRAM usage, after which the VRAM usage steadily rises before dropping again.
Next up is the Bus and Memory Controller figures.
Again, no real correlation; a similar result to the 1440p benchmark. No surprises there.
Here are the figures you’re more interested in, however. Let’s take a look at the overall Memory Controller usage over the benchmarks. This should show us approximately (again, inaccurately) how much bandwidth 1080p seems to scream for.
This time Shadow of Mordor follows suit and starts to become a little more erratic along with the rest. We can see some interesting peaks in usage, as well as a general idea of what the average is overall. The plateau at the beginning of Far Cry 4 is particularly interesting.
Next, here are those overall figures in a more pleasant representation, where we can see exactly what the numbers are. Again, I’ve used the 95th percentile rule to remove the serious spikes, and these results are not 100% accurate.
Shadow of Mordor slaughters all, even in the average benchmark. Far Cry 4 scrapes by in the average figures, but again, its peak proves to be above the 112GB/s mark. The Source engine game, as well as SPG2, however, proves to be completely viable.
Here’s what the results would look like without the estimated ~30% Maxwell compression.
Shadow of Mordor peaks within percentile points of the bandwidth available on a 770 (224GB/s), but all other games remain below the 200GB/s mark.
Conclusion
Something you have to bear in mind when looking at these figures (besides the fact they are most certainly not 100% accurate) is that it’s plausible memory bandwidth acts similarly to VRAM. There are many occasions where people see VRAM usage in an average game hit a certain mark, let’s say 1800MB on a 2GB card. Other people, running the same settings but with a 4GB card, may see usage above and beyond 2GB, almost as though the game is using the available VRAM simply because it can. Is it possible that games utilise memory bandwidth in a similar fashion? Possibly, but we don’t really know. The same benchmark run on a 770, which shares identical bandwidth with the 970 (224GB/s), might show higher figures due to the lack of compression, yet the increase could prove to be less than the 30% assumption. Maybe the video card wouldn’t “stretch its legs” and would be more conservative with bandwidth usage if it had less available. It’d be an interesting benchmark to see.
If we treated these bandwidth figures as a reference (which you most certainly should not), we could assume that the GTX 960’s 128-bit wide memory interface simply does not provide enough bandwidth to play AAA titles at Very High (or High where not available) and Ultra presets at 1080p. Going by average figures, it would get by OK, but struggle at peak loads. In terms of independent titles, along with Source engine games, it’d do just fine. It may be the case that at 1080p, turning off a little eye candy would bring the game within the 112GB/s limit and remove that bottleneck in AAA titles.
The main issue is that more and more AAA titles may follow the example of games like Shadow of Mordor and require more and more VRAM and eat up more bandwidth. If things plateau at that sort of figure, perhaps the 112GB/s would cope. In the event AAA titles became more advanced in their fidelity, the 960 might find itself quickly outpaced by rivals offering a more sensible bandwidth ceiling.
Finally, I’ll leave you again with the same bold statement: the GB/s figures in these benchmarks are merely estimates from a largely inaccurate method of extrapolating memory bandwidth usage. By no means should you base a purchase on these, as the percentage representation of memory bandwidth is open to extremely broad interpretation.
If anyone would be so kind as to run a benchmark of these games on a 770 and send the log over to me, I can more accurately show bandwidth usage BEFORE Maxwell compression. I’d also be delighted to see users’ benchmarks on GTX 960s to prove these estimates horribly wrong.