But TL;DR: yes, it was definitely a bottleneck. Swapping to a board that runs both slots at PCIe 3.0 x16 gave me, I think, roughly 25-40% more fps on average than the old motherboard. So thank you, topmytseries5, for your initial findings on PCIe bandwidth bottlenecks. Games seem to use more lane bandwidth than synthetic benchmarks: the games saw a performance boost, whereas the synthetic benchmarks stayed more or less the same. Though I am curious why Rise of the Tomb Raider now shows more than 100% scaling for me, haha.
Nice video, although I can't agree with the findings in the first two examples.
Tomb Raider: SLI scaling of 100% or more simply isn't technically possible. Something must be radically different in either the example on the left or the one on the right. Unfortunately, that excludes this example from consideration until the numbers make sense.
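To make the "more than 100%" point concrete, here is a minimal sketch of how SLI scaling is usually expressed: the extra fps from the second card as a percentage of single-card fps, so 100% means the frame rate exactly doubled. The function names are hypothetical, and the fps figures in the usage note are made up for illustration, not taken from the video.

```python
def sli_scaling_percent(single_gpu_fps: float, sli_fps: float) -> float:
    """Extra performance from the second card, as a percentage of single-card fps.

    100% means the second card exactly doubled the frame rate (perfect scaling).
    """
    if single_gpu_fps <= 0:
        raise ValueError("single_gpu_fps must be positive")
    return (sli_fps / single_gpu_fps - 1.0) * 100.0


def is_plausible_two_way_sli(single_gpu_fps: float, sli_fps: float) -> bool:
    """Two cards can at best double the fps, so scaling above 100% suggests the
    two measurements were not taken under the same conditions."""
    return sli_scaling_percent(single_gpu_fps, sli_fps) <= 100.0


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(round(sli_scaling_percent(60, 114), 1))
    print(is_plausible_two_way_sli(50, 105))
```

For example, 60 fps on one card and 114 fps in SLI comes out at about 90% scaling, which is believable, whereas 50 fps going to 105 fps would be 110% and fails the check, pointing at a difference in test setup rather than at the PCIe slots.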
RE2: Something similar here. The example in the middle clearly does not use the second card at all, while the one on the right shows perfect SLI scaling. That can't be down to PCIe bus width alone; it has to come from some difference in the setup. Again, this example does not qualify for consideration.
So we're left with the final two examples, where the numbers all make sense and which should count towards the final result. That result is in line with my own experience running PCIe 2.0 x4: in the majority of games, the performance hit is negligible. But once in a while you come across a game that habitually issues a huge number of draw calls across the bus, and performance takes a significant hit.
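For a sense of scale, here is a rough sketch of the theoretical per-direction link bandwidth for the two configurations discussed in this thread, using the published per-lane rates (PCIe 2.0: 5 GT/s with 8b/10b encoding; PCIe 3.0: 8 GT/s with 128b/130b encoding). It ignores packet and protocol overhead, so real-world throughput is somewhat lower, and the function name is just for illustration.

```python
# Per-lane transfer rate (GT/s) and line-encoding efficiency for each generation.
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gb_s(generation: str, lanes: int) -> float:
    """Theoretical one-direction link bandwidth in GB/s, ignoring protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 to convert bits to bytes.
    return rate_gt_s * efficiency * lanes / 8


if __name__ == "__main__":
    old = pcie_bandwidth_gb_s("2.0", 4)
    new = pcie_bandwidth_gb_s("3.0", 16)
    print(f"PCIe 2.0 x4:  {old:.2f} GB/s")
    print(f"PCIe 3.0 x16: {new:.2f} GB/s ({new / old:.1f}x)")
```

That works out to about 2 GB/s per direction for PCIe 2.0 x4 versus roughly 15.75 GB/s for PCIe 3.0 x16, close to an 8x difference, so a game that pushes a lot of traffic over the bus has far less headroom on the narrower link.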
Only in these relatively few cases will upgrading the PCIe slots help.
I've said it before, but a surefire way to check whether PCIe bandwidth is an issue is to monitor 'Bus Usage' in nVidia Inspector.
Every percentage point of Bus Usage is no longer available for actual GPU tasks, so you want to see values below ~10%.