I get that 3.0 is faster than 2.0, but since all my equipment is 3.0 I need more lanes, and even if they made it 4.0 my 3.0 equipment wouldn't perform any faster with insufficient lanes. It seems I still don't understand the situation, though. Let's say I have an 8700K CPU. Intel's website says it has 16 PCIe lanes, some websites say there are 28, and some say 40. Some people are talking about CPUs having more lanes that are for the motherboard. I don't know how many more there are for the motherboard that actually go to the slots (if any), or how to tell they aren't used up by resources like USB ports, onboard SATA and RAID controllers, onboard sound cards, onboard Wi-Fi, and onboard NIC ports. My guess is those alone might be using at least a dozen PCIe lanes. So again, unless manufacturers tell us how many lanes are available for the slots, fixed or otherwise, it seems like a crapshoot at best.
That is why it is important that the chipset provides 24 lanes. The total Intel platform provides 40 lanes. 16 are attached directly to the CPU (or more specifically, the northbridge inside the CPU die). The other 24 come from the PCH (southbridge) chip on the motherboard; that chip is then linked to the CPU over DMI, which is essentially a PCI-E 3.0 x4 link. But that x4 link only becomes an issue when transferring from a storage device to system memory (opening a program/game).
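To put some rough numbers on that, here's a quick back-of-envelope sketch in Python. It just uses the usual ~985 MB/s per PCIe 3.0 lane approximation (8 GT/s with 128b/130b encoding), not anything measured:

```python
# Back-of-envelope look at the Coffee Lake lane budget and the DMI bottleneck
# described above. ~985 MB/s per lane is the standard PCIe 3.0 approximation.

PCIE3_MBPS_PER_LANE = 985  # MB/s per lane, per direction (approximate)

cpu_lanes = 16     # wired straight to the CPU, intended for graphics
pch_lanes = 24     # provided by the Z370-class PCH (chipset)
print(f"Platform total: {cpu_lanes + pch_lanes} lanes")  # 40, matching Intel's figure

# Everything hanging off the PCH funnels through DMI, which is roughly
# a PCIe 3.0 x4 link back to the CPU.
dmi_bandwidth = 4 * PCIE3_MBPS_PER_LANE
print(f"DMI uplink: ~{dmi_bandwidth / 1000:.1f} GB/s shared by all PCH devices")

# A single fast x4 NVMe SSD on a PCH M.2 slot can already saturate that uplink,
# which is why the x4 link mostly matters for heavy storage-to-memory transfers.
nvme_link = 4 * PCIE3_MBPS_PER_LANE
print(f"One x4 NVMe drive: ~{nvme_link / 1000:.1f} GB/s, i.e. the whole uplink")
```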
Most manufacturers make it pretty clear where the PCI-E lanes are coming from, or it is pretty easy to figure out. The 16 lanes from the CPU are supposed to be used only for graphics. The first PCI-E x16 slot is almost always connected to the CPU. If there is a second PCI-E x16 slot, then almost always both slots will drop to x8, because they are sharing the 16 lanes from the CPU. The specs of the motherboard will tell you this. You'll see something in the specs like "single at x16; dual at x8/x8". Some even say "single at x16; dual at x8/x8; triple at x8/x4/x4". In that case, all three PCI-E x16 slots are actually connected to the CPU, but when multiple are used, they run at x8 or x4 speed.
Any other slot that doesn't share bandwidth like this is pretty much guaranteed to be using the chipset lanes and not the ones directly connected to the CPU.
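If it helps, here's a small illustrative sketch of how that spec-sheet line maps to actual slot widths. The splits are just the common x16 / x8+x8 / x8+x4+x4 pattern quoted above, not values read from any particular board:

```python
# Sketch of how the 16 CPU lanes get divided among the CPU-attached x16 slots,
# following the "single at x16; dual at x8/x8; triple at x8/x4/x4" pattern
# quoted from motherboard spec sheets above. Real boards hardwire this in
# firmware; this just mirrors the spec table.

CPU_SLOT_SPLITS = {
    1: [16],         # one card: full x16
    2: [8, 8],       # two cards: the 16 lanes are shared x8/x8
    3: [8, 4, 4],    # three cards: x8/x4/x4
}

def cpu_slot_widths(populated_slots: int) -> list:
    """Return the electrical width of each populated CPU-attached slot."""
    if populated_slots not in CPU_SLOT_SPLITS:
        raise ValueError("this sketch only covers 1-3 CPU-attached slots")
    return CPU_SLOT_SPLITS[populated_slots]

for n in (1, 2, 3):
    widths = cpu_slot_widths(n)
    print(f"{n} card(s): " + " / ".join(f"x{w}" for w in widths),
          f"(total {sum(widths)} of 16 CPU lanes)")
```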
I know, but I never figured out how these PCIe complexes are built. If, like you say, they use DMA to talk to each other "directly", that would be great. And I suspect that's what's going on, but I never confirmed it.
I remember back in the days when the CPU had to handle data transfers, it was so slow. Does anyone else remember the days when burning a CD would max out your CPU, and if you tried to open anything else on the computer, the burn would fail? That was because DMA wasn't a thing (and buffer underrun protection wasn't a thing yet either).
More lanes are nice to have, though, and Threadripper is really the only game in town at that level/price.
Threadripper isn't a perfect solution either; in fact, it introduces a new set of issues. Because Threadripper is really an MCM, the PCI-E lanes coming from the CPU are actually split up as if they were on two different CPUs, and that leads to problems. If a device connected to one die wants to talk to a device on the other, the traffic has to cross the Infinity Fabric, which introduces latency, and that really isn't much better than going over Intel's DMI link from the PCH to the CPU. It also had issues with RAID, since the drives are essentially connected to two different storage controllers, but I think AMD has finally worked that one out.
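If you're on Linux and want to see which die a given card is actually hanging off of, something like this quick sketch works. It just reads the standard sysfs numa_node attribute, and on Threadripper it only shows a meaningful split when the board/BIOS is running in NUMA (local) memory mode:

```python
# Linux-only sketch: list each PCI device and the NUMA node it reports,
# which on a Threadripper system in NUMA mode corresponds to the die its
# lanes come from. A value of -1 means the platform doesn't report a node
# for that device.

from pathlib import Path

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    node_file = dev / "numa_node"
    if not node_file.exists():
        continue
    node = node_file.read_text().strip()
    print(f"{dev.name}: NUMA node {node}")
```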