And you think Maxwell/Pascal achieves its efficiency through, I don't know, magic pixies? Dear god. NVIDIA made a smart move by implementing tile-based rasterization (TBR) when it mattered most, while everyone was stuck on 28nm waiting for the shrink to 16/14nm. That's what gave them the power advantage, because it was really the only way to get more efficiency out of the 28nm process they had. You can't just pull efficiency out of thin air. AMD simply decided not to do that and kept working with what it already had on 28nm, probably to minimize development costs (let's be honest, they were financially occupied with Ryzen). TBR is what made Maxwell 2 so power efficient, and once NVIDIA shrunk that design down to 16/14nm, the node gave them an additional edge on Pascal. It's not rocket science, it's a simple understanding of GPU design. And I mean understanding at a very high level. Anyone here should get it.
NVIDIA was throwing money into R&D and it paid off in the form of efficiency. Framebuffer compression, a tile-based rasterizer: it probably cost them a lot, but it paid off. AMD took a different approach, saving costs by tweaking what they already had and working with an older design which, despite being a bit more power hungry, still delivered performance. The Hawaii core, tweaked a bit into Grenada, made the R9 390X competitive against the GTX 980. They were essentially trading blows across games. The only reason I grabbed a GTX 980 was that I was curious. I'd had Radeons for years, and there was a lot of buzz around the GTX 980 being efficient and, on paper, offering a higher DX12 feature level. That was a bit gimped by the lack of functional async, but whatever. So I said, let's give it a try. One may argue inferiority, but in my book, if it delivers performance, I frankly don't care how, be it through the finesse of Maxwell 2 or the brute force of the R9 390X. They both worked.