The Samsung node seems fine for GA104 and below.
Performance/Watt of the 3060/3060Ti/3070 is fantastic - way better than the previous-gen TSMC 12nm parts and competitive with the RDNA2 stuff made on TSMC 7nm.
I think the real problem is that Ampere's design simply isn't power efficient at scale. Nvidia and Nvidia apologists are quick to blame Samsung, but the real blame here is on Nvidia for making a GPU with >28bn transistors that is >50% bigger than the previous gen. There's no way that was ever not going to be a power-guzzling monster. To put it into context, Nvidia's GA102 has more transistors than the 6900XT, and you have to remember that almost a third of the transistors in Navi21 are just cache, which uses very little power.
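To put a rough number on that cache point, here's a minimal back-of-the-envelope sketch. It assumes the commonly cited ~26.8bn transistor count for Navi21, the 128 MB Infinity Cache size, and a standard 6-transistor SRAM cell, and it ignores tag/control overhead - so it's an approximation, not an official die breakdown:

```python
# Back-of-the-envelope estimate: how much of Navi21's transistor budget is Infinity Cache?
# Inputs are commonly cited public figures, not an official breakdown.
NAVI21_TRANSISTORS = 26.8e9           # total transistors in Navi21 (approx.)
INFINITY_CACHE_BYTES = 128 * 1024**2  # 128 MB of on-die Infinity Cache
SRAM_TRANSISTORS_PER_BIT = 6          # classic 6T SRAM cell; tags/redundancy not counted

cache_bits = INFINITY_CACHE_BYTES * 8
cache_transistors = cache_bits * SRAM_TRANSISTORS_PER_BIT
share = cache_transistors / NAVI21_TRANSISTORS

print(f"~{cache_transistors / 1e9:.1f}bn transistors, ~{share:.0%} of the die")
# ~6.4bn transistors, ~24% of the die - roughly a quarter before overhead,
# which is in the same ballpark as the 'almost a third' figure above.
```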
The price of RT - I mentioned this during the Turing releases too, and you're right. For smaller chips it's 'okay' - but let's not forget that even the 'okay' GA104 has a substantially higher TDP for each tier than previous-gen Turing, and that one had already pushed it up too. The peak clocks are lower. So it's efficient... but still big and it won't clock high, and I suppose only a cheaper Samsung node makes it marketable. If you think about it, the 3090 is really the worst possible chip they could bake on this node, yet it's positioned at the top with pricing to match. Cut down...
Put it next to the 1080 Ti's efficiency/temps/TDP and it gets pretty laughable.
Is it Nvidia packing in too many transistors...? Remember, lower peak clocks... so that's going to become a turning point early on as you size up. If a node can't support big chips well, isn't it just a node with more limitations than the others? It's not the smallest, either. It's not the most efficient one with smaller chips, either. I have no conclusive answer tbh, but the competition offers perspective.
Ampere's design, had it been on TSMC 7nm, would probably sit 20-40W lower with no trouble, depending on which SKU you look at. But even so, those chips are still effin huge... and I'm actually seeing Nvidia pulling an 'AMD' move here, going bigger and bigger and making their chips harder to market in the end. There are limits to size; Turing was already heavily inflated and the trend seems to continue as Nvidia plans for full-blown RT weight going forward. It is really as predicted... I think AMD is being VERY wise postponing that expansion of GPU hardware. Look at what's happening around us - shortages don't accommodate mass sales of huge chips at all. Gaming for the happy few won't ever survive very long. I think Nvidia is going to be forced to either change or price itself out of the market.
And... meanwhile we don't see massive aversion to less capable RT chips either. It's still 'fun to have' but never 'can't be missed'; only a few are on that train if I look around. It's still just a box of effects that can be achieved in other ways, and devs still do that, even primarily so. Consoles not being very fast in RT is another nail in the coffin - and those are certainly going to last longer than Ampere or Nvidia's next gen. I think the jury is still out on what the market consensus is going to be - but multi-functional chips that can do a little bit of RT OR get used for raw performance are the only real way forward, long term. RT effects won't get easier. Another path I'm seeing is some hard limit (or baseline) of non-RT performance across the entire stack, with just the RT bits getting scaled up, if the tech has matured in gaming. But something's gonna give.
Nvidia has been focusing on RT core and DLSS/tensor core performance instead of improving the performance per watt of the CUDA cores. They've made essentially zero performance-per-watt improvements there since Pascal, and that's the reason why AMD cards are now slightly more efficient than Nvidia's.
Imo ray tracing was a mistake, and the performance benefits of DLSS could have easily been matched by actual gen-over-gen performance upgrades instead of ray tracing. People are going to disagree with me, and that's fine, but Nvidia spent hundreds of millions to convince layman consumers that ray tracing was the future, when in reality it made things worse for benefits only an eagle eye would see.
We should not be surprised if CUDA efficiency has reached its peak. I think they already got there with Maxwell, and Pascal cemented it with high peak clocks, power delivery tweaks and small feature cuts. Given a good node, what they CAN do is clock CUDA a good 300 MHz higher again. More cache has also been added since Turing (and Ampere added more, I believe). Gonna be interesting what they'll try next.