
Blackwell GPU die sizes revealed; regression in some cases

VideoCardz has compiled some great data about the die sizes and transistor densities of Blackwell GPUs, using HardwareLuxx as the source. It is becoming clear that Blackwell is only about AI/DLSS/RT. So the question is: with the exception of the 5090, will Blackwell have the same or lower pure rasterization performance than Ada Lovelace?

NVIDIA reveals die sizes for GB200 Blackwell GPUs: GB202 is 750mm², features 92.2B transistors - VideoCardz.com

[Attached image: die size and transistor density table]
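As a quick sanity check on those headline figures, the density works out nearly identical to Ada, which lines up with both being on the same 4 nm-class node. A minimal sketch in Python, assuming the commonly cited AD102 reference values (76.3 B transistors, ~608 mm²); only the GB202 numbers come from the article:

# Rough transistor-density comparison. GB202 figures are from the article;
# the AD102 reference values are assumed, not taken from the table.
dies = {
    "GB202 (Blackwell)": (92.2e9, 750.0),   # transistors, die area in mm^2
    "AD102 (Ada)":       (76.3e9, 608.5),
}

for name, (transistors, area_mm2) in dies.items():
    density_mtr = transistors / area_mm2 / 1e6   # million transistors per mm^2
    print(f"{name}: ~{density_mtr:.0f} MTr/mm^2")

Both land around 120-125 MTr/mm², so the gen-on-gen density gain is basically nil.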
 
Am I crazy, or does this feel like a refresh similar to Raptor Lake 13th -> 14th gen, with some extra software as a side dish? (edit: now that I think about it... that's the main dish)

Just a bit more juice for all the cards, and if you're lucky you get some extra cores on top...

Actually pretty disappointed. But I wasn't in the market for an Nvidia GPU anyway (except if they offered a crazy good card, performance-per-€ wise).

Ohhhh well... let's hope it gets better in the next 4 years... (send help)
 
How can it have less raster performance?
 
How can it have less raster performance?
Some of the SKUs have fewer cores down the stack than the Supers in the 4000 series. Rumors also point to the same or fewer cores in the 5060 series.

Of course, you can also end up with lower raster performance if clocks are lowered by more than core counts are increased.

Nvidia added a lot more AI/RT/DLSS goodness in hardware. Since the 4 nm node is exactly the same between the 4000 and 5000 series, it looks like Nvidia is being stingy on cores and clocks to compensate.
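To illustrate that clocks-vs-cores point, here is a toy first-order model where raster throughput scales with cores × clock (ignoring bandwidth, cache, and architectural changes; the SKU numbers below are made-up placeholders, not leaked specs):

# Toy model: relative raster throughput ~ core count * clock speed.
# The numbers below are placeholders to show the effect, not real SKU specs.
def relative_throughput(cores, clock_ghz):
    return cores * clock_ghz

old_gen = relative_throughput(cores=7680, clock_ghz=2.6)
new_gen = relative_throughput(cores=8960, clock_ghz=2.2)   # ~17% more cores, lower clock

print(f"change vs old gen: {100 * (new_gen / old_gen - 1):+.1f}%")   # slightly negative

More cores but a big enough clock drop, and you actually regress slightly.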
 
Some of the SKUs have fewer cores down the stack than the Supers in the 4000 series. Rumors also point to the same or fewer cores in the 5060 series.

Of course, you can also end up with lower raster performance if clocks are lowered by more than core counts are increased.

Nvidia added a lot more AI/RT/DLSS goodness in hardware. Since the 4 nm node is exactly the same between the 4000 and 5000 series, it looks like Nvidia is being stingy on cores and clocks to compensate.
Expect more of this as process nodes get slower and slower to develop and release to market. Profits must go up and if there is no new node to improve performance/power/area, the die area is going to shrink and shrink. :(
 
Yep, as many people predicted, it seems like the 5090 is going to be around 30-35% faster in raster than the 4090, but when it comes to the 5080 vs 4080 the situation is even worse, most likely only around 15%. Here are the latest benchmark leaks from two games where frame generation wasn't used, Resident Evil and Horizon Forbidden West.

[Attached image: leaked benchmark results]


P.S. Note that in Resident Evil ray tracing was enabled and in Horizon Forbidden West DLSS was on, so there is a possibility that with RT and DLSS off the performance increase could be even smaller.
 
Yes, it's a margin expansion exercise pretty much. Only the 5090 looks interesting... It'll probably be a decent card still in 5 years or even 10 given how progress has slowed down. So you can kind of justify the 2k for it in that way.

Pretty depressing times in tech.
 
The 5070 Ti is also worth it with 128 ROPs and 64 MB of cache, if true. The 4070 Ti Super was heavily cut down to 96 ROPs and 48 MB of cache. And yeah, it's pretty much over now: N5 to N3 is about 30% more chip density, so ~378 mm² shrinks to ~290 mm², and N3 to N2 brings diminishing returns at ~15%, with A16 next at ~7% plus backside power delivery and nanosheets. But you can safely buy now and enjoy long years of 4K60 with 4x DLSS MFG on a 240 Hz OLED without the risk of it depreciating and becoming e-waste by the year's end like my 2080 Ti.
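For what it's worth, that shrink arithmetic checks out if you treat the quoted density gains as straight area scalers (optimistic, since SRAM and I/O scale far worse than logic). A rough sketch, using the ~378 mm² starting point from the post:

# Rough die-shrink arithmetic from the quoted node-to-node density gains.
# A +30% density gain shrinks the same design to 1/1.3 of its area, and so on.
area_mm2 = 378.0   # hypothetical starting die size on N5, per the post
for node, density_gain in [("N3", 0.30), ("N2", 0.15), ("A16", 0.07)]:
    area_mm2 /= 1.0 + density_gain
    print(f"{node}: ~{area_mm2:.0f} mm^2")

So ~378 mm² lands around 290 mm² on N3, and the later steps barely move the needle.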
 
Yep, as many people predicted, it seems like the 5090 is going to be around 30-35% faster in raster than the 4090, but when it comes to the 5080 vs 4080 the situation is even worse, most likely only around 15%. Here are the latest benchmark leaks from two games where frame generation wasn't used, Resident Evil and Horizon Forbidden West.

P.S. Note that in Resident Evil ray tracing was enabled and in Horizon Forbidden West DLSS was on, so there is a possibility that with RT and DLSS off the performance increase could be even smaller.
RT and DLSS performance will most likely be higher, but pure rasterization performance increases are in doubt.

The 5070 Ti is also worth it with 128 ROPs and 64 MB of cache, if true. The 4070 Ti Super was heavily cut down to 96 ROPs and 48 MB of cache. And yeah, it's pretty much over now: N5 to N3 is about 30% more chip density, so ~378 mm² shrinks to ~290 mm², and N3 to N2 brings diminishing returns at ~15%, with A16 next at ~7% plus backside power delivery and nanosheets.
The increase in ROPs will be interesting. That might bring higher performance than my pessimism suggested.
 
Seems unlikely since the 5080 has an 84 SM count and the 4090 has 144. Everyone might be confused by the DLSS4 only benchmarks provided by Nvidia.
Nope, they said, and I quote, 'but the node is better, and there is always a gain every gen regardless'.

DLSS4 wasn't even in the picture back then. Gonna be a rough wake-up call... yes, the market is indeed fubar.
 
Seems unlikely since the 5080 has an 84 SM count and the 4090 has 144. Everyone might be confused by the DLSS4 only benchmarks provided by Nvidia.

Only 128 out of 144 are enabled. There are no SKUs with the full AD102 chip; the RTX 6000 Ada Generation (professional card) maxes out at 142 SMs. The 4090 also only has 72 out of the 96 MB of L2 enabled from AD102.
 
Only 128 out of 144 are enabled. There are no SKUs with the full AD102 chip; the RTX 6000 Ada Generation (professional card) maxes out at 142 SMs. The 4090 also only has 72 out of the 96 MB of L2 enabled from AD102.
Oh, you are right. The VideoCardz table is showing SMs for the fully unlocked die. The point still stands, though: it is doubtful that the 128 SM 4090 will be beaten by the 84 SM 5080.
 
Yep, as many people predicted, it seems like the 5090 is going to be around 30-35% faster in raster than the 4090, but when it comes to the 5080 vs 4080 the situation is even worse, most likely only around 15%. Here are the latest benchmark leaks from two games where frame generation wasn't used, Resident Evil and Horizon Forbidden West.

P.S. Note that in Resident Evil ray tracing was enabled and in Horizon Forbidden West DLSS was on, so there is a possibility that with RT and DLSS off the performance increase could be even smaller.
As predicted pre-release: at best 15% from clocks, maybe 5% from optimizations and/or reduced VRAM bottlenecking.
 
The one thing people aren't really accounting for is CPU limitations. Even the mightiest processors, like the Intel KS or Ryzen X3D chips, buckle under the load that an RTX 4090 can generate. This will worsen with the 5090.

If Blackwell was rearchitected in a way that reduces the reliance on CPU performance, we may yet see insane numbers from smaller silicon. Exciting times regardless.
 
Is the Blackwell transistor count accurate there? Blackwell dies seem to have noticeably fewer transistors per SM. Looking at what has been revealed, that does not quite make sense; Blackwell should have slightly more transistors. Did they cut down somewhere? Cache?
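The per-SM numbers the question is getting at, assuming full-die SM counts of 144 for AD102 and 192 for GB202 and the commonly cited 76.3 B transistor figure for AD102 (only GB202's 92.2 B comes from the article):

# Transistors per SM on the full dies. Only GB202's transistor count is from
# the article; the AD102 figure and both SM counts are assumed reference values.
ad102_per_sm = 76.3e9 / 144   # Ada full die
gb202_per_sm = 92.2e9 / 192   # Blackwell full die

print(f"AD102: {ad102_per_sm / 1e9:.2f} B transistors per SM")
print(f"GB202: {gb202_per_sm / 1e9:.2f} B transistors per SM")

Roughly 0.53 B vs 0.48 B per SM, so the "fewer transistors per SM" observation holds if the leaked figures are right.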
 
The one thing people aren't really accounting for is CPU limitations. Even the mightiest processors, like the Intel KS or Ryzen X3D chips, buckle under the load that an RTX 4090 can generate. This will worsen with the 5090.

If Blackwell was rearchitected in a way that reduces the reliance on CPU performance, we may yet see insane numbers from smaller silicon. Exciting times regardless.

CPU limitations at 4k? So a 4090 getting 25fps at 4k is because of the CPU?
 
I think the green team is using its market share and mind share to force a move to RT and AI-generated frames before the other two companies have truly competitive products, in order to gain even more market share.

They will either look like geniuses or bleed for a while.
 
Is the Blackwell transistor count accurate there? Blackwell dies seem to have noticeably fewer transistors per SM. Looking at what has been revealed, that does not quite make sense; Blackwell should have slightly more transistors. Did they cut down somewhere? Cache?

Dropping 1280 CUDA cores and 16 ROPs in GB205 compared to AD104 works out to about 4.8 B transistors. Adding 6144 CUDA cores (512 in each GPC) and no extra ROPs increased GB202's transistor count by 15.9 B over AD102.
Also, only 88 MB of cache is found in GB202, down from 96 MB in Ada.
N4P offers 11% more performance at the cost of relaxed design rules, and it also helps with yields.
 
I mean... this is a new architecture, so we can't say one-to-one "no increase in performance" based solely on specs.

That said, notice they are bumping the power a lot on all models, so it does seem likely you'll see some increase from the architecture, but with a good deal of it coming from increased power. 4070 (200 W) to 5070 (250 W) is a 25% increase. Meanwhile, there is only a 4.34% increase in shaders for the 5070 over the 4070, so if the overall raster performance increase is closer to 15-20%, then yeah, it seems likely the increased power draw is doing some heavy lifting. Still, if we see a 5070 performing at about 4070 Ti levels, it's doing so with fewer shaders (7680 vs 6144) at slightly lower power, so there's some improvement in the architecture there.
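Spelling out that arithmetic (the 4070's 5888 shader count is an assumed reference figure; the power numbers and the 4070 Ti's 7680 shaders are from the post above):

# Power vs. shader-count scaling for the 4070 -> 5070 comparison above.
# The 5888-shader figure for the 4070 is an assumed reference value.
power_4070, power_5070 = 200, 250          # board power in watts
shaders_4070, shaders_5070 = 5888, 6144    # CUDA cores

print(f"power:   {100 * (power_5070 / power_4070 - 1):+.1f}%")      # +25.0%
print(f"shaders: {100 * (shaders_5070 / shaders_4070 - 1):+.2f}%")   # ~+4.3%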
 
If Blackwell was rearchitected in a way that reduces the reliance on CPU performance
That's only possible if they have a separate CPU in there. Practically everything is bottlenecked by the CPU one way or another past a certain point!
 
CPU limitations at 4k? So a 4090 getting 25fps at 4k is because of the CPU?

The only thing that's gonna run at 25 fps on a 4090 is like, Cyberpunk pathtraced or something equally absurd (perhaps Wukong with similarly extreme settings). Although, 4K isn't particularly hard to drive if you have enough GPU power (which the 4090 and now 5090 will comfortably have). 1080p was once just as challenging, after all.

Reduce the resolution and start targeting extra high frame rates, and the gap between the 4090 and the other cards shrinks quite considerably, and that is the point.
 
The only thing that's gonna run at 25 fps on a 4090 is like, Cyberpunk pathtraced or something equally absurd (perhaps Wukong with similarly extreme settings). Although, 4K isn't particularly hard to drive if you have enough GPU power (which the 4090 and now 5090 will comfortably have). 1080p was once just as challenging, after all.

Reduce the resolution and start targeting extra high frame rates, and the gap between the 4090 and the other cards shrinks quite considerably, and that is the point.
So far, Nvidia's CPU footprint in DX12 is still higher than its competitor(s?), so it's not far-fetched that they'll grab some % there at some point.

But I don't think it relates much to the power of the GPUs. It comes down to the simple fact that there is always a bottleneck. And it's a moving target: as faster CPUs get released, surely the top GPUs will increase their lead a little bit. This has been the case for far longer than the last six years.

Also... despite efficiency gains on the API and hardware side, I think the biggest win to be had is in the games and the code themselves. Occasionally you see how games handle huge amounts of logic, and it's clear there is a lot to be gained there, and it's directly engine-dependent more than anything. You can throw more hardware at it, but it's still gonna run like shit if it runs like shit otherwise, just a little less so. The real CPU loads in games are often determined by continuously keeping a lot of stuff 'in the game' that influences the rest of the game, as fully dynamic components. This creates an exponential load as your simulation grows, and it's why city builders often slow to a crawl in the late game. There are just too many things that need to be kept track of. I think it's pretty inexcusable that any other type of game is dying for better CPUs; everything else should just be coded better, really. After all, devs manage to do that when they code for the Jaguar cores in the PS4, because they need to sell product.
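A toy sketch of why that late-game slowdown happens: if every live entity has to be checked against every other one each tick, the work grows quadratically, while sleeping or bucketing entities keeps it roughly linear. (Purely illustrative; not how any particular engine actually works.)

# Toy comparison: all-pairs entity interactions vs. a bounded neighbourhood.
# Purely illustrative of why fully dynamic simulations bog down late game.
def all_pairs_checks(n):
    return n * (n - 1) // 2            # every entity considers every other one

def bucketed_checks(n, avg_neighbours=8):
    return n * avg_neighbours          # only nearby/awake entities are considered

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} entities: all-pairs {all_pairs_checks(n):>14,} "
          f"vs bucketed {bucketed_checks(n):>9,} checks per tick")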
 
So far, Nvidia's CPU footprint in DX12 is still higher than its competitor(s?), so it's not far-fetched that they'll grab some % there at some point.

But I don't think it relates much to the power of the GPUs. It comes down to the simple fact that there is always a bottleneck. And it's a moving target: as faster CPUs get released, surely the top GPUs will increase their lead a little bit. This has been the case for far longer than the last six years.

Also... despite efficiency gains on the API and hardware side, I think the biggest win to be had is in the games and the code themselves. Occasionally you see how games handle huge amounts of logic, and it's clear there is a lot to be gained there, and it's directly engine-dependent more than anything. You can throw more hardware at it, but it's still gonna run like shit if it runs like shit otherwise, just a little less so. The real CPU loads in games are often determined by continuously keeping a lot of stuff 'in the game' that influences the rest of the game, as fully dynamic components. This creates an exponential load as your simulation grows, and it's why city builders often slow to a crawl in the late game. There are just too many things that need to be kept track of. I think it's pretty inexcusable that any other type of game is dying for better CPUs; everything else should just be coded better, really. After all, devs manage to do that when they code for the Jaguar cores in the PS4, because they need to sell product.
That is a great point. Victoria 3 comes to mind. For those who have never played it, it is a deep geopolitical economy simulator from Paradox. The launch version was laughably less complex than the current patched version, with entire mechanics not yet implemented, yet performance has largely improved over the starting version. This is exactly the kind of game that bogs down in the late game because of all the data the simulation is trying to track and manipulate. Paradox made it a priority to improve performance (because late-game performance in Vic 3 is indeed pretty horrible), and they have succeeded to date.

Most games should be running better than they do on current CPUs. You see some games that just look a lot better than others yet also seem to run faster than their competition.
 