NVIDIA "Hopper" Might Have Huge 1000 mm² Die, Monolithic Design

Spencer LeBlanc · Jan 31, 2022

Dishnetwork is doing a great job!

ModEl4 · Jan 31, 2022

So, around +20% regarding actual die size, +15% increased frequency potentially and +85% increased density regarding logic, - not so great density scaling for the analog parts,

https://fuse.wikichip.org/wp-content/uploads/2020/03/tsmc-5nm-density-q1-2020.png

+ hopper architectural improvements and it seems everything is in order, nothing surprising regarding this story

Cutechri · Jan 31, 2022

Nvidia still thinking monolithic is a good idea. Saddening

WeeRab · Feb 1, 2022

Just another step towards Nvidia's attempt to push everyone onto their cloud gaming scam, following the ridiculous pricing of their gaming hardware
You will own nothing and be happy.

WhoDecidedThat · Feb 1, 2022

Cutechri said:
Nvidia still thinking monolithic is a good idea. Saddening

It kinda is necessary for graphics. Dividing the workload between 2 dies is very tough at the performance level we have these days. It's also why SLI/Crossfire died as the cards got more powerful. Nvidia would def kill SLI for money but I am pretty sure AMD (the underdog) would not kill CrossFire unless they couldn't get it to work well enough to be worth it.

Cutechri · Feb 1, 2022

blanarahul said:
It kinda is necessary for graphics. Dividing the workload between 2 dies is very tough at the performance level we have these days. It's also why SLI/Crossfire died as the cards got more powerful. Nvidia would def kill SLI for money but I am pretty sure AMD (the underdog) would not kill CrossFire unless they couldn't get it to work well enough to be worth it.

MI250X and CDNA say hi

WhoDecidedThat · Feb 1, 2022

Cutechri said:
MI250X and CDNA say hi

blanarahul said:
It kinda is necessary for graphics.

Multi GPU is alive and thriving for server workloads. It is dead in consumer gaming hardware.

Cutechri · Feb 1, 2022

We'll see, I'm not gonna make assumptions like that until I see it with my own eyes. I learned my lesson about immediately assuming stuff in the tech space.

MCM in consumer gaming hardware is inevitable, 3090 is the biggest example of that. Look at how inefficient that monolithic design is, look at how difficult it is to get clock increases when every bump you give it has to be applied to 10k+ cores on the same die. Whether or not it works well will have to be determined.

WhoDecidedThat · Feb 1, 2022

Cutechri said:
MCM in consumer gaming hardware is inevitable

Of this I have no doubt. I just wanted to convey that there is a very good reason GPUs are monolithic while CPUs have been able to easily follow a multi die strategy as far back as the first Core 2 Quads. Think about a GPU generating an image - a light source from one corner of the screen can cast a shadow in the other corner of the screen. You can't divide this workload so easily. And these days games use temporal techniques like TAA which rely on past frames generated to create the new frames faster. So you can't divide this workload so easily either. The net result is that you can't divide the workload spacially because that's how light works. And you can't divide the workload temporally because of techniques like TAA. The result is that consumer GPUs have found it impossible to support multi GPU as they evolved to their current state.

I am not saying multi die GPU in consumer graphics is impossible. But it will take a lot more time and effort than we expect to get there.

Cutechri · Feb 1, 2022

blanarahul said:
Of this I have no doubt. I just wanted to convey that there is a very good reason GPUs are monolithic while CPUs have been able to easily follow a multi die strategy as far back as the first Core 2 Quads. Think about a GPU generating an image - a light source from one corner of the screen can cast a shadow in the other corner of the screen. You can't divide this workload so easily. And these days games use temporal techniques like TAA which rely on past frames generated to create the new frames faster. So you can't divide this workload so easily either. The net result is that you can't divide the workload spacially because that's how light works. And you can't divide the workload temporally because of techniques like TAA. The result is that consumer GPUs have found it impossible to support multi GPU as they evolved to their current state.

I am not saying multi die GPU in consumer graphics is impossible. But it will take a lot more time and effort than we expect to get there.

Oh yeah, of course. When consumer GPUs will take that leap is unknown but it will happen and I'm interested. As for Hopper, being monolithic on the server side is saddening.

Vayra86 · Feb 9, 2022

Cutechri said:
We'll see, I'm not gonna make assumptions like that until I see it with my own eyes. I learned my lesson about immediately assuming stuff in the tech space.

MCM in consumer gaming hardware is inevitable, 3090 is the biggest example of that. Look at how inefficient that monolithic design is, look at how difficult it is to get clock increases when every bump you give it has to be applied to 10k+ cores on the same die. Whether or not it works well will have to be determined.

Don't mix up the node, the architectural choices and the parameter of efficiency. They're different things. What we have seen in larger GPUs is really that clock speeds don't suffer as much as they used to back in the day. In other words: overall yields are better, the perf/clock range at which the chips come out of the oven is smaller, tighter, and clock control is highly dynamic.

This echoes in many things in the past 3-5 generations of GPUs:

NVIDIA GeForce GTX 980 Ti Specs

NVIDIA GM200, 1076 MHz, 2816 Cores, 176 TMUs, 96 ROPs, 6144 MB GDDR5, 1753 MHz, 384 bit

www.techpowerup.com

The best overclocker (in % perf, not necessarily peak clock, but even then, 1600mhz was a unicorn lower in the stack) in the whole stack of Maxwell was a top tier part. Peak clocks equal that of lower tiered parts with power within spec, and temperature started playing a greater role as GPU Boost was introduced. These were the last 28nm GPUs with fantastic yields.

Pascal:

NVIDIA GeForce GTX 1080 Ti Specs

NVIDIA GP102, 1582 MHz, 3584 Cores, 224 TMUs, 88 ROPs, 11264 MB GDDR5X, 1376 MHz, 352 bit

www.techpowerup.com

Review OC clocks on a properly cooled Gaming X:

Versus a GP104 part on the same cooler (1080):

And this was on a smaller node, straight from the get-go. GPU Boost was further refined. The architecture was stripped of anything non-gaming.

Since Turing, we saw a step back in clocks on a highly similar node, as Nvidia added new components to CUDA, but the small gap between parts in the stack remained.
Since Ampere, we saw another step back, considering this was another shrink and no clockspeed was earned. This can be attributed again to further focus on new CUDA components but also: Samsung's 8nm node that's definitely worse than anything TSMC has.

System Name	Tiny the White Yeti
Processor	7800X3D
Motherboard	MSI MAG Mortar b650m wifi
Cooling	CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory	32GB Corsair Vengeance 30CL6000
Video Card(s)	ASRock RX7900XT Phantom Gaming
Storage	Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s)	Gigabyte G34QWC (3440x1440)
Case	Lian Li A3 mATX White
Audio Device(s)	Harman Kardon AVR137 + 2.1
Power Supply	EVGA Supernova G2 750W
Mouse	Steelseries Aerox 5
Keyboard	Lenovo Thinkpad Trackpoint II
VR HMD	HD 420 - Green Edition ;)
Software	W11 IoT Enterprise LTSC
Benchmark Scores	Over 9000