
NVIDIA Multi-Chip-Module Hopper GPU Rumored To Tape Out Soon

Joined
Mar 21, 2016
Messages
2,508 (0.78/day)
That's what people said about CPUs using MCM. Those are problems to be solved, not ones to be avoided. With that said, I don't see MCM being unrealistic for GPUs. It'll just take time to get right.
It can't possibly be any worse than Lucid Hydra or SLI/CF, so what do they have to lose? For starters, the newer PCIe bus, along with Infinity Cache, Resizable BAR, DirectStorage, etc., not to mention today's insanely multi-core CPUs, makes the approach far more flexible than those past attempts, which ran on slower everything.

Even if it's not perfect, I'm sure MCM will be better than the paired-GPU solutions of the past, unless these companies are just plain incompetent and learned nothing of value from those attempts.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,171 (2.80/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
It can't possibly be any worse than Lucid Hydra or SLI/CF, so what do they have to lose? For starters, the newer PCIe bus, along with Infinity Cache, Resizable BAR, DirectStorage, etc., not to mention today's insanely multi-core CPUs, makes the approach far more flexible than those past attempts, which ran on slower everything.

Even if it's not perfect, I'm sure MCM will be better than the paired-GPU solutions of the past, unless these companies are just plain incompetent and learned nothing of value from those attempts.
Exactly. GPU manufacturers have already demonstrated the ability to do this with two GPUs, each with its own memory, sitting a considerable distance apart. Latency is a very real problem (micro-stutter, anyone?), but it's one that can be mitigated even in SLI/Crossfire setups. It becomes a very different animal when, instead of two GPUs with two pools of memory talking over a relatively long PCIe link, you have two GPU chiplets and an I/O die, like AMD does with their CPUs now, with common GPU memory shared between the two dies. Latency will be far better than over PCIe, shared memory means no copying to another pool, and direct access to the I/O die means it can be responsible for handling output to displays.

All in all, I think going this route is a no-brainer from a scalability standpoint. The real issue is cost, because MCM is a more complicated process if you're not already doing it at scale. AMD has a leg up here because they're already doing it, and they're far further along than just putting multiple dies on a package. The I/O die paired with simpler chiplets was a game changer, to be honest, and I think that kind of design will drive future high-performance GPUs.

Meanwhile, I think nVidia is about where Intel is on the MCM path. They have multiple dies per package, but they're still doing direct die-to-die communication, which doesn't scale as well as AMD's solution does. It works okay for two dies, four dies gets complicated and costly (never mind the latency penalty of talking to a die you don't have a direct connection to), and anything more is inefficient: a full mesh needs n(n-1)/2 links, while a hub-and-spoke I/O die needs only n. With the I/O die, we've seen how many chiplets AMD can cram onto a single package, and smaller dies are easier to produce consistently, with better yields.

This is a slight tangent, but what I'd like to see is AMD produce a CPU that mixes CPU chiplets and GPU chiplets on the same package. We could finally see some really interesting APUs out of AMD if they did.
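
To make the two-pools tax concrete, here's a minimal host-side C++ sketch using the CUDA runtime API. To be clear, this is only an illustration, not anyone's actual driver or renderer: the device IDs, the buffer size, and the idea of re-syncing a working set every frame are all assumptions on my part.

[CODE]
// Sketch: the synchronization cost of two separate memory pools
// (SLI/CrossFire-style), using the CUDA runtime API. Sizes are made up.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 64 * 1024 * 1024;  // hypothetical 64 MB working set
    float *poolA = nullptr, *poolB = nullptr;

    // Two discrete GPUs: each holds its own copy of the same data.
    cudaSetDevice(0);
    cudaMalloc(&poolA, bytes);
    cudaSetDevice(1);
    cudaMalloc(&poolB, bytes);

    // If the GPUs can talk peer-to-peer, keep the copies in sync over the
    // external link. On an MCM part with one shared pool behind an I/O die,
    // this per-frame copy simply wouldn't exist.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);
    if (canAccess) {
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        cudaMemcpyPeer(poolB, 1, poolA, 0, bytes);  // the "two pools" tax
    }
    printf("peer access: %s\n", canAccess ? "yes" : "no");

    cudaFree(poolB);
    cudaSetDevice(0);
    cudaFree(poolA);
    return 0;
}
[/CODE]

Every byte moved by that cudaMemcpyPeer is pure overhead that a single shared pool behind an I/O die would never pay.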
 
Joined
Jun 11, 2017
Messages
283 (0.10/day)
Location
Montreal Canada
Hmmm, I'm trying to remember who tried putting two GPUs on a single card, something 20 years ago. Let me think... oh wait, now I know: the Voodoo 5 5500 AGP by 3dfx, the company Nvidia later bought to get SLI. And what did 3dfx do before that? They said SLI was dead and tried to put everything on a single card. Why is history repeating itself?
 

Aquinus

Resident Wat-man
Hmmm, I'm trying to remember who tried putting two GPUs on a single card, something 20 years ago. Let me think... oh wait, now I know: the Voodoo 5 5500 AGP by 3dfx, the company Nvidia later bought to get SLI. And what did 3dfx do before that? They said SLI was dead and tried to put everything on a single card. Why is history repeating itself?
It's happened a few times since. I think the most recent on the AMD side is the 295X2. The real kicker is having two pools of GPU memory. That's a solvable problem with an MCM design.
 
Joined
Mar 21, 2016
Messages
2,508 (0.78/day)
Exactly. GPU manufacturers have already demonstrated the ability to do this with two GPUs, each with its own memory, sitting a considerable distance apart. Latency is a very real problem (micro-stutter, anyone?), but it's one that can be mitigated even in SLI/Crossfire setups. It becomes a very different animal when, instead of two GPUs with two pools of memory talking over a relatively long PCIe link, you have two GPU chiplets and an I/O die, like AMD does with their CPUs now, with common GPU memory shared between the two dies. Latency will be far better than over PCIe, shared memory means no copying to another pool, and direct access to the I/O die means it can be responsible for handling output to displays.

All in all, I think going this route is a no-brainer from a scalability standpoint. The real issue is cost, because MCM is a more complicated process if you're not already doing it at scale. AMD has a leg up here because they're already doing it, and they're far further along than just putting multiple dies on a package. The I/O die paired with simpler chiplets was a game changer, to be honest, and I think that kind of design will drive future high-performance GPUs.

Meanwhile, I think nVidia is about where Intel is on the MCM path. They have multiple dies per package, but they're still doing direct die-to-die communication, which doesn't scale as well as AMD's solution does. It works okay for two dies, four dies gets complicated and costly (never mind the latency penalty of talking to a die you don't have a direct connection to), and anything more is inefficient: a full mesh needs n(n-1)/2 links, while a hub-and-spoke I/O die needs only n. With the I/O die, we've seen how many chiplets AMD can cram onto a single package, and smaller dies are easier to produce consistently, with better yields.

This is a slight tangent, but what I'd like to see is AMD produce a CPU that mixes CPU chiplets and GPU chiplets on the same package. We could finally see some really interesting APUs out of AMD if they did.
AMD already has APUs; what they could mix is two different APU chiplets in a big.LITTLE type of scenario. Basically, the base and boost frequency ranges of the two chiplets could overlap, giving a spectrum from efficiency to performance with the largest convergence in the middle.
 

Aquinus

Resident Wat-man
AMD already has APUs; what they could mix is two different APU chiplets in a big.LITTLE type of scenario. Basically, the base and boost frequency ranges of the two chiplets could overlap, giving a spectrum from efficiency to performance with the largest convergence in the middle.
I'm not thinking of two APU chiplets, but rather one chiplet being the CPU cores and the other being the GPU cores, both sharing the same I/O die in the middle.
 
Joined
Mar 21, 2016
Messages
2,508 (0.78/day)
That would work too, I suppose, but you couldn't do as much with power savings in that scenario. As an example, something like Nvidia's Optimus tech for laptops wouldn't be possible with that approach. You could go headless on the GPU, I suppose, but that's more likely in a server environment than in consumer-oriented devices. I'd rather be able to turn off the stronger or weaker chiplet entirely in some kind of deep sleep state to better conserve power.

I also fail to see any performance gain if the underlying hardware adds up to the same amount. I suppose it might enable scenarios where you can do twice the work per clock cycle across the CPU/GPU? If that's the angle you were going for, then maybe; I honestly don't know. I still think the potential power savings of just two APU chiplets make better sense overall. What that would bring AMD in the mobile market alone is significant and hard to overlook.

It's happened a few times since. I think the most recent on the AMD side is the 295X2. The real kicker is having two pools of GPU memory. That's a solvable problem with an MCM design.

I think pooled memory is one of the significant things DX12 brought about specifically to help with mGPU. Just wait until pooled memory, DirectStorage, and Infinity Cache are combined with mGPU; things will start to heat up.
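
For what it's worth, DX12's explicit multi-adapter is where pooled memory shows up in the API today. Here's a rough C++ sketch of the linked-node pieces; it's only an outline (device creation and error handling omitted, and the function name is my own):

[CODE]
// Sketch: DX12 "linked node" multi-adapter, where the driver exposes
// multiple physical GPUs behind one ID3D12Device. Illustration only.
#include <d3d12.h>
#include <cstdio>

void inspect_nodes(ID3D12Device* device) {
    // Each node is one physical GPU in the linked adapter.
    UINT nodes = device->GetNodeCount();
    printf("linked GPU nodes: %u\n", nodes);

    // A resource physically lives on one node (CreationNodeMask) but can be
    // made visible to the others (VisibleNodeMask) -- the API-level version
    // of the pooled-memory idea.
    D3D12_HEAP_PROPERTIES props = {};
    props.Type = D3D12_HEAP_TYPE_DEFAULT;
    props.CreationNodeMask = 0x1;               // lives on node 0
    props.VisibleNodeMask  = (1u << nodes) - 1; // visible to all nodes
    // ...pass props to CreateCommittedResource(...) when allocating...
    (void)props;
}
[/CODE]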
 
Joined
Dec 12, 2020
Messages
1,755 (1.19/day)
It's happened a few times since. I think the most recent on the AMD side is the 295X2. The real kicker is having two pools of GPU memory. That's a solvable problem with an MCM design.
The Radeon Pro Duo is the most recent gaming-capable dual-GPU design from AMD, though that was back in 2016.
 

Aquinus

Resident Wat-man
I think pooled memory is one of the significant things DX12 brought about specifically to help with mGPU. Just wait until pooled memory, DirectStorage, and Infinity Cache are combined with mGPU; things will start to heat up.
I think that's a solution to the SLI/Crossfire problem, for sure. Shared GPU memory removes one of the biggest problems with multi-GPU setups: the cost of keeping the two dies' memory in sync, which runs into both bandwidth and latency limits. Think about it: if both GPUs share memory, they can render and apply their changes directly to the same framebuffer for output (see the sketch at the end of this post). Microstutter would become a thing of the past.
The Radeon Pro Duo is the most recent gaming-capable dual-GPU design from AMD, though that was back in 2016.
Ah yeah, I forgot about that one. Still, 5 years ago isn't that long.
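
To contrast with the two-pool sketch I posted earlier: with one shared pool there's nothing to keep in sync by hand. CUDA's managed memory is the closest thing you can type today, so it stands in for true shared memory here; real MCM hardware would share physical memory rather than migrate pages, and the resolution is just an example.

[CODE]
// Sketch: one allocation, one pointer, visible to every GPU -- no per-frame
// copy between pools. Managed memory stands in for true shared memory.
#include <cuda_runtime.h>

int main() {
    float* framebuffer = nullptr;
    // A single 4K RGBA float framebuffer both GPUs can render into.
    cudaMallocManaged(&framebuffer, 3840UL * 2160 * 4 * sizeof(float));

    // ...launch kernels on device 0 and device 1 against the same pointer;
    // each writes its slice of the frame directly, no cross-pool sync...

    cudaDeviceSynchronize();
    cudaFree(framebuffer);
    return 0;
}
[/CODE]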
 
Joined
Mar 21, 2016
Messages
2,508 (0.78/day)
Place a single CPU core, designed around compression/decompression, on each GPU chiplet, give it Infinity Cache, and combine that with DirectStorage and a shared memory pool.
 