
Social Media Imagines AMD "Navi 48" RDNA 4 to be a Dual-Chiplet GPU

Joined
Jan 8, 2017
Messages
9,499 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
This has long been speculated, and of course sooner or later it will happen.
 
Joined
Jun 2, 2017
Messages
9,353 (3.39/day)
System Name Best AMD Computer
Processor AMD 7900X3D
Motherboard Asus X670E E Strix
Cooling In Win SR36
Memory GSKILL DDR5 32GB 5200 30
Video Card(s) Sapphire Pulse 7900XT (Watercooled)
Storage Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s) GIGABYTE FV43U
Case Corsair 7000D Airflow
Audio Device(s) Corsair Void Pro, Logitch Z523 5.1
Power Supply Deepcool 1000M
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 11 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores Firestrike: 46183 Time Spy: 25121
This is cool. AMD CrossFire was much easier to run than SLI. AMD already wrote the software to make it run without the need for a dedicated connector. This could be huge for a dual-GPU based system. If it is baked into the driver, it would be great.
 
Joined
Apr 2, 2011
Messages
2,847 (0.57/day)
As has been said, the silicon is dual chiplet but the interconnect would render it as a single chip visible to the OS. It's done through the magic of extremely high-speed interconnects... and thus requires precision-made substrates to bond to and a decent internal configuration that handles the (relatively minor) communication losses, so that the OS sees roughly 0.9x the power of each GPU, i.e. about a 1.8x improvement over a single die... though if they are truly sharing a single memory space it'd be closer to the physics limit, 0.98x or the like.

The issue with CrossFire and SLI is that both needed to connect over a bus, so step one was figuring out which GPU was doing what. You then lost time in communication and in populating data to the correct memory space... and it all left a sour taste in the mouth when two mid-level GPUs gave you 1+1 = 1.5, whereas you could instead buy a GPU two tiers higher and get 1.8x the performance instead of 1.5x, and everything worked without the driver shenanigans. That, for the record, is why CrossFire and SLI died. It'd be nice to see that come back... but with 3060 GPUs still selling for almost $300 on eBay I cannot see the benefit of pursuing it.
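To put rough numbers on that, here's a quick back-of-the-envelope sketch in Python; the scaling factors are just the ballpark figures from the two paragraphs above, not benchmark data:

```python
# Back-of-the-envelope scaling comparison. The scaling factors are the rough
# figures from the post above (0.5 for old SLI/CrossFire pairs, 0.9 and 0.98
# for a tightly coupled chiplet pair); performance units are arbitrary.

def effective_perf(per_gpu, n_gpus, scaling):
    """Combined throughput when each extra GPU adds only a fraction of itself."""
    return per_gpu * (1 + (n_gpus - 1) * scaling)

mid_tier      = 100                                   # one mid-level GPU
sli_pair      = effective_perf(mid_tier, 2, 0.5)      # 1 + 1 ~= 1.5
chiplet_pair  = effective_perf(mid_tier, 2, 0.9)      # ~0.9x per extra die
shared_memory = effective_perf(mid_tier, 2, 0.98)     # single shared memory space
tier_up       = mid_tier * 1.8                        # buying two tiers higher instead

print(sli_pair, chiplet_pair, shared_memory, tier_up)  # ~150, ~190, ~198, ~180
```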

So we are clear, my optimism lies with AMD pumping out a dual chip version that works in the middle segment and blows monolithic silicon out of the water. If AMD is rational they give most of that cost savings back to the customer, and sell to the huge consumer base that wants more power without having to take out a mortgage to do so. Good QHD gameplay, 120 Hz+ 1080p performance, and compromise 4K performance at the $500 mark would be something for everyone to love without really getting into a fight with Nvidia. Not sure where Intel comes out in all of this, but Battlemage has to be less bumpy than Arc. Hopefully they can also release something that makes the gigantic and underserved middle market happy... and gives the Ngreedia side of Nvidia a black eye.


All of this is fluff anyways...so why not indulge in a fantasy?
 
Joined
Feb 3, 2017
Messages
3,806 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
This is what the Nvidia B200 is doing, using two chiplets that behave as one and are seen as a single die by the software. Or did you mean remotely plausible for AMD to pull off? Granted, the B200 is NOT a consumer-level GPU...
Consumer - read: gaming - has different requirements than datacenter compute monsters, mainly latency in whatever kind of cooperation the chips are going to be doing. Memory access is wide and expensive, especially when having to go to another GPU's VRAM, etc. Same as it has always been.

The closest we got was SLI/CrossFire, and that was a bunch of driver magic from both sides. SLI/CrossFire died due to new rendering methods that made the whole thing expensive to maintain, plus DX12 and Vulkan coming in with their own ways to handle multi-GPU - the implicit and explicit modes mentioned in the article - which basically no game developers tried to properly implement.

It is different in that instead of making the entire GPU die on the 5nm node, they took the cache and memory controllers and fabbed them as chiplets on the older 6nm node, because these parts do not benefit as much from a node shrink. All of the chiplets were then arranged to make a full die. This was an ingenious way to target the parts of the GPU getting the largest performance benefits of the 5nm node shrink, while saving cost by not using a cutting-edge node on the parts that do not. Fantastic engineering in my opinion.
The ingenious bit was figuring out which parts of a GPU can be separated. The problem always has been and still is that splitting up the compute array is not doable, or at least has not been so far. It has been 15+ years since AMD first publicly said they are trying to go for that. Nvidia has been rumored to be looking into the same thing for 10+ years as well. Both AMD and Nvidia occasionally publish papers about how to split a GPU, but the short version of the conclusions has been that you really can't.

Again, the context here is gaming GPU. Something that is very latency-sensitive.

Watch this, especially from 10:50
Well, that actually did directly bring up the bandwidth problem as well. A couple of orders of magnitude higher than what they did on CPUs.

This is cool. AMD CrossFire was much easier to run than SLI. AMD already wrote the software to make it run without the need for a dedicated connector. This could be huge for a dual-GPU based system. If it is baked into the driver, it would be great.
The dedicated connector had a purpose - it was dedicated and under the direct control of the GPUs. It was not about bandwidth but more about latency and guaranteed availability. Remember, PCIe does not guarantee that a GPU is able to send stuff to the other one quickly enough. Plus, in some situations the PCIe interface of the GPU could be busy with something else - reading stuff from RAM, for example, textures or whatnot. So it was a question of whether it was worth doing a separate link for that, and it did seem to pay off for a long while. I guess in the end PCIe simply got fast enough :)
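Some rough arithmetic on the "PCIe simply got fast enough" part; this assumes an uncompressed 32-bit framebuffer being handed between GPUs and only covers bandwidth, not the latency/availability point above:

```python
# Rough bandwidth math for handing finished frames between two GPUs over PCIe.
# Assumes an uncompressed 32-bit framebuffer and ideal link utilisation; it
# says nothing about the latency/availability point above, which is what the
# dedicated bridge was really for.

def framebuffer_traffic_gb_s(width, height, bytes_per_pixel, fps):
    """GB/s needed just to copy every finished frame to the other GPU."""
    return width * height * bytes_per_pixel * fps / 1e9

traffic_4k60  = framebuffer_traffic_gb_s(3840, 2160, 4, 60)    # ~2.0 GB/s
traffic_4k144 = framebuffer_traffic_gb_s(3840, 2160, 4, 144)   # ~4.8 GB/s

pcie3_x16 = 15.75   # GB/s per direction, theoretical, PCIe 3.0 x16
pcie4_x16 = 31.5    # GB/s per direction, theoretical, PCIe 4.0 x16

print(f"4K60:  {traffic_4k60:.2f} of {pcie3_x16} GB/s (PCIe 3.0 x16)")
print(f"4K144: {traffic_4k144:.2f} of {pcie4_x16} GB/s (PCIe 4.0 x16)")
```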

As has been said, the silicon is dual chiplet but the interconnect would render it as a single chip visible to the OS. It's done through the magic of extremely high-speed interconnects... and thus requires precision-made substrates to bond to and a decent internal configuration that handles the (relatively minor) communication losses, so that the OS sees roughly 0.9x the power of each GPU, i.e. about a 1.8x improvement over a single die... though if they are truly sharing a single memory space it'd be closer to the physics limit, 0.98x or the like.
Nice in theory, but interconnects are not magic. The level of interconnect needed to actually replace an internal connection in a GPU - say, the connections to the shader engines the video was talking about - does come with a cost. And that cost is probably power, given the bandwidth requirements - a lot of power.
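A rough sketch of why "a lot of power": link power is roughly bandwidth times energy per bit, and the pJ/bit figures below are assumed ballpark values for different link classes, not specs for any real product:

```python
# Interconnect power is roughly bandwidth times energy per bit moved.
# The pJ/bit values are assumed ballpark figures for different link classes,
# not measured numbers for any specific product.

def link_power_w(bandwidth_tb_s, pj_per_bit):
    bits_per_s = bandwidth_tb_s * 1e12 * 8      # TB/s -> bits/s
    return bits_per_s * pj_per_bit * 1e-12      # pJ/bit -> W

bandwidth = 5.0   # TB/s, the order of magnitude an internal GPU link might need

for link, pj in [("on-die wiring", 0.1),
                 ("advanced packaging", 0.5),
                 ("organic substrate", 2.0)]:
    print(f"{link:18s} ~{link_power_w(bandwidth, pj):5.0f} W at {bandwidth} TB/s")
```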

So we are clear, my optimism lies with AMD pumping out a dual chip version that works in the middle segment and blows monolithic silicon out of the water. If AMD is rational they give most of that cost savings back to the customer, and sell to the huge consumer base that wants more power without having to take out a mortgage to do so.
Unless there is some new breakthrough development that makes everything simple, this makes no sense to do in the mid segment. The added complexity and overhead start justifying themselves when die sizes get very large - in practice, when die sizes are nearing either the reticle limit or yield limits. And this is not a cheap solution for a GPU.

The savings might not be quite as significant as you imagine. The RX 7000 series is a chiplet design, and while cheaper than Nvidia, the relative pricing is basically the same as it has always been. These cards did not end up dominating the generation by being cheap.
 
Joined
Jan 8, 2017
Messages
9,499 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Consumer - read: gaming - has different requirements than datacenter compute monsters, mainly latency in whatever kind of cooperation the chips are going to be doing. Memory access is wide and expensive, especially when having to go to another GPU's VRAM, etc. Same as it has always been.
GPUs in general are resilient to latency by design; that's why the memory chips used in GPUs have always been way faster than the chips used for system RAM, at the cost of much higher latency - it just doesn't matter as much. There is nothing special that those datacenter GPUs do with regard to mitigating latency.
 
Joined
Feb 3, 2017
Messages
3,806 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
I've been saying this since the dual-core days when I was running my Athlon 64 X2 3800+ (Socket 939, Manchester, clocked at 2.66 GHz...). If we could have dual-core CPUs, then why couldn't we have dual-core GPUs? Then we went down the road of having two GPUs on one PCB... those all died. CrossFire died, SLI died, and now we are finally going somewhere.

It's confused me for the longest time why we couldn't have a dual- or quad-chiplet GPU design. I thought that all the knowledge and experience AMD gained working on desktop and server chips would carry over to the GPU side of things, but it never did until now.
Because CPU cores are doing their work independently; coordinating stuff between them is not that easy or natural. GPUs tend to work differently.

But, from a different perspective, multi-GPU already is dual or quad or however many GPUs. Unfortunately nobody has really gone through the effort of writing software to run on these. DX12 and its implicit/explicit multi-adapter modes allow doing it just fine, in theory :)

GPUs in general are resilient to latency by design; that's why the memory chips used in GPUs have always been way faster than the chips used for system RAM, at the cost of much higher latency - it just doesn't matter as much. There is nothing special that those datacenter GPUs do with regard to mitigating latency.
I did not mean memory. More internal than that - cache and work coordination, basically. Ideally we want what today, under 10 ms per frame? Any delays will add up quickly.
Datacenter workloads generally do not care about that aspect as much.
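For reference, the frame-time budgets behind that "under 10 ms" remark, plus a purely hypothetical example of how quickly small cross-die delays could eat into them:

```python
# Frame-time budgets, plus a purely hypothetical example of how quickly small
# cross-die delays could add up inside one frame.

for hz in (60, 120, 144, 240):
    print(f"{hz:3d} Hz -> {1000 / hz:5.2f} ms per frame")

hops_per_frame   = 5000   # hypothetical number of cross-die round trips per frame
extra_us_per_hop = 1.0    # hypothetical added latency per hop, in microseconds
overhead_ms = hops_per_frame * extra_us_per_hop / 1000
print(f"hypothetical cross-die overhead: {overhead_ms:.1f} ms per frame")
```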
 
Joined
Apr 18, 2019
Messages
2,392 (1.15/day)
Location
Olympia, WA
System Name Sleepy Painter
Processor AMD Ryzen 5 3600
Motherboard Asus TuF Gaming X570-PLUS/WIFI
Cooling FSP Windale 6 - Passive
Memory 2x16GB F4-3600C16-16GVKC @ 16-19-21-36-58-1T
Video Card(s) MSI RX580 8GB
Storage 2x Samsung PM963 960GB nVME RAID0, Crucial BX500 1TB SATA, WD Blue 3D 2TB SATA
Display(s) Microboard 32" Curved 1080P 144hz VA w/ Freesync
Case NZXT Gamma Classic Black
Audio Device(s) Asus Xonar D1
Power Supply Rosewill 1KW on 240V@60hz
Mouse Logitech MX518 Legend
Keyboard Red Dragon K552
Software Windows 10 Enterprise 2019 LTSC 1809 17763.1757
What exactly makes this rumor even remotely plausible?

Multi-GPU resource coherency over InfinityFabric has existed since at least RDNA 2.
 
Joined
Feb 26, 2024
Messages
99 (0.33/day)
Because CPU cores are doing their work independently; coordinating stuff between them is not that easy or natural. GPUs tend to work differently.
And GPUs have to deal with some stubbornly "single-threaded" code in Direct3D (still, even in DX12). Data center GPUs don't have this drawback.
 
Joined
Jan 8, 2017
Messages
9,499 (3.27/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
I did not mean memory. More internal than that - cache and work coordination, basically. Ideally we want what today, under 10 ms per frame? Any delays will add up quickly.
Datacenter workloads generally do not care about that aspect as much.
What I am saying is that GPU workloads in general do not care about that, because the execution model hides latencies very well. The way it works is that you're executing the same sequence of instructions over some number of data items, usually orders of magnitude more than what the GPU cores can process in one clock cycle, and what you are interested in is how fast that whole batch can be processed, not how fast every individual item can be processed (in which case latency would have mattered more).

Because every data item takes the same time to process and you always know which data items come next, it's very easy to hide memory access like so:

batch 1: data transfer -> processing
batch 2:                  data transfer -> processing
batch 3:                                   data transfer -> ...

So if you increase the latency, but the execution time is always more than what it takes to initiate the next data transfer, it makes no real difference to the overall performance; it's far more important how many memory transfers you can initiate than how long each one takes.
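Here's a tiny toy model of that overlap, for anyone who wants to play with the numbers; it assumes transfers are issued back to back and enough of them can be kept in flight, which is exactly the batch execution model described above:

```python
# Toy model of the diagram above: batch i's transfer is issued at
# i * issue_interval and arrives `latency` later; compute handles batches in
# order. As long as compute_time >= issue_interval and enough transfers can be
# kept in flight, the latency is only paid once, at the very start.

def total_time(n_batches, issue_interval, latency, compute_time):
    compute_done = 0.0
    for i in range(n_batches):
        data_ready = i * issue_interval + latency    # when batch i has arrived
        start = max(data_ready, compute_done)        # need the data AND the previous batch done
        compute_done = start + compute_time
    return compute_done

# 10x the transfer latency adds ~45 time units to a ~1205-unit run (under 4%):
print(total_time(1000, issue_interval=1.0, latency=5.0,  compute_time=1.2))   # ~1205
print(total_time(1000, issue_interval=1.0, latency=50.0, compute_time=1.2))   # ~1250
```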
 
Joined
Feb 3, 2017
Messages
3,806 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Multi-GPU resource coherency over InfinityFabric has existed since at least RDNA 2.
That is two RX6800s on a single PCB. Chiplet design as in the start of this thread kind of implies a bit more integration than that :)

I mean, everyone seems to be expecting something more like the MI300X. But for some reason AMD has not done even a gaming demonstration with something like that. Nor has Nvidia. Both have been doing chiplets in the data center for several generations now.
 
Joined
Apr 18, 2019
Messages
2,392 (1.15/day)
Location
Olympia, WA
System Name Sleepy Painter
Processor AMD Ryzen 5 3600
Motherboard Asus TuF Gaming X570-PLUS/WIFI
Cooling FSP Windale 6 - Passive
Memory 2x16GB F4-3600C16-16GVKC @ 16-19-21-36-58-1T
Video Card(s) MSI RX580 8GB
Storage 2x Samsung PM963 960GB nVME RAID0, Crucial BX500 1TB SATA, WD Blue 3D 2TB SATA
Display(s) Microboard 32" Curved 1080P 144hz VA w/ Freesync
Case NZXT Gamma Classic Black
Audio Device(s) Asus Xonar D1
Power Supply Rosewill 1KW on 240V@60hz
Mouse Logitech MX518 Legend
Keyboard Red Dragon K552
Software Windows 10 Enterprise 2019 LTSC 1809 17763.1757
That is two RX6800s on a single PCB. Chiplet design as in the start of this thread kind of implies a bit more integration than that :)
It was entirely to point out that:
InfinityFabric has already been used for 'bonding' the resources of 2 (otherwise discrete) GPUs into one, and that it's viable for more than strictly Machine Learning.
Potentially, both dies sharing a package and membus would mean less latency.
I mean, everyone seems to be expecting something more like the MI300X. But for some reason AMD has not done even a gaming demonstration with something like that. Nor has Nvidia. Both have been doing chiplets in the data center for several generations now.
Those Radeon Instinct chiplets are connected via IF, too. It's the same concept/technology, differing scale and implementation.

I think this 'rumor' has some basis in reality, given how AMD's been using its technologies: Modularity and Scalability.
 
Joined
Feb 3, 2017
Messages
3,806 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Yes, Infinity Fabric and related interconnects exist and work just fine. Is interconnect type really the problem here?
Shared membus typically does not mean less latency.

Edit:
If you meant the W6800X Duo with the memory bus sharing, then that is not the case. On that card, each of the two 6800 GPUs manages its own 32 GB of VRAM for a total of 64 GB on the card. Yes, the interconnect makes the connection between the GPUs faster, but that is not membus sharing.
 
Joined
Apr 18, 2019
Messages
2,392 (1.15/day)
Location
Olympia, WA
System Name Sleepy Painter
Processor AMD Ryzen 5 3600
Motherboard Asus TuF Gaming X570-PLUS/WIFI
Cooling FSP Windale 6 - Passive
Memory 2x16GB F4-3600C16-16GVKC @ 16-19-21-36-58-1T
Video Card(s) MSI RX580 8GB
Storage 2x Samsung PM963 960GB nVME RAID0, Crucial BX500 1TB SATA, WD Blue 3D 2TB SATA
Display(s) Microboard 32" Curved 1080P 144hz VA w/ Freesync
Case NZXT Gamma Classic Black
Audio Device(s) Asus Xonar D1
Power Supply Rosewill 1KW on 240V@60hz
Mouse Logitech MX518 Legend
Keyboard Red Dragon K552
Software Windows 10 Enterprise 2019 LTSC 1809 17763.1757
Yes, Infinity Fabric and related interconnects exist and work just fine. Is interconnect type really the problem here?
Shared membus typically does not mean less latency.

Edit:
If you meant the W6800X Duo with the memory bus sharing, then that is not the case. On that card, each of the two 6800 GPUs manages its own 32 GB of VRAM for a total of 64 GB on the card. Yes, the interconnect makes the connection between the GPUs faster, but that is not membus sharing.
Radeon Instinct is not a 'for-graphics' product, W6800 X Duo was.
I was merely pointing out that the technology has already been demonstrated on existing 'for-graphics' Navi21 silicon (non-MI/AI use hardware, using the W6800 X Duo).
Meaning that this hypothetical dual-die Navi4x is not entirely unrealistic.



My comments on latency and membus were explicitly pointed towards the topic's prospective dual-die GPU.

Physical proximity decreases latency; sharing a membus rather than having to communicate over 2 (distant) memory buses will mean lower latency.
(Ex. Dual 2.6Ghz Troy Opteron 252s, ea. w/ single-channel RAM vs. a single 2.6Ghz dual-core Toledo FX-60 w/ dual-channel RAM) P.S. InfinityFabric is a superset of HyperTransport
 
Joined
Apr 30, 2020
Messages
996 (0.59/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Cappellx
Memory Corsair Vengeance pro RGB 3200mhz 32Gbs
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western digital Sata 6.0 SDD 500gb + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+1600watts
Mouse Corsair Scimitar
Keyboard Cosair K55 Pro RGB
As has been said, the silicon is dual chiplet but the interconnect would render it as a single chip visible to the OS. It's done through the magic of extremely high-speed interconnects... and thus requires precision-made substrates to bond to and a decent internal configuration that handles the (relatively minor) communication losses, so that the OS sees roughly 0.9x the power of each GPU, i.e. about a 1.8x improvement over a single die... though if they are truly sharing a single memory space it'd be closer to the physics limit, 0.98x or the like.

The issue with CrossFire and SLI is that both needed to connect over a bus, so step one was figuring out which GPU was doing what. You then lost time in communication and in populating data to the correct memory space... and it all left a sour taste in the mouth when two mid-level GPUs gave you 1+1 = 1.5, whereas you could instead buy a GPU two tiers higher and get 1.8x the performance instead of 1.5x, and everything worked without the driver shenanigans. That, for the record, is why CrossFire and SLI died. It'd be nice to see that come back... but with 3060 GPUs still selling for almost $300 on eBay I cannot see the benefit of pursuing it.

So we are clear, my optimism lies with AMD pumping out a dual chip version that works in the middle segment and blows monolithic silicon out of the water. If AMD is rational they give most of that cost savings back to the customer, and sell to the huge consumer base that wants more power without having to take out a mortgage to do so. Good QHD gameplay, 120 Hz+ 1080p performance, and compromise 4K performance at the $500 mark would be something for everyone to love without really getting into a fight with Nvidia. Not sure where Intel comes out in all of this, but Battlemage has to be less bumpy than Arc. Hopefully they can also release something that makes the gigantic and underserved middle market happy... and gives the Ngreedia side of Nvidia a black eye.


All of this is fluff anyways...so why not indulge in a fantasy?
1. DX12 is ten times more reliant on drivers than DX11 was, especially with the upscalers now.
Just look at how much Intel Arc GPUs improved from drivers alone.

2. People forget that mGPU is already feasible.

3. All of RDNA 1/2/3 supports mGPU without any physical connections.

4. The real problem with mGPU, as I've found out, is that it is heavily CPU bound/bottlenecked in most games that have it properly done, unlike a few older games that moved to Vulkan for it.

6. Stop complaining about scaling for GPUs, because even dual-core CPUs didn't get 200% increases until things started using more than 2 cores. Besides, GPUs don't always output every frame they can. Nvidia literally has a "Fast" option in the control panel to drop frames the GPU knows the monitor can't display fast enough; it's under the V-Sync settings. Frame generation does the opposite of that setting.

7. All upscaling requires far more work in drivers, game patches, updates & integration into the game than SLI or CrossFire ever needed. (Let alone the AI that Nvidia has to run it through before making a driver or a patch for a game.)
 
Joined
Jul 7, 2019
Messages
929 (0.47/day)
What's next? "AI Imagines AMD "Navi 58" RDNA 5 to be a Quad-Chiplet GPU"?
FWIW, several CPU and GPU makers already state they'll be using AI to help speed up designing their next-gen CPUs and GPUs. So while different than you imply, there's every possibility that chipmakers might upload their designs into an AI-powered designer and try out any of the possible suggested optimal setups.
 
Joined
Jun 2, 2017
Messages
9,353 (3.39/day)
System Name Best AMD Computer
Processor AMD 7900X3D
Motherboard Asus X670E E Strix
Cooling In Win SR36
Memory GSKILL DDR5 32GB 5200 30
Video Card(s) Sapphire Pulse 7900XT (Watercooled)
Storage Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s) GIGABYTE FV43U
Case Corsair 7000D Airflow
Audio Device(s) Corsair Void Pro, Logitch Z523 5.1
Power Supply Deepcool 1000M
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 11 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores Firestrike: 46183 Time Spy: 25121
1. DX12 is ten times more reliant on drivers than DX11 was, especially with the upscalers now.
Just look at how much Intel Arc GPUs improved from drivers alone.

2. People forget that mGPU is already feasible.

3. All of RDNA 1/2/3 supports mGPU without any physical connections.

4. The real problem with mGPU, as I've found out, is that it is heavily CPU bound/bottlenecked in most games that have it properly done, unlike a few older games that moved to Vulkan for it.

6. Stop complaining about scaling for GPUs, because even dual-core CPUs didn't get 200% increases until things started using more than 2 cores. Besides, GPUs don't always output every frame they can. Nvidia literally has a "Fast" option in the control panel to drop frames the GPU knows the monitor can't display fast enough; it's under the V-Sync settings. Frame generation does the opposite of that setting.

7. All upscaling requires far more work in drivers, game patches, updates & integration into the game than SLI or CrossFire ever needed. (Let alone the AI that Nvidia has to run it through before making a driver or a patch for a game.)
Multi-GPU works fine. You can even mix whatever cards you want in a system. As far as gaming goes, once Total War stopped supporting multi-GPU it was not as important to me anymore.
 
Joined
Feb 3, 2017
Messages
3,806 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Physical proximity decreases latency; sharing a membus rather than having to communicate over 2 (distant) memory buses will mean lower latency.
(Ex. Dual 2.6Ghz Troy Opteron 252s, ea. w/ single-channel RAM vs. a single 2.6Ghz dual-core Toledo FX-60 w/ dual-channel RAM) P.S. InfinityFabric is a superset of HyperTransport
Sharing a membus would still have a penalty compared to a dedicated membus, from a single GPU's perspective, no?
Basically you would envision something similar to chiplet Zen, where the memory controller is separate from the compute and connected to both/all compute dies via IF?
1. DX12 is ten times more reliant on drivers than DX11 was, especially with the upscalers now.
Just look at how much Intel Arc GPUs improved from drivers alone.
Isn't that the other way around? DX12 is a lower-level API than, say, DX11. The developer needs to do more heavy lifting there, and the API implementation in the drivers does less and has less of an impact.
Intel Arc is a strange example here. Arc had DX12 drivers working quite well when they came out; it was DX11, DX9, etc. that were a big problem. The thing with this example, though, is that it does not really say much about drivers and APIs - creating a driver stack essentially from scratch is a daunting task and Intel had to prioritize and catch up later.
3. All of RDNA 1/2/3 supports mGPU without any physical connections.
4. The real problem with mGPU, as I've found out, is that it is heavily CPU bound/bottlenecked in most games that have it properly done, unlike a few older games that moved to Vulkan for it.
This really isn't about cards but APIs and what developers choose to implement. SLI and Crossfire were a convenient middle ground where Nvidia and AMD respectively held some hands to make things easier.
Both current frontrunners in graphics APIs - DX12 and Vulkan - can work with multiple GPUs just fine. The problem is that writing this seems to be more trouble than it is worth so far.
 
Joined
Dec 30, 2010
Messages
2,199 (0.43/day)
It's 3DFX all over again ....



No, jokes aside, this chiplet approach is likely different from SLI. You have no PCIe constraint, as the latency and bandwidth will be over 100 times better than what SLI provides. So no more micro-stuttering in the first place.

I think it's a double die working as one. Remember, the whole idea of chiplets is to build the GPU/CPU logic on the latest tech but put things like the IO die on older tech, as those parts don't scale that well or would add too much cost given how many bad dies come out of a wafer.

The sole reason why Ryzen or EPYC is so damn efficient is that AMD simply creates CCDs that can be reused, instead of throwing out half of the chip as in the older Intel case.

Nvidia is still doing monolithic dies with a huge cost per wafer and efficiency thrown out of the window (600 W TDPs coming).
 
Joined
Feb 3, 2017
Messages
3,806 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
The sole reason why Ryzen or EPYC is so damn efficient is that AMD simply creates CCDs that can be reused, instead of throwing out half of the chip as in the older Intel case.

Nvidia is still doing monolithic dies with a huge cost per wafer and efficiency thrown out of the window (600 W TDPs coming).
Which efficiency do you mean?

What chiplets bring is efficient manufacturing - smaller individual dies, better yields. The main reason a chiplet design makes sense is that the additional packaging cost is smaller than what it would cost to actually go for the monolithic die. That is not the only reason, of course; reuse is another one, exemplified by Ryzen's IO die. Or straight-up manufacturing limits, the reticle limit for example.
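The manufacturing-efficiency point can be put into rough numbers with a simple Poisson defect model; the defect density, die sizes and usable wafer area below are assumed round figures, and packaging cost/yield for the chiplet case is ignored:

```python
# Simple Poisson defect model for the yield argument: yield = exp(-D * area).
# Defect density, die sizes and usable wafer area are assumed round numbers,
# and packaging cost/yield for the chiplet case is ignored.

from math import exp

WAFER_MM2 = 70000   # roughly the usable area of a 300 mm wafer
D = 0.001           # defects per mm^2 (0.1 per cm^2, illustrative)

def gpus_per_wafer_monolithic(die_mm2):
    return (WAFER_MM2 / die_mm2) * exp(-D * die_mm2)

def gpus_per_wafer_chiplet(chiplet_mm2, chiplets_per_gpu):
    good_chiplets = (WAFER_MM2 / chiplet_mm2) * exp(-D * chiplet_mm2)
    return good_chiplets / chiplets_per_gpu     # any good chiplets can be paired up

print(gpus_per_wafer_monolithic(600))   # ~64 monolithic GPUs per wafer
print(gpus_per_wafer_chiplet(300, 2))   # ~86 dual-chiplet GPUs per wafer
```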

When it comes to power efficiency and performance, though, a chiplet design is a straight-up negative. Some of this can be mitigated but not completely negated.
 
Joined
Jun 2, 2017
Messages
9,353 (3.39/day)
System Name Best AMD Computer
Processor AMD 7900X3D
Motherboard Asus X670E E Strix
Cooling In Win SR36
Memory GSKILL DDR5 32GB 5200 30
Video Card(s) Sapphire Pulse 7900XT (Watercooled)
Storage Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s) GIGABYTE FV43U
Case Corsair 7000D Airflow
Audio Device(s) Corsair Void Pro, Logitch Z523 5.1
Power Supply Deepcool 1000M
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 11 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores Firestrike: 46183 Time Spy: 25121
Sharing a membus would still have a penalty compared to a dedicated membus, from a single GPU's perspective, no?
Basically you would envision something similar to chiplet Zen, where the memory controller is separate from the compute and connected to both/all compute dies via IF?
Isn't that the other way around? DX12 is a lower-level API than, say, DX11. The developer needs to do more heavy lifting there, and the API implementation in the drivers does less and has less of an impact.
Intel Arc is a strange example here. Arc had DX12 drivers working quite well when they came out; it was DX11, DX9, etc. that were a big problem. The thing with this example, though, is that it does not really say much about drivers and APIs - creating a driver stack essentially from scratch is a daunting task and Intel had to prioritize and catch up later.
This really isn't about cards but APIs and what developers choose to implement. SLI and Crossfire were a convenient middle ground where Nvidia and AMD respectively held some hands to make things easier.
Both current frontrunners in graphics APIs - DX12 and Vulkan - can work with multiple GPUs just fine. The problem is that writing this seems to be more trouble than it is worth so far.
The first DX12 multi-GPU capable game showed the potential, but Ashes of the Singularity was not exciting enough in gameplay to give the technology its exposure. In that game it does not matter what GPUs you have, and unlike games like Jedi: Fallen Order, the implementation is not dependent on CrossFire/SLI drivers. The absolute best implementation of multi-GPU was CrossFire for Polaris-era cards: if it did not work, it did not turn on the other GPU. I was a Total War addict at the time, so the 90% improvement in performance mattered.
 
Joined
Apr 30, 2020
Messages
996 (0.59/day)
System Name S.L.I + RTX research rig
Processor Ryzen 7 5800X 3D.
Motherboard MSI MEG ACE X570
Cooling Corsair H150i Cappellx
Memory Corsair Vengeance pro RGB 3200mhz 32Gbs
Video Card(s) 2x Dell RTX 2080 Ti in S.L.I
Storage Western digital Sata 6.0 SDD 500gb + fanxiang S660 4TB PCIe 4.0 NVMe M.2
Display(s) HP X24i
Case Corsair 7000D Airflow
Power Supply EVGA G+1600watts
Mouse Corsair Scimitar
Keyboard Cosair K55 Pro RGB
I think it's a double die working as one. Remember, the whole idea of chiplets is to build the GPU/CPU logic on the latest tech but put things like the IO die on older tech, as those parts don't scale that well or would add too much cost given how many bad dies come out of a wafer.

Stutter has been proven to be a game engine problem and a DX12 shader compilation issue, not a multi-GPU problem. There are plenty of threads on this forum of people complaining about stuttering while using only a single card; there were like 8 pages worth of complaints last time I looked.

So, multi-GPU causing stuttering is a complete MYTH!
 

iO

Joined
Jul 18, 2012
Messages
531 (0.12/day)
Location
Germany
Processor R7 5700x
Motherboard MSI B450i Gaming
Cooling Accelero Mono CPU Edition
Memory 16 GB VLP
Video Card(s) RX 7900 GRE Dual
Storage P34A80 512GB
Display(s) LG 27UM67 UHD
Case none
Power Supply Fractal Ion 650 SFX
Well, Navi 31 has 5.3 TB/s of bandwidth between the GCD and MCDs on a regular fiberglass substrate.
And 2.5 TB/s of bandwidth is enough for Apple to glue two dies together so that the Ultra SKU functions basically as a monolithic chip.
If the latency is low enough, this rumor isn't even that unrealistic.
 
Joined
Jun 22, 2014
Messages
446 (0.12/day)
System Name Desktop / "Console"
Processor Ryzen 5950X / Ryzen 5800X
Motherboard Asus X570 Hero / Asus X570-i
Cooling EK AIO Elite 280 / Cryorig C1
Memory 32GB Gskill Trident DDR4-3600 CL16 / 16GB Crucial Ballistix DDR4-3600 CL16
Video Card(s) RTX 4090 FE / RTX 2080ti FE
Storage 1TB Samsung 980 Pro, 1TB Sabrent Rocket 4 Plus NVME / 1TB Sabrent Rocket 4 NVME, 1TB Intel 660P
Display(s) Alienware AW3423DW / LG 65CX Oled
Case Lian Li O11 Mini / Sliger CL530 Conswole
Audio Device(s) Sony AVR, SVS speakers & subs / Marantz AVR, SVS speakers & subs
Power Supply ROG Loki 1000 / Silverstone SX800
VR HMD Quest 3
FWIW, several CPU and GPU makers already state they'll be using AI to help speed up designing their next-gen CPUs and GPUs. So while different than you imply, there's every possibility that chipmakers might upload their designs into an AI-powered designer and try out any of the possible suggested optimal setups.
nVidia used AI for the 40 series to optimize chip layout, and has open sourced the tool for any chip designer to use. There was a TPU article about it before 40 series launched but I'm having trouble finding it. Here is a link to the tech though--

 
Joined
Jun 2, 2017
Messages
9,353 (3.39/day)
System Name Best AMD Computer
Processor AMD 7900X3D
Motherboard Asus X670E E Strix
Cooling In Win SR36
Memory GSKILL DDR5 32GB 5200 30
Video Card(s) Sapphire Pulse 7900XT (Watercooled)
Storage Corsair MP 700, Seagate 530 2Tb, Adata SX8200 2TBx2, Kingston 2 TBx2, Micron 8 TB, WD AN 1500
Display(s) GIGABYTE FV43U
Case Corsair 7000D Airflow
Audio Device(s) Corsair Void Pro, Logitch Z523 5.1
Power Supply Deepcool 1000M
Mouse Logitech g7 gaming mouse
Keyboard Logitech G510
Software Windows 11 Pro 64 Steam. GOG, Uplay, Origin
Benchmark Scores Firestrike: 46183 Time Spy: 25121
Stutter has been proven to be a game engine problem and a DX12 shader compilation issue, not a multi-GPU problem. There are plenty of threads on this forum of people complaining about stuttering while using only a single card; there were like 8 pages worth of complaints last time I looked.

So, multi-GPU causing stuttering is a complete MYTH!
The only game I have that loves to compile shaders is Forza.
 
Joined
Nov 26, 2021
Messages
1,702 (1.52/day)
Location
Mississauga, Canada
Processor Ryzen 7 5700X
Motherboard ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling Noctua NH-C14S (two fans)
Memory 2x16GB DDR4 3200
Video Card(s) Reference Vega 64
Storage Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s) Nixeus NX-EDG27, and Samsung S23A700
Case Fractal Design R5
Power Supply Seasonic PRIME TITANIUM 850W
Mouse Logitech
VR HMD Oculus Rift
Software Windows 11 Pro, and Ubuntu 20.04
nVidia used AI for the 40 series to optimize chip layout, and has open sourced the tool for any chip designer to use. There was a TPU article about it before 40 series launched but I'm having trouble finding it. Here is a link to the tech though--

That article doesn't mention Ada, i.e. the 40 series. They also compare PPA with commercial tools rather than hand optimized layouts. Given that they didn't compare against the latter, I suspect humans still outperform AI for this.
 