Tuesday, October 29th 2024

Social Media Imagines AMD "Navi 48" RDNA 4 to be a Dual-Chiplet GPU

A user on the Chinese tech forum ChipHell who goes by zcjzcj11111 has served up a fascinating take on what the next-generation AMD "Navi 48" GPU could be, and put their imagination into a render. Apparently, the "Navi 48," which will power AMD's series-topping performance-segment graphics card, is a dual-chiplet design, similar to the company's latest Instinct MI300 series AI GPUs. This won't be a disaggregated GPU such as the "Navi 31" and "Navi 32," but rather a scale-out multi-chip module of two GPU dies that could otherwise run on their own in single-die packages. You want to call this a multi-GPU-on-a-stick? Go ahead, but there are a couple of important differences.

On AMD's Instinct AI GPUs, the chiplets have full cache coherence with each other, and can address memory controlled by each other. This cache coherence makes the chiplets work like one giant chip. In a multi-GPU-on-a-stick, there would be no cache coherence, the two dies would be mapped by the host machine as two separate devices, and then you'd be at the mercy of implicit or explicit multi-GPU technologies for performance to scale. This isn't what's happening on AI GPUs—despite multiple chiplets, the GPU is seen by the host as a single PCI device with all its cache and memory visible to software as a contiguously addressable block.
We imagine the "Navi 48" is modeled along the same lines as the company's AI GPUs. The graphics driver sees the package as a single GPU. For this to work, the two chiplets are probably connected by Infinity Fabric Fanout links—an interconnect with far more bandwidth than a serial bus like PCIe, and probably a necessity for the cache coherence to be effective. The "Navi 44" is probably just one of these chiplets sitting in its own package.
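To make the difference concrete, here is a minimal sketch of how the two models look from software, written against the CUDA runtime purely because it is a familiar GPU API (AMD's HIP exposes equivalent calls); nothing in it is specific to "Navi 48." A coherent MCM of the kind described above would enumerate as a single device with one contiguously addressable memory pool, while a classic multi-GPU-on-a-stick enumerates as two devices that the application has to manage itself.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        // How many GPUs does the platform expose to software?
        // A coherent dual-chiplet package (MI300-style, or "Navi 48" as rumored
        // here) would enumerate as a single device; a classic dual-GPU board
        // enumerates as two devices that driver and application must coordinate.
        int deviceCount = 0;
        if (cudaGetDeviceCount(&deviceCount) != cudaSuccess) {
            std::printf("no GPU runtime available\n");
            return 1;
        }
        std::printf("devices visible to the runtime: %d\n", deviceCount);

        for (int i = 0; i < deviceCount; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            // A single coherent package reports all of its memory as one pool;
            // two separate devices each report only their own VRAM.
            std::printf("device %d: %s, %.1f GiB, PCI bus %d\n",
                        i, prop.name,
                        prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
                        prop.pciBusID);
        }
        return 0;
    }

On a package built the MI300 way, the loop prints a single entry; on a classic dual-GPU board, it prints two.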

In the render, the substrate and package are made to resemble those of the "Navi 32," which agrees with the theory that "Navi 48" will be a performance-segment GPU and a successor to the "Navi 32," "Navi 22," and "Navi 10," rather than a successor to enthusiast-segment GPUs like the "Navi 21" and "Navi 31." This much was made clear by AMD in its recent interviews with the media.

Do we think the ChipHell rumor is plausible? Absolutely. Remember that nobody took the very first renders of the AM5 package with its oddly-shaped IHS seriously, either. A chiplet-based "Navi 48" would be entirely in character for a company like AMD, which loves chiplets, MCMs, and disaggregated devices.
Sources: ChipHell Forums, HXL (Twitter)

59 Comments on Social Media Imagines AMD "Navi 48" RDNA 4 to be a Dual-Chiplet GPU

#26
kapone32
This is cool. AMD Crossfire was much easier to run than SLI. AMD already wrote the software to make it run without the need of a dedicated connector. This could be huge for a Dual GPU based system. If it is baked into the driver it would be great.
Posted on Reply
#27
lilhasselhoffer
As has been said, the silicon is dual chiplet but the interconnect would render it as a single chip visible to the OS. It's done through the magic of extremely high speed interconnections...and thus requires precision-made substrates to bond to and a decent internal configuration that handles the -relatively- minor communication losses, so that you have 0.9x the power of each GPU visible to the OS, as a 1.8x improvement over a single core...though if they are truly sharing a single memory space it'd be closer to the physics limitation, 0.98x or the like.

The issue with crossfire and sli is that both needed to connect over a bus, so step one was to figure out which GPU was doing what. You then lost time in communication and population of data to the correct memory space...and it all leaves a sour taste in the mouth when with 2 mid-level GPUs 1+1 = 1.5, whereas you could instead buy a GPU two levels higher and get 1.8x the performance instead of 1.5x...and things worked without all of the driver shenanigans. That, for the record, is why crossfire and sli died. It'd be nice to see that come back...but with 3060 GPUs still selling for almost $300 on ebay I cannot see the benefit to pursuing it.

So we are clear, my optimism lies with AMD pumping out a dual chip version that works in the middle segment and blows monolithic silicon out of the water. If AMD is rational, they'll give most of that cost savings back to the customer, and sell to the huge consumer base that wants more power without having to take out a mortgage to do so. Good QHD gameplay, and 120hz+ 1080p performance, with compromise 4K performance at the $500 mark would be something for everyone to love without really getting into a fight with Nvidia. Not sure where Intel comes out in all of this, but Battlemage has to be less bumpy than Arc. Hopefully they can also release something that makes the gigantic and underserved middle market happy...and gives the Ngreedia side of Nvidia a black eye.


All of this is fluff anyways...so why not indulge in a fantasy?
Posted on Reply
#28
londiste
Franzen4RealThis is what the nVidia B200 is doing, using two chiplets that behave as a single and are seen as a single die by the software. Or did you mean remotely plausible for AMD to pull off? Granted, the B200 is NOT a consumer level GPU...
Consumer - read: gaming - has different requirements than datacenter compute monsters. Mainly - latency in whatever kind of cooperation the chips are going to be doing. Memory access is wide and expensive especially when having to go to another GPU's VRAM etc. Same as it has always been.

The closest we got was SLI/Crossfire and that was a bunch of driver magic from both sides. SLI/Crossfire died due to new incoming rendering methods that made the whole thing expensive to maintain. Plus, incoming DX12 and Vulkan with their own ways to handle multi-GPU - the implicit and explicit mentioned in the article. Which basically no game developers tried to properly implement.
Franzen4RealIt is different in that instead of making the entire GPU die on the 5nm node, they took the cache and memory controllers and fabbed them as chiplets on the older 6nm node, because these parts do not benefit so much from a node shrink. All of the chiplets were then arranged to make a full die. This was an ingenious way to target the parts of the GPU getting the largest performance benefits of the 5nm node shrink, while saving cost by not using a cutting-edge node on the parts that do not. Fantastic engineering in my opinion
The ingenious bit was figuring out which parts of a GPU can be separated. The problem always has been and still is that splitting up the compute array is not doable, at least has not been so far. It has been 15+ years since AMD first publicly said they are trying to go for that. Nvidia has been rumored to look into the same thing for 10+ years as well. Both AMD and Nvidia occasionally publish papers about how to split a GPU but the short version of conclusions has been that you really can't.

Again, the context here is gaming GPU. Something that is very latency-sensitive.
Bet0nWatch this, especially from 10:50
Well, that actually did directly bring up the bandwidth problem as well. A couple of orders of magnitude higher than what they did on CPUs.
kapone32This is cool. AMD Crossfire was much easier to run than SLI. AMD already wrote the software to make it run without the need of a dedicated connector. This could be huge for a Dual GPU based system. If it is baked into the driver it would be great.
The dedicated connector had a purpose - it was dedicated and under the direct control of the GPUs. It was not about bandwidth but more about latency and guaranteed availability. Remember, PCIe does not guarantee that a GPU is able to send stuff to the other one over it quickly enough. Plus, in some situations the PCIe interface of the GPU could be busy with something else - reading stuff from RAM, for example, textures or whatnot. That was the consideration of whether it was worth doing a separate thing for it, and it did seem to be worth it for a long while. I guess in the end PCIe simply got fast enough :)
lilhasselhofferAs has been said, the silicon is dual chiplet but the interconnect would render it as a single chip visible to the OS. It's done through the magic of extremely high speed interconnections...and thus requires precision-made substrates to bond to and a decent internal configuration that handles the -relatively- minor communication losses, so that you have 0.9x the power of each GPU visible to the OS, as a 1.8x improvement over a single core...though if they are truly sharing a single memory space it'd be closer to the physics limitation, 0.98x or the like.
Nice in theory, but interconnects are not magic. The level of interconnect needed to actually replace an internal connection in a GPU - say, the connections to shader engines the video was talking about - does come with a cost. And that cost is probably power, given the bandwidth requirements - a lot of power.
lilhasselhofferSo we are clear, my optimism lies with AMD pumping out a dual chip version that works in the middle segment and blows monolithic silicon out of the water. If AMD is rational, they'll give most of that cost savings back to the customer, and sell to the huge consumer base that wants more power without having to take out a mortgage to do so.
Unless there is some new breakthrough development that makes everything simple, this makes no sense to do in the mid segment. The added complexity and overhead start justifying themselves when die sizes get very large - in practice, when die sizes are nearing either the reticle limit or yield limits. And this is not a cheap solution for a GPU.

The savings might not be quite as significant as you imagine. The RX 7000 series is a chiplet design, and while cheaper than Nvidia, its pricing relative to previous generations is basically the same as it has always been. These cards did not end up dominating the generation by being cheap.
Posted on Reply
#29
Vya Domus
londisteConsumer - read: gaming - has different requirements than datacenter compute monsters. Mainly - latency in whatever kind of cooperation the chips are going to be doing. Memory access is wide and expensive especially when having to go to another GPU's VRAM etc. Same as it has always been.
GPUs in general are resilient to latency by design, that's why the memory chips used in GPUs have always been way faster than chips used for system RAM at the cost of much higher latency, it just doesn't matter as much. There is nothing special that those datacenter GPUs do with regards to mitigating latency.
Posted on Reply
#30
londiste
FreedomEclipseI've been saying this since the dual-core days when I was running my AMD64 X2 3800+ (939, Manchester, clocked at 2.66 GHz...). If we could have dual-core CPUs then why couldn't we have dual-core GPUs? Then we went down the road of having two GPUs on one PCB... those all died. Crossfire died, SLI died, and now we are finally going somewhere.

It's confused me for the longest time why we couldn't have a dual or quad chiplet GPU design. I thought that all the knowledge and experience that AMD gained working on desktop and server chips would carry over to the GPU side of things, but it never did till now.
Because CPU cores are doing their work independently. Coordinating stuff between them is not that easy and native. GPUs tend to work differently.

But, from a different perspective, multi-GPU is dual or quad or however many GPUs. Unfortunately, nobody has really gone through the effort to write software to run on these. DX12 and implicit/explicit multi-GPU allow doing it just fine in theory :)
Vya DomusGPUs in general are resilient to latency by design, that's why the memory chips used in GPUs have always been way faster than chips used for system RAM at the cost of much higher latency, it just doesn't matter as much. There is nothing special that those datacenter GPUs do with regards to mitigating latency.
I did not mean memory. More internal than that - cache and work coordination basically. Ideally we want what today, under 10ms per frame? Any delays will add up quickly.
Workloads in datacenter generally do not care about that aspect as much.
Posted on Reply
#31
LabRat 891
londisteWhat exactly makes this rumor even remotely plausible?

Multi-GPU resource coherency over InfinityFabric has existed since at least RDNA 2.
Posted on Reply
#32
RUSerious
londisteBecause CPU cores are doing their work independently. Coordinating stuff between them is not that easy and native. GPUs tend to work differently.
And GPUs have to deal with some stubbornly "single-threaded" code in Direct3D (still, even in DX12). Data center GPUs don't have this drawback.
Posted on Reply
#33
Vya Domus
londisteI did not mean memory. More internal than that - cache and work coordination basically. Ideally we want what today, under 10ms per frame? Any delays will add up quickly.
Workloads in datacenter generally do not care about that aspect as much.
What I am saying is that GPU workloads in general do not care about that, because the execution model hides latencies very well. The way it works is that you're executing the same sequence of instructions over some number of data items, usually orders of magnitude more than what the GPU cores can process in one clock cycle, and what you are interested in is how fast that batch can be processed, not how fast every individual item can be processed, in which case latency would have mattered more.

Because every data item takes the same time to process and you always know which data items come next, it's very easy to hide memory access latency, like so:

... ->data transfer -> processing
--------------------> data transfer -> processing
--------------------------------------> data transfer -> ...

So if you increase the latency but the execution time is always more than what it takes to initiate the next data transfer, it makes no real difference to the overall performance; it's far more important how many memory transfers you can initiate than how long each one takes.
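As a concrete sketch of that pipeline (CUDA used here only as a familiar stand-in; the kernel, chunk size, and two-stream split are arbitrary placeholders), the loop below issues the copy for the next chunk while the current chunk is still being processed, so transfer latency stays hidden behind compute exactly as in the diagram above.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Arbitrary placeholder kernel: square each element of a chunk.
    __global__ void square(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = data[i] * data[i];
    }

    int main() {
        const int chunk = 1 << 20;                 // elements per chunk
        const int nChunks = 8;
        const size_t bytes = chunk * sizeof(float);

        // Pinned host memory, so async copies can genuinely overlap with compute.
        float* host = nullptr;
        cudaMallocHost((void**)&host, nChunks * bytes);
        for (int i = 0; i < nChunks * chunk; ++i) host[i] = float(i % 100);

        // Two streams and two device buffers: while one chunk is being processed,
        // the next chunk's transfer is already in flight (the pipeline sketched above).
        cudaStream_t streams[2];
        float* dev[2];
        for (int s = 0; s < 2; ++s) {
            cudaStreamCreate(&streams[s]);
            cudaMalloc((void**)&dev[s], bytes);
        }

        for (int c = 0; c < nChunks; ++c) {
            int s = c % 2;
            float* hostChunk = host + size_t(c) * chunk;
            cudaMemcpyAsync(dev[s], hostChunk, bytes, cudaMemcpyHostToDevice, streams[s]);
            square<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(dev[s], chunk);
            cudaMemcpyAsync(hostChunk, dev[s], bytes, cudaMemcpyDeviceToHost, streams[s]);
        }
        cudaDeviceSynchronize();
        std::printf("processed %d chunks with copy and compute overlapped\n", nChunks);

        for (int s = 0; s < 2; ++s) { cudaFree(dev[s]); cudaStreamDestroy(streams[s]); }
        cudaFreeHost(host);
        return 0;
    }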
Posted on Reply
#34
londiste
LabRat 891
Multi-GPU resource coherency over InfinityFabric has existed since at least RDNA 2.
That is two RX6800s on a single PCB. Chiplet design as in the start of this thread kind of implies a bit more integration than that :)

I mean, everyone seems to be expecting something more like the MI300X. But for some reason AMD has not done even a gaming demonstration with something like that. Nor has Nvidia. Both have been doing chiplets in the data center for several generations now.
Posted on Reply
#35
LabRat 891
londisteThat is two RX6800s on a single PCB. Chiplet design as in the start of this thread kind of implies a bit more integration than that :)
It was entirely to point out that:
InfinityFabric has already been used for 'bonding' the resources of 2 (otherwise discrete) GPUs into one, and that it's viable for more than strictly Machine Learning.
Potentially, both dies sharing a package and membus would mean less latency.
londisteI mean, everyone seems to be expecting something more like the MI300X. But for some reason AMD has not done even a gaming demonstration with something like that. Nor has Nvidia. Both have been doing chiplets in the data center for several generations now.
Those Radeon Instinct chiplets are connected via IF, too. It's the same concept/technology, differing scale and implementation.

I think this 'rumor' has some basis in reality, given how AMD's been using its technologies: Modularity and Scalability.
Posted on Reply
#36
londiste
Yes, Infinity Fabric and related interconnects exist and work just fine. Is interconnect type really the problem here?
Shared membus typically does not mean less latency.

Edit:
If you meant W6800X Duo with the memory bus sharing then that is not the case. On that card, both 6800 GPUs manage 32GB VRAM for the total of 64GB on the card. Yes, the interconnect makes the connection between GPUs faster but that is not membus sharing.
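For reference, here is roughly what "one GPU addressing another GPU's VRAM over the interconnect" looks like from software, sketched with the CUDA runtime as a stand-in (AMD's HIP has equivalent peer-access calls; the device indices assume a two-GPU system). Peer access lets one device map the other's memory, but every such access still crosses the external link, which is not the same thing as two dies sharing one memory bus.

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        if (n < 2) { std::printf("fewer than two GPUs present\n"); return 0; }

        // Can device 0 read/write device 1's VRAM, and vice versa?
        int canAccess01 = 0, canAccess10 = 0;
        cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
        cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
        std::printf("peer access 0->1: %d, 1->0: %d\n", canAccess01, canAccess10);

        if (canAccess01) {
            // Map device 1's memory into device 0's address space. Accesses still
            // travel over the external interconnect (PCIe, or a bridge on cards
            // that have one), so this is remote memory with remote-memory latency,
            // not a shared memory bus.
            cudaSetDevice(0);
            cudaDeviceEnablePeerAccess(1, 0);
        }
        return 0;
    }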
Posted on Reply
#37
LabRat 891
londisteYes, Infinity Fabric and related interconnects exist and work just fine. Is interconnect type really the problem here?
Shared membus typically does not mean less latency.

Edit:
If you meant W6800X Duo with the memory bus sharing then that is not the case. On that card, both 6800 GPUs manage 32GB VRAM for the total of 64GB on the card. Yes, the interconnect makes the connection between GPUs faster but that is not membus sharing.
Radeon Instinct is not a 'for-graphics' product; the W6800X Duo was.
I was merely pointing out that the technology has already been demonstrated on existing 'for-graphics' Navi21 silicon (non-MI/AI hardware, in the form of the W6800X Duo).
Meaning that this hypothetical dual-die Navi4x is not entirely unrealistic.



My comments on latency and membus were explicitly pointed towards the topic's prospective dual-die GPU.

Physical proximity decreases latency; sharing a membus rather than having to communicate over 2 (distant) memory buses, will be lower latency.
(Ex. Dual 2.6Ghz Troy Opteron 252s, ea. w/ single-channel RAM vs. a single 2.6Ghz dual-core Toledo FX-60 w/ dual-channel RAM) P.S. InfinityFabric is a superset of HyperTransport
Posted on Reply
#38
DemonicRyzen666
lilhasselhofferAs has been said, the silicon is dual chiplet but the interconnect would render it as a single chip visible to the OS. It's done through the magic of extremely high speed interconnections...and thus requires precision-made substrates to bond to and a decent internal configuration that handles the -relatively- minor communication losses, so that you have 0.9x the power of each GPU visible to the OS, as a 1.8x improvement over a single core...though if they are truly sharing a single memory space it'd be closer to the physics limitation, 0.98x or the like.

The issue with crossfire and sli is that both needed to connect over a bus, so step one was to figure out which GPU was doing what. You then lost time in communication and population of data to the correct memory space...and it all leaves a sour taste in the mouth when with 2 mid-level GPUs 1+1 = 1.5, whereas you could instead buy a GPU two levels higher and get 1.8x the performance instead of 1.5x...and things worked without all of the driver shenanigans. That, for the record, is why crossfire and sli died. It'd be nice to see that come back...but with 3060 GPUs still selling for almost $300 on ebay I cannot see the benefit to pursuing it.

So we are clear, my optimism lies with AMD pumping out a dual chip version that works in the middle segment and blows monolithic silicon out of the water. If AMD is rational, they'll give most of that cost savings back to the customer, and sell to the huge consumer base that wants more power without having to take out a mortgage to do so. Good QHD gameplay, and 120hz+ 1080p performance, with compromise 4K performance at the $500 mark would be something for everyone to love without really getting into a fight with Nvidia. Not sure where Intel comes out in all of this, but Battlemage has to be less bumpy than Arc. Hopefully they can also release something that makes the gigantic and underserved middle market happy...and gives the Ngreedia side of Nvidia a black eye.

All of this is fluff anyways...so why not indulge in a fantasy?
1. DX12 is ten times more reliant on drivers than DX11 was, especially with the upscalers now.
Just look at how much Intel ARC GPUs improved from drivers alone.

2. People forgot that mGPU is already feasible???

3. All of RDNA 1/2/3 support mGPU without any physical connections.

4. The real problem with mGPU, as I've found out, is that mGPU is heavily CPU bound/bottlenecked on most games that have it properly done. Unlike a few older games that moved to Vulkan for it.

6. Stop complaining about scaling for GPUs, because even dual-core CPUs didn't get 200% increases till things started using more than 2 cores. Besides, GPUs don't always output every frame they can. Nvidia literally has a fast option in the control panel to dump frames the GPU knows the monitor can't render fast enough. It's under the V-Sync settings. Frame generation does the opposite of that setting.

7. All upscaling requires far more work in drivers, patches for games, updates & integration into the game than SLI or Crossfire ever needed. (Let alone the AI that Nvidia has to run it through before making a driver or a patch for a game.)
Posted on Reply
#39
TechLurker
mate123What's next? "AI Imagines AMD "Navi 58" RDNA 5 to be a Quad-Chiplet GPU"?
FWIW, several CPU and GPU makers already state they'll be using AI to help speed up designing their next-gen CPUs and GPUs. So while different than you imply, there's every possibility that chipmakers might upload their designs into an AI-powered designer and try out any of the possible suggested optimal setups.
Posted on Reply
#40
kapone32
DemonicRyzen6661. DX12 is ten times more reliant on drivers than DX11 was, especially with the upscalers now.
Just look at how much Intel ARC GPUs improved from drivers alone.

2. People forgot that mGPU is already feasible???

3. All of RDNA 1/2/3 support mGPU without any physical connections.

4. The real problem with mGPU, as I've found out, is that mGPU is heavily CPU bound/bottlenecked on most games that have it properly done. Unlike a few older games that moved to Vulkan for it.

6. Stop complaining about scaling for GPUs, because even dual-core CPUs didn't get 200% increases till things started using more than 2 cores. Besides, GPUs don't always output every frame they can. Nvidia literally has a fast option in the control panel to dump frames the GPU knows the monitor can't render fast enough. It's under the V-Sync settings. Frame generation does the opposite of that setting.

7. All upscaling requires far more work in drivers, patches for games, updates & integration into the game than SLI or Crossfire ever needed. (Let alone the AI that Nvidia has to run it through before making a driver or a patch for a game.)
Multi-GPU works fine. You can even mix whatever cards you want in a system. As far as gaming goes, once Total War stopped supporting multi-GPU, it stopped being as important to me.
Posted on Reply
#41
londiste
LabRat 891Physical proximity decreases latency; sharing a membus rather than having to communicate over 2 (distant) memory buses, will be lower latency.
(Ex. Dual 2.6Ghz Troy Opteron 252s, ea. w/ single-channel RAM vs. a single 2.6Ghz dual-core Toledo FX-60 w/ dual-channel RAM) P.S. InfinityFabric is a superset of HyperTransport
Sharing a membus would still have a penalty compared to a dedicated membus from a single GPU perspective, no?
Basically you would envision something similar to chiplet Zen where Memory controller is separate from compute and connected to both/all compute dies via IF?
DemonicRyzen6661. DX12 is ten times more reliant on drivers than DX11 was, especially with the upscalers now.
Just look at how much Intel ARC GPUs improved from drivers alone.
Isn't that the other way around? DX12 is a lower-level API than, say, DX11. The developer needs to do more of the heavy lifting there, and the API implementation in drivers does less and has less of an impact.
Intel ARC is a strange example here. ARC had DX12 drivers working quite well when they came out. It was DX11, DX9, etc. that were a big problem. The thing with this example, though, is that it does not really say too much about drivers and APIs; creating a driver stack practically from scratch is a daunting task, and Intel had to prioritize and catch up later.
DemonicRyzen6663. All of RDNA 1/2/3 support mGPU without any physical connections.
4. The real problem with mGPU, as I've found out, is that mGPU is heavily CPU bound/bottlenecked on most games that have it properly done. Unlike a few older games that moved to Vulkan for it.
This really isn't about cards but APIs and what developers choose to implement. SLI and Crossfire were a convenient middle ground where Nvidia and AMD respectively held some hands to make things easier.
Both current frontrunners in graphics APIs - DX12 and Vulkan - can work with multiple GPUs just fine. The problem is that writing this seems to be more trouble than it is worth so far.
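To illustrate what that "trouble" looks like, here is a minimal sketch of the explicit model, again using the CUDA runtime as a stand-in (DX12 explicit multi-adapter and Vulkan device groups put an equivalent burden on the engine; the workload, split, and kernel are made-up placeholders). The application itself has to enumerate the devices, partition the work, and synchronize and recombine the results, and that is the effort most game developers have decided is not worth it.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    // Placeholder workload: brighten part of a "frame" on each GPU.
    __global__ void shade(float* pixels, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) pixels[i] *= 1.1f;
    }

    int main() {
        int deviceCount = 0;
        cudaGetDeviceCount(&deviceCount);
        if (deviceCount < 1) { std::printf("no GPU found\n"); return 1; }

        const int totalPixels = 1 << 22;
        const int perDevice = totalPixels / deviceCount;  // the app decides the split
                                                          // (remainder ignored for brevity)
        std::vector<float> frame(totalPixels, 0.5f);
        std::vector<float*> devBuf(deviceCount);
        std::vector<cudaStream_t> streams(deviceCount);

        // Explicit model: every step below is the application's responsibility.
        for (int d = 0; d < deviceCount; ++d) {
            cudaSetDevice(d);
            cudaStreamCreate(&streams[d]);
            cudaMalloc((void**)&devBuf[d], perDevice * sizeof(float));
            cudaMemcpyAsync(devBuf[d], frame.data() + d * perDevice,
                            perDevice * sizeof(float), cudaMemcpyHostToDevice, streams[d]);
            shade<<<(perDevice + 255) / 256, 256, 0, streams[d]>>>(devBuf[d], perDevice);
            cudaMemcpyAsync(frame.data() + d * perDevice, devBuf[d],
                            perDevice * sizeof(float), cudaMemcpyDeviceToHost, streams[d]);
        }
        // ...and so is deciding when the pieces are done and recombined.
        for (int d = 0; d < deviceCount; ++d) {
            cudaSetDevice(d);
            cudaStreamSynchronize(streams[d]);
            cudaFree(devBuf[d]);
            cudaStreamDestroy(streams[d]);
        }
        std::printf("frame shaded across %d device(s)\n", deviceCount);
        return 0;
    }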
Posted on Reply
#42
Jism
It's 3DFX all over again ....



No, jokes aside, that chiplet approach is likely quite different from SLI. You have no PCIe constraint, as the latencies and bandwidth will exceed what SLI provided by a factor of 100 or more. So no more micro-stuttering in the first place.

I think it's a double die working as one. Remember, the whole idea of chiplets is to build the latest tech for GPUs/CPUs on the newest node, but put things like an IO die on older tech, as those parts don't scale that well or would add too much cost considering how many bad dies there are.

The sole reason why Ryzen or EPYC is so damn efficient is that AMD simply creates CCDs that can be re-used, instead of throwing out half of the chip (as in the older case of Intel).

Nvidia is still doing monolithic dies with a huge cost per wafer and efficiency thrown out of the window (TDP: 600W coming).
Posted on Reply
#43
londiste
JismThe sole reason why Ryzen or EPYC is so damn efficient is that AMD simply creates CCDs that can be re-used, instead of throwing out half of the chip (as in the older case of Intel).

Nvidia is still doing monolithic dies with a huge cost per wafer and efficiency thrown out of the window (TDP: 600W coming).
Which efficiency do you mean?

What chiplets bring is efficient manufacturing - smaller individual dies, better yields. The main reason a chiplet design makes sense is that the additional cost of packaging is smaller than what it would cost to actually go for a monolithic die. That's not the only reason of course; reuse is another one, exemplified by Ryzen's IO die. Or straight-up limits in manufacturing, the reticle limit for example.

When it comes to power efficiency and performance benefit though, chiplet design is a straight up negative. Some of this can be mitigated but not completely negated.
Posted on Reply
#44
kapone32
londisteSharing a membus would still have a penalty compared to a dedicated membus from a single GPU perspective, no?
Basically you would envision something similar to chiplet Zen where Memory controller is separate from compute and connected to both/all compute dies via IF?
Isn't that the other way around? DX12 is a lower-level API than, say, DX11. The developer needs to do more of the heavy lifting there, and the API implementation in drivers does less and has less of an impact.
Intel ARC is a strange example here. ARC had DX12 drivers working quite well when they came out. It was DX11, DX9, etc. that were a big problem. The thing with this example, though, is that it does not really say too much about drivers and APIs; creating a driver stack practically from scratch is a daunting task, and Intel had to prioritize and catch up later.
This really isn't about cards but APIs and what developers choose to implement. SLI and Crossfire were a convenient middle ground where Nvidia and AMD respectively held some hands to make things easier.
Both current frontrunners in graphics APIs - DX12 and Vulkan - can work with multiple GPUs just fine. The problem is that writing this seems to be more trouble than it is worth so far.
The first DX12 multi-GPU capable game showed the potential, but Ashes of the Singularity was not exciting enough in gameplay to give the technology its exposure. In that game it does not matter what GPUs you have, and unlike games like Jedi: Fallen Order, the implementation is not dependent on Crossfire/SLI drivers. The absolute best implementation of multi-GPU was Crossfire for Polaris-era cards. If it did not work, it did not turn on the other GPU. I was a Total War addict at the time, so the 90% improvement in performance mattered.
Posted on Reply
#45
DemonicRyzen666
JismI think it's a double die working as one. Remember, the whole idea of chiplets is to build the latest tech for GPUs/CPUs on the newest node, but put things like an IO die on older tech, as those parts don't scale that well or would add too much cost considering how many bad dies there are.
Stutter has been proven to be a game engine problem & DX12 shader compiler issue, & not a multi-GPU problem. There are plenty of threads on this forum of people complaining about stuttering when using only a single card; there were like 8 pages' worth of complaints last time I looked.

So, multi-GPU causing stuttering is a complete MYTH!
Posted on Reply
#46
iO
Well, Navi31 has 5.3 TB/s of bandwidth between the GCD and MCDs on a regular fiberglass substrate.
And 2.5 TB/s of bandwidth is enough for Apple to glue two dies together so that the Ultra SKU functions basically as a monolithic chip.
If the latency is low enough, this rumor isn't even that unrealistic.
Posted on Reply
#47
Franzen4Real
TechLurkerFWIW, several CPU and GPU makers already state they'll be using AI to help speed up designing their next-gen CPUs and GPUs. So while different than you imply, there's every possibility that chipmakers might upload their designs into an AI-powered designer and try out any of the possible suggested optimal setups.
nVidia used AI for the 40 series to optimize chip layout, and has open sourced the tool for any chip designer to use. There was a TPU article about it before 40 series launched but I'm having trouble finding it. Here is a link to the tech though--

developer.nvidia.com/blog/autodmp-optimizes-macro-placement-for-chip-design-with-ai-and-gpus/
Posted on Reply
#48
kapone32
DemonicRyzen666Stutter has been proven to be a game engine problem & DX12 shader compiler issue, & not a multi-GPU problem. There are plenty of threads on this forum of people complaining about stuttering when using only a single card; there were like 8 pages' worth of complaints last time I looked.

So, multi-GPU causing stuttering is a complete MYTH!
The only game I have that loves to compile shaders is Forza.
Posted on Reply
#49
AnotherReader
Franzen4RealnVidia used AI for the 40 series to optimize chip layout, and has open sourced the tool for any chip designer to use. There was a TPU article about it before 40 series launched but I'm having trouble finding it. Here is a link to the tech though--

developer.nvidia.com/blog/autodmp-optimizes-macro-placement-for-chip-design-with-ai-and-gpus/
That article doesn't mention Ada, i.e. the 40 series. They also compare PPA with commercial tools rather than hand optimized layouts. Given that they didn't compare against the latter, I suspect humans still outperform AI for this.
Posted on Reply
#50
bitsandboots
kapone32This is cool. AMD Crossfire was much easier to run than SLI. AMD already wrote the software to make it run without the need of a dedicated connector. This could be huge for a Dual GPU based system. If it is baked into the driver it would be great.
That's not how I remember it
Crossfire was easier to purchase due to not needing special, sometimes nvidia-only boards.
But getting crossfire to actually do its job and improve performance was a mess. SLI was not much better, but I remember with AMD I had to use specific versions of their drivers because they'd regress per game.
Posted on Reply