
Samsung Develops GDDR6W Memory Standard: Double the Bandwidth and Density of GDDR6 Through Packaging Innovations

Joined
Feb 1, 2019
Messages
3,667 (1.70/day)
Location
UK, Midlands
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 4080 RTX SUPER FE 16G
Storage 1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Soundblaster AE-9
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
I wonder if this will be leveraged for double the speed and capacity, or just leveraged for the currently planned speeds and capacities by using half the amount of chips, saving money but likely not passing that saving on to consumers...
The latter, of course, as SMR and QLC have proven.
 
Joined
Oct 12, 2005
Messages
712 (0.10/day)
I wasn't talking about RDNA3 using GDDR6W. I'm sorry if I gave that impression. I was just talking bandwidth in general.

I agree that each RDNA3 MCD has a 64-bit wide memory bus and it's difficult to fit more than 6 on a single package. However, a future RDNA4 MCD could have a 96-bit/128-bit wide memory bus. That's what GDDR6W is targeting anyway.



Moar bandwidth comes at an increased cost anyway. That's expected.

On another note, Nvidia would have benefitted a lot from something like this in 2020 with Ampere (in terms of VRAM size).

RTX 3080 came with 320-bit 19 Gbps = 760 GB/sec bandwidth but only had 10 GB capacity.

Considering GDDR6 was at 14 Gbps at the time of Ampere, a 448-bit bus at 14 Gbps = 784 GB/sec bandwidth with 14 GB of VRAM.
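For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes peak bandwidth is simply bus width times per-pin data rate divided by 8, and that each chip has a 32-bit interface and holds 1 GB with no clamshell mounting (those chip figures and the function names are assumptions for illustration, not something stated in the post).

```python
def bandwidth_gb_s(bus_width_bits, rate_gbps):
    """Peak bandwidth in GB/s: data pins times per-pin rate (Gbit/s), over 8 bits per byte."""
    return bus_width_bits * rate_gbps / 8


def capacity_gb(bus_width_bits, chip_io_bits=32, chip_gb=1):
    """Capacity with one chip per chip_io_bits of bus width (assumed: 32-bit, 1 GB chips)."""
    return (bus_width_bits // chip_io_bits) * chip_gb


# RTX 3080 as shipped: 320-bit bus at 19 Gbps per pin
shipped = (bandwidth_gb_s(320, 19), capacity_gb(320))

# The wider-but-slower alternative from the post: 448-bit bus at 14 Gbps per pin
alternative = (bandwidth_gb_s(448, 14), capacity_gb(448))

print(shipped)      # (bandwidth in GB/s, capacity in GB)
print(alternative)
```

Under those assumptions the numbers in the post check out: the wider, slower bus gives slightly more bandwidth and four more chips' worth of capacity.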
First, going by AMD's comments, I wouldn't be surprised at all if Navi 4x uses the same MCD as Navi 3x (a bit like on Ryzen, where they reuse the same I/O die for multiple generations).

You can watch the Gamers Nexus video on chiplets; AMD explains it pretty well. Designing the memory controller and related blocks is hard, takes a lot of time, and is tedious work for not much gain anyway. Being able to reuse it for the next gen will probably allow them to ship earlier. If those MCDs already support GDDR7 and cache stacking, I don't see why they would update them for the next gen.

Again, a misconception about these chips is that they increase bandwidth. They don't really: they increase the bandwidth per chip, but not per bit of bus width. You could just put double the number of chips on your board (for example, on both sides) and you would get the same bandwidth. They are really for packaging reasons more than anything. They could be really useful for putting 256-bit and 384-bit GPUs into mobile. But it will still be cost-prohibitive to do this on a larger bus, even if you halve the number of chips. At that point HBM starts to make sense.
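To put rough numbers on the per-chip versus per-bus point, here is a small sketch. The package figures (32 I/O pins and 16 Gbit for GDDR6, 64 I/O pins and 32 Gbit for GDDR6W) are assumptions taken from the "double the bandwidth and density" claim in the thread title, and the 384-bit bus at 16 Gbps is just an illustrative choice. The names `Package` and `board` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    io_bits: int       # data pins per package (assumed)
    density_gbit: int  # capacity per package in Gbit (assumed)


def board(pkg, bus_width_bits, rate_gbps):
    """Package count, peak bandwidth and capacity for a given GPU bus width."""
    count = bus_width_bits // pkg.io_bits
    return {
        "packages": count,
        "bandwidth_gb_s": bus_width_bits * rate_gbps / 8,  # depends on the bus only
        "capacity_gb": count * pkg.density_gbit / 8,
    }


GDDR6 = Package("GDDR6", io_bits=32, density_gbit=16)
GDDR6W = Package("GDDR6W", io_bits=64, density_gbit=32)

# Same 384-bit bus, same per-pin rate, two package types
plain = board(GDDR6, 384, 16)
wide = board(GDDR6W, 384, 16)

print(plain)
print(wide)
```

With the bus width and per-pin rate held fixed, the bandwidth and capacity come out the same for both; only the package count halves, which is the packaging argument made above.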

Also, they compare it there with HBM2E, but HBM3 is available and has way more bandwidth.

They could do super-wide buses on professional high-end GPUs, but it's cheaper and better to go with HBM at that point.

Also, GDDR7 is around the corner, with speeds going up to 36 Gbps.
 
Joined
Dec 17, 2011
Messages
359 (0.08/day)
They are really for packaging reasons more than anything.

I understand. I thought more about it and your point makes sense.

If you're knowledgeable enough, I have a question. AMD is already using advanced packaging technology in RDNA3, right... so why do you think they did not go for an HBM solution?
 
Joined
Oct 12, 2005
Messages
712 (0.10/day)
I understand. I thought more about it and your point makes sense.

If you're knowledgeable enough, I have a question. AMD is already using advanced packaging technology in RDNA3, right... so instead of 6 MCDs with 96 MB cache, why not go for an HBM solution?
HBM uses an interposer. This is a large piece of silicon and is quite expensive to produce. Think of a large chip sitting under both the main GPU die and the HBM.

On RDNA3, AMD uses an organic substrate like a traditional GPU package (think of it as a kind of PCB), and they were able to shrink the traces on it enough to fit all the connections. This is way cheaper, and it is one of the enablers for chiplet GPUs, since GPUs require far more connections than CPUs.
 
Joined
Jan 3, 2021
Messages
3,606 (2.49/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
HBM uses an interposer. This is a large piece of silicon and is quite expensive to produce. Think of a large chip sitting under both the main GPU die and the HBM.

On RDNA3, AMD uses an organic substrate like a traditional GPU package (think of it as a kind of PCB), and they were able to shrink the traces on it enough to fit all the connections. This is way cheaper, and it is one of the enablers for chiplet GPUs, since GPUs require far more connections than CPUs.
RDNA3 is built using something more advanced than a usual substrate: the fan-out RDL. I commented on it here:
I don't know if it's good enough for routing the wires to an HBM stack, though. But AMD also uses some kind of buried silicon bridges (could be very similar to EMIB) for the HBM stacks on their Instinct GPUs.

At this point it's very hard to say which applications are better suited for HBM and which are better for GDDR. Both are evolving, but packaging technology is evolving even faster. HBM requires giant memory controllers for multiple 1024-bit wide buses, meaning a lot of silicon. Also, bridges and similar stuff apparently take up a considerable amount of space on the chips that they connect.
 
Joined
Oct 12, 2005
Messages
712 (0.10/day)
RDNA3 is built using something more advanced than a usual substrate: the fan-out RDL. I commented on it here:
I don't know if it's good enough for routing the wires to an HBM stack, though. But AMD also uses some kind of buried silicon bridges (could be very similar to EMIB) for the HBM stacks on their Instinct GPUs.

At this point it's very hard to say which applications are better suited for HBM and which are better for GDDR. Both are evolving, but packaging technology is evolving even faster. HBM requires giant memory controllers for multiple 1024-bit wide buses, meaning a lot of silicon. Also, bridges and similar stuff apparently take up a considerable amount of space on the chips that they connect.
You are correct, but I tried to keep the explanation simple. It is still organic rather than silicon.

I still think HBM requires a silicon substrate. We have to remember that those MCDs are connected by multiple Infinity Fabric links, and those work by serializing the data so that fewer traces are needed.

But one possibility in the future could be that instead of stacking cache, we stack HBM. This way, the HBM would sit on top of silicon, and you could still use Infinity Fabric and an organic substrate to connect to the GCD.

But first, let's see the first chiplet GPUs release and see how they perform.
 