
AMD Patents Chiplet-based GPU Design With Active Cache Bridge

Joined
Jan 3, 2021
Messages
3,447 (2.45/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
AMD may be experimenting with ways to separate the processing cores, built on the latest node they can get their hands on, from the cache. The cache could be built on a second-best node: GlobalFoundries' 12nm now, later something like TSMC 7nm. Static RAM doesn't scale well with node shrinks; at least its area doesn't, and I don't know about performance and power. So the cache is possibly a good candidate for being offloaded to a cheaper die. Latency would obviously go up, but maintaining cache coherence would be an easier task, the higher latency can be mitigated with a larger cache, and AMD needs to keep buying something from GloFo anyway.
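To put the "higher latency, but bigger" trade-off in numbers, here is a minimal sketch using the textbook average memory access time formula, AMAT = hit time + miss rate × miss penalty. All cycle counts and miss rates are made-up illustrative values, not AMD figures; the point is only that a slower but larger off-die cache can still win if its extra capacity cuts the miss rate enough.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time: hit_time + miss_rate * miss_penalty (all in cycles)."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# All numbers below are hypothetical, chosen only to illustrate the trade-off.
DRAM_PENALTY = 300.0  # assumed extra cycles for a trip to DRAM on a miss

# Smaller cache on the compute die: quicker hits, but more misses.
on_die = amat(hit_time=20.0, miss_rate=0.40, miss_penalty=DRAM_PENALTY)

# Larger cache on a separate, cheaper die: slower hits, fewer misses.
off_die = amat(hit_time=35.0, miss_rate=0.20, miss_penalty=DRAM_PENALTY)

print(f"on-die cache:  {on_die:.1f} cycles")
print(f"off-die cache: {off_die:.1f} cycles")
```

With these assumed inputs the bigger, slower cache averages 95 cycles per access against 140 for the small one; with a smaller hit-rate gain the result flips, so the real answer depends on the workload.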
 
Joined
Jul 13, 2016
Messages
3,258 (1.07/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
Joined
Nov 4, 2005
Messages
11,965 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
The biggest gains will be in clock speed: with multiple clock domains for multiple chiplets, each one can be engineered for IPC, clock speed, and/or latency as required.

Imagine 4 chiplets with 4 GHz boost speeds, a 2 GHz cache that is massively parallel with compression technology, and a couple of tiny chiplets for video encode/decode and for low-power applications.

Now add the stacked-die tech that has been developed to create a parallel pipeline for pure vector math for ray tracing, stacked on each of the 4 main chiplets and able to read and write to the caches on the primary die. That would be ray tracing whose only costs are extra heat and a small fraction of added latency.
 
Joined
May 3, 2018
Messages
2,881 (1.21/day)
Possibly a glimpse of RDNA 4's future; I doubt we'll see this in RDNA 3. Most likely it will go up against Hopper, which was delayed and replaced by Lovelace for the next generation.
 
Joined
Jul 13, 2016
Messages
3,258 (1.07/day)
Doesn't it say MCM adds as much as +1 GHz?

Correct. By putting the CPU cores on a separate die, you gain the ability to further bin which CPU dies end up in which CPU. This is how AMD is able to make its 16-core 5950X consume less power than its 12-core part. The 5950X is about 28% more power efficient than other Ryzen 5000 series CPUs through binning alone. AMD likely decided to go for efficiency instead of extra clocks for two reasons: 1) Intel doesn't have anything competitive with AMD's 12- and 16-core mainstream CPUs, and 2) power consumption goes up much faster above the sweet spot. Increasing the GHz would improve ST performance but at a cost. AMD likely calculated that, given Intel's current prospects, it would be better to focus on efficiency.
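The "sweet spot" argument follows from the usual first-order model of CMOS dynamic power, P ≈ C·V²·f: past a certain frequency the chip needs more voltage, so power climbs much faster than clock speed. Below is a minimal sketch; the voltage/frequency curve, the capacitance constant, and the assumption that performance scales linearly with clock are all invented for illustration, and leakage power is ignored.

```python
# Assumed voltage/frequency curve: flat at V_MIN up to the "sweet spot",
# then rising linearly. These constants are invented for illustration.
V_MIN = 0.80            # volts
KNEE_GHZ = 3.5          # assumed sweet-spot frequency
SLOPE_V_PER_GHZ = 0.15  # assumed extra volts needed per extra GHz past the knee


def voltage_for(freq_ghz: float) -> float:
    """Voltage the hypothetical chip needs to hold freq_ghz stable."""
    if freq_ghz <= KNEE_GHZ:
        return V_MIN
    return V_MIN + SLOPE_V_PER_GHZ * (freq_ghz - KNEE_GHZ)


def dynamic_power(freq_ghz: float, capacitance: float = 10.0) -> float:
    """First-order switching power P = C * V^2 * f, in arbitrary units (leakage ignored)."""
    v = voltage_for(freq_ghz)
    return capacitance * v * v * freq_ghz


def perf_per_watt(freq_ghz: float) -> float:
    """Assumes performance scales linearly with clock, so perf/W = f / P."""
    return freq_ghz / dynamic_power(freq_ghz)


for f in (3.0, 3.5, 4.0, 4.5, 5.0):
    print(f"{f:.1f} GHz: power {dynamic_power(f):6.2f}, perf/W {perf_per_watt(f):.4f}")
```

Under these assumptions, going from 3.5 to 5.0 GHz (about 43% more clock) costs roughly 2.3x the power, while perf/W stays flat up to the knee. Binning works on the same curve: a die that holds a given clock at lower voltage sits further down it.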
 
Joined
Jul 12, 2017
Messages
32 (0.01/day)
System Name ROU-Think-Fast
Processor AMD Ryzen 7 5800X
Motherboard B550 AORUS PRO V2 (rev. 1.0)
Cooling Custom (2 x Alphacool NexXxoS XT45, front/top, Alphacool Eisblock Aurora Acryl GPX-N RTX 3090/3080 )
Memory 4x8 GB Kingston Hyper X KHX3466C16D4/8GX (B-Die) @ 3600, C16-16-16-32
Video Card(s) RTX 3080 10GB
Storage ADATA SX8200 Pro 1 TB
Display(s) Acer Predator XB271HU
Case Fractal Design Meshify 2
Power Supply EVGA 750W Gold
This is mostly true, although less and less so as more and more techniques reuse generated data. This is also why SLI/CrossFire is dead: the latency to move this data was just way too high. Temporal AA, screen-space reflections, etc.

Can't you have one chiplet dealing with frame/scene-level calculations after you've powered through the more easily parallelizable tasks? As in one bigger chiplet (perhaps on the hub chip, to reduce latency to the cache) plus N smaller ones?
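A rough way to see why reusing the previous frame's data hurts multi-chip setups: that data has to cross whatever link joins the chips, every frame. The sketch below only counts bandwidth for one full-screen history buffer; the buffer format and both link speeds are assumptions for illustration, not measured values for SLI/CrossFire or for any AMD interconnect.

```python
def buffer_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of a full-screen render target in bytes."""
    return width * height * bytes_per_pixel


def transfer_ms(num_bytes: int, link_gb_per_s: float) -> float:
    """Bandwidth-only time to move num_bytes over a link (decimal GB/s), in ms.
    Ignores link latency and protocol overhead, so the real cost is higher."""
    return num_bytes / (link_gb_per_s * 1e9) * 1e3


FRAME_BUDGET_MS = 1000.0 / 60.0  # about 16.7 ms per frame at 60 fps

# Assumed example: a 4K history buffer in RGBA16F (8 bytes/pixel) for temporal AA.
history = buffer_bytes(3840, 2160, 8)

# Hypothetical link speeds: a PCIe-class board-to-board link vs a much wider on-package link.
for link_gb_s in (32.0, 500.0):
    ms = transfer_ms(history, link_gb_s)
    share = ms / FRAME_BUDGET_MS
    print(f"{link_gb_s:6.1f} GB/s: {ms:.3f} ms ({share:.1%} of a 60 fps frame)")
```

With these numbers a single buffer takes about 12% of a 60 fps frame over the slower link and under 1% over the wider one, which is the case for a fat on-package link over a board-level bridge.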
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Increasing the GHz would improve ST performance but at a cost.
You're approaching this from a CPU standpoint. On a GPU, ST performance isn't the only factor; internal bandwidth is a major one. A GPU has a lot of bandwidth, but bandwidth per CU takes a lot of work to leverage fully, since the memory unit is external to the chip. Running it faster solves that problem.
Bets: 3.5 GHz GPUs on the horizon, or not?
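One way to read the bandwidth-per-CU point: external DRAM bandwidth is fixed by the memory interface, so the bytes each CU gets per clock shrink as the core clock rises, while on-die cache bandwidth (bytes per clock × clock) grows with frequency. The sketch uses a hypothetical 80-CU GPU with 512 GB/s of DRAM bandwidth and an assumed 64 bytes/clock of cache bandwidth per CU; none of these are figures for a specific product.

```python
def dram_bytes_per_cu_per_clock(dram_gb_s: float, num_cus: int, core_clock_ghz: float) -> float:
    """Share of external DRAM bandwidth each CU gets per core clock cycle."""
    return (dram_gb_s * 1e9) / (num_cus * core_clock_ghz * 1e9)


def on_die_gb_s(bytes_per_clock_per_cu: float, num_cus: int, clock_ghz: float) -> float:
    """Aggregate on-die cache bandwidth, which scales with the clock it runs at."""
    return bytes_per_clock_per_cu * num_cus * clock_ghz


# Hypothetical GPU: 80 CUs, 512 GB/s of DRAM, 64 bytes/clock of cache bandwidth per CU.
NUM_CUS = 80
DRAM_GB_S = 512.0
CACHE_BYTES_PER_CLK = 64.0

for clock in (2.0, 2.5, 3.5):
    per_cu = dram_bytes_per_cu_per_clock(DRAM_GB_S, NUM_CUS, clock)
    internal = on_die_gb_s(CACHE_BYTES_PER_CLK, NUM_CUS, clock)
    print(f"{clock:.1f} GHz: DRAM {per_cu:.2f} B/clk per CU, on-die {internal:,.0f} GB/s")
```

The DRAM share per CU per clock falls from 3.2 B/clk at 2 GHz to about 1.8 at 3.5 GHz, while on-die bandwidth grows in step with the clock, so a faster chip leans harder on its caches.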
 
Joined
Jul 13, 2016
Messages
3,258 (1.07/day)
I'd say it's equally possible that we see MCM GPU architectures that simply target the frequency sweet spot and spend any extra power budget on more cores, cache, etc. It really depends, though; for all we know AMD or Nvidia could design their GPU chiplets to clock very high, and the sweet spot would follow suit. I'm not knowledgeable enough on the topic to say to what extent Nvidia/AMD and TSMC can influence the ideal GPU clock speed through design or node.
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Me neither, although some would consider me an old-timer.
GPUs do go hand in hand with high frequency because the power cost is already paid for. Remember the Hawaii series? AMD didn't integrate tiled, 'buffered' rasterization until Vega, so its memory interface never slowed down, since it was always running in immediate mode, whereas Nvidia can keep tabs at various memory clocks.
It could improve utilization if the shaders issue requests at a higher rate; GPUs are throughput-oriented, after all...
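Tiled ("binned") rasterization works by first sorting geometry into screen-space tiles, then rendering one tile at a time with its color and depth kept in on-chip storage, so the external memory interface is touched far less than in a pure immediate-mode renderer. The sketch below shows only the binning step; the 32-pixel tile size and the bounding-box overlap test are simplifying assumptions, and real hardware binners are considerably more involved.

```python
import math
from collections import defaultdict


def bin_triangles(triangles, width, height, tile=32):
    """Return {(tile_x, tile_y): [triangle indices]} for a screen of width x height pixels.

    A triangle is a sequence of three (x, y) screen-space points. Binning is
    conservative: a triangle goes into every tile its bounding box touches.
    The 32-pixel default tile size is an assumption, not a hardware figure.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = defaultdict(list)
    for idx, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Bounding box in tile coordinates, clamped to the screen.
        x0 = max(0, math.floor(min(xs)) // tile)
        x1 = min(tiles_x - 1, math.floor(max(xs)) // tile)
        y0 = max(0, math.floor(min(ys)) // tile)
        y1 = min(tiles_y - 1, math.floor(max(ys)) // tile)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(idx)
    return dict(bins)


# Two triangles on a 64x64 screen (2x2 tiles of 32 px).
tris = [
    [(0, 0), (10, 0), (0, 10)],      # fits inside tile (0, 0)
    [(20, 20), (40, 20), (20, 40)],  # bounding box spans all four tiles
]
bins = bin_triangles(tris, 64, 64)
for key in sorted(bins):
    print(key, bins[key])
```

Once binned, each tile's triangle list can be rasterized against a tile-sized on-chip buffer and written out once, instead of each triangle writing to memory as it is drawn.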
 
Joined
Jan 8, 2017
Messages
9,403 (3.29/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Imagine 4 chiplets with 4 GHz boost speeds

It's going to be a long time before we see that, if ever. Every kind of chip seems to start scaling horribly past the 3 GHz mark, and a GPU in particular would be horrendously inefficient at those speeds.
 
Joined
Apr 16, 2019
Messages
632 (0.31/day)
So for those of you waiting for AMD to do to nVidia what they did to Intel....

Here it is.

Sounds like RDNA 3 will be an interesting generation for sure!
What they did to Intel? You mean, as soon as they got competitive, they also became both more expensive and hard to get in the first place - what a fantastic prospect for the already beleaguered graphics cards market indeed!
 
Joined
Mar 30, 2021
Messages
25 (0.02/day)
System Name Dell Alienware Aurora R10
Processor Ryzen 5600x
Motherboard Dell 570 or B550
Cooling Alienware AIO sandwiched between two Corsair ML120 Pro's
Memory G.SKILL Ripjaws V Series 32GB cl16
Video Card(s) Radeon RX 6800 XT
Storage Western Digital WD BLACK SN750 NVMe M.2 2280 2TB
Display(s) GIGABYTE G34WQC 34" 144Hz (plus 2 Dell 19" 1280x1024 to flank it)
Case Alienware Aurora R10
Audio Device(s) onboard
Power Supply Dell 1KW
Mouse Logitech Trackman Marble
Keyboard blue glowy thinhy 104 key KB
You DO realize that this is market forces at work right?

Demand has outstripped supply by so much that even though TSMC is running FLAT OUT, they still cannot keep up!
They are now spending 100 BILLION DOLLARS over the next three years to build more plants so they can deal with the demand.

Then you have people using bots to buy them up within milliseconds, so you cannot get them through normal channels, making a bad situation even worse.
But hey, they do it because they can make a 25 to 50% profit selling on eBay and through the gray market.

AMD made the decision to focus on supplying computer manufacturers rather than retailers like Newegg and Amazon.
I just got a 6800 XT and a 5600X from Dell.
Placed my order, waited a month, and here it is! AND I got both for what appears to be MSRP or close to it.

Be sure you are looking at the BIG PICTURE before lambasting people and companies for things that are out of their control.
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Every kind of chip seems to start scaling horribly past the 3 GHz mark, a GPU in particular will be horrendous efficiency wise at those kinds of speeds.
This could bring a split multiplier that runs the internal caches faster than the GPU core. Don't dismiss it: the scaling isn't linear because memory is external and doesn't directly help the GPU pipeline flow, whereas GPU speed does. Nothing other than cache speed changes that (maybe texture caching, too).
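A "split multiplier" would put the caches in their own clock domain, running at some ratio of the core clock. Here is a minimal throughput-only sketch, assuming the cores request a fixed number of bytes per core clock and the cache returns a fixed width per cache clock (both numbers hypothetical); latency, queueing, and the extra power a faster cache burns are not modeled.

```python
def delivered_bytes_per_core_clock(demand: float, cache_width: float, multiplier: float) -> float:
    """Bytes the cache can hand the cores per core clock.

    demand:      bytes the shader cores request per core clock (assumed)
    cache_width: bytes the cache returns per cache clock (assumed)
    multiplier:  cache clock / core clock ratio (the "split multiplier")
    """
    supply = cache_width * multiplier
    return min(demand, supply)


def core_utilization(demand: float, cache_width: float, multiplier: float) -> float:
    """Fraction of core clocks not stalled on the cache, under this simple throughput model."""
    return delivered_bytes_per_core_clock(demand, cache_width, multiplier) / demand


# Hypothetical: cores want 128 B per core clock, the cache returns 64 B per cache clock.
for mult in (1.0, 1.5, 2.0, 2.5):
    util = core_utilization(128.0, 64.0, mult)
    print(f"cache at {mult:.1f}x core clock -> utilization {util:.0%}")
```

Past the ratio where supply meets demand (2x here), a faster cache buys nothing in this model, while in practice it would still cost power.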
 
Joined
Jan 8, 2017
Messages
9,403 (3.29/day)

Caches are power hogs with very high energy density per area, and for that reason they usually run slower than the processor itself. The only portions of memory that run as fast as the processor are the registers; everything else, including the L1 caches, typically runs slower.
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Well, guess what consumes power at an even higher rate than the caches: memory devices. The futility of saving power by cutting the effective rate is self-explanatory. There is an approach that uses buffering to reduce accesses to memory, and texture caching to supplant memory with SRAM. It ties in with the actual data flow across the die, whereas the memory devices don't solve any bottlenecks; they are the last level.
I'm not well versed enough, but there is no free lunch. SRAM offers much more than its substitutes.
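The claim that memory devices cost more energy per access than on-die SRAM can be turned into a simple expected-energy model. The per-access energies below are placeholders chosen only so that DRAM is much more expensive than SRAM (an assumption, not measured data), and the cache's own static/leakage power is left out.

```python
def energy_per_access(hit_rate: float, e_sram: float, e_dram: float) -> float:
    """Expected dynamic energy per access with a cache in front of DRAM.

    Every access pays the SRAM lookup; misses additionally pay the DRAM access.
    Static/leakage power of the cache itself is not included.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return e_sram + (1.0 - hit_rate) * e_dram


# Placeholder costs in arbitrary energy units, assuming DRAM is 20x SRAM per access.
E_SRAM, E_DRAM = 1.0, 20.0

print(f"no cache: {E_DRAM:.1f}")
for hr in (0.5, 0.8):
    print(f"hit {hr:.0%}: {energy_per_access(hr, E_SRAM, E_DRAM):.1f}")
```

At a 0% hit rate the model gives 21 units, more than going straight to DRAM: a cache only pays off when it actually hits, and its area, heat, and leakage still have to be budgeted for.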
 
Joined
Jan 8, 2017
Messages
9,403 (3.29/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
Yes, access to global memory is very inefficient power-wise, and cache hits improve that. But the problem is that caches live on the die, so they need to be cooled and they eat into the chip's power budget.

Remember how the Infinity Cache is placed around the CUs rather than between them, as you'd expect? I think it was a deliberate choice to place this huge chunk of cache at the edges of the chip to reduce hot spots.
 
Deleted member 205776

Guest
So for those of you waiting for AMD to do to nVidia what they did to Intel....

Here it is.

Sounds like RDNA 3 will be an interesting generation for sure!
Didn't they say they'll take this chiplet approach on CDNA first and not RDNA?

they also became both more expensive and hard to get in the first place
That wasn't the case until Zen 3 and this chipocalypse... Zen 2 swept the floor with Intel, and it was a real market disruptor.

All AMD did was force Intel to get off their ass and make reasonable products at more reasonable prices, and even force down the prices of their 10th-gen parts, which is always good for everyone. If it weren't for them, I wouldn't have a 12-core in my system right now and would probably have to make do with 6 cores from Intel, on my old 8700.

Now, if they could make Ngreedia do the same, that'd be great... but I'm not having high hopes here. Unlike Intel, NVIDIA has never been sleeping. They are a worthy competitor to AMD. We'll see how this approach works on CDNA first - doubt the next RDNA gen will have this. Maybe the one after.
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Thanks for citing fancy references. I agree with most points, but I think we are being repetitive.
 