
AMD Patents Chiplet-based GPU Design With Active Cache Bridge

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.25/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kingston A2000 1TB, Seagate IronWolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
AMD on April 1st published a new patent application that seems to show the direction its chiplet GPU design is heading in. And before you say it: yes, it's a patent application; there's no possibility of an April Fools' joke in this sort of move. The new patent builds on AMD's previous one, which featured only a passive bridge connecting the different GPU chiplets and their processing resources. If you want a slightly deeper dive on what chiplets are and why they are important for the future of graphics (and computing in general), see this article here on TPU.

The new design implements the active bridge connecting the chiplets as a last-level cache - think of it as L3, a unifying highway of data that is readily exposed to all the chiplets (in this patent, a three-chiplet design). It's essentially AMD's RDNA 2 Infinity Cache, though here it's not used only as a cache (and to good effect, if the Infinity Cache design on RDNA 2 and its performance uplift are anything to go by); it also serves as an active interconnect between the GPU chiplets that allows for the exchange and synchronization of information whenever and however required. This also allows the registers and cache to be exposed to developers as a unified block, abstracting them from having to program for a system with a three-way cache design. There are of course also yield benefits to be had here, as with AMD's Zen chiplet designs, plus the ability to scale up performance without resorting to monolithic designs that are heavy on power requirements. The integrated, active cache bridge would also certainly help in reducing latency and maintaining processing coherency across chiplets.
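The bridge-resident shared cache described above can be illustrated with a toy model (purely illustrative; the patent describes hardware, not a software API, and all class and variable names here are invented for the sketch):

```python
# Toy model of the patented idea: several GPU chiplets share one
# last-level cache that lives on the active bridge, so a line cached
# by any chiplet is visible to all of them without another VRAM trip.

class BridgeCache:
    """Shared last-level cache sitting on the active bridge die."""
    def __init__(self):
        self.lines = {}               # address -> cached data
        self.hits = self.misses = 0

    def read(self, addr, vram):
        if addr in self.lines:
            self.hits += 1            # served from the bridge
        else:
            self.misses += 1
            self.lines[addr] = vram[addr]   # fill from memory once
        return self.lines[addr]

class Chiplet:
    def __init__(self, name, bridge):
        self.name, self.bridge = name, bridge

    def load(self, addr, vram):
        return self.bridge.read(addr, vram)

vram = {0x100: "texel"}
bridge = BridgeCache()
chiplets = [Chiplet(f"die{i}", bridge) for i in range(3)]

# Three chiplets touch the same address: one miss, then two hits,
# because the cache is unified on the bridge rather than per-chiplet.
for c in chiplets:
    c.load(0x100, vram)

print(bridge.misses, bridge.hits)   # 1 2
```

With per-chiplet caches the same access pattern would cost three VRAM fetches; the unified bridge cache turns two of them into hits, which is the whole point of exposing it to every chiplet.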



View at TechPowerUp Main Site
 

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.96/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W))
Storage 2TB WD SN850 NVME + 1TB Samsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Phillips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Phillips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
I'll pretend I understand this and just say "wooo progress!"
 
Joined
Oct 22, 2014
Messages
14,061 (3.83/day)
Location
Sunshine Coast
System Name Lenovo ThinkCentre
Processor AMD 5650GE
Motherboard Lenovo
Memory 32 GB DDR4
Display(s) AOC 24" Freesync 1m.s. 75Hz
Mouse Lenovo
Keyboard Lenovo
Software W11 Pro 64 bit
Joined
Jan 8, 2017
Messages
9,401 (3.29/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
The cache hierarchy is already something that programmers do not have to deal with directly, that mechanism is hidden from you.

So more MH/s?
Not really, hashing algorithms are memory bound, so unless you increase the memory bandwidth it's not gonna matter how many chiplets there are.
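The memory-bound point can be back-of-enveloped: for an Ethash-style algorithm, each hash needs roughly 64 random 128-byte DAG reads (~8 KiB of memory traffic), so bandwidth alone caps the hash rate regardless of how many compute chiplets there are. A sketch using approximate public bandwidth figures (illustrative, not measured mining rates):

```python
# Why hashrate tracks memory bandwidth, not core count:
# Ethash does ~64 random 128 B DAG reads per hash (~8 KiB/hash),
# so bandwidth / bytes-per-hash gives a hard ceiling on MH/s.

BYTES_PER_HASH = 64 * 128          # 8192 B of DAG traffic per hash

def hashrate_ceiling_mhs(bandwidth_gbs):
    """Upper bound on hashes/s given memory bandwidth in GB/s."""
    return bandwidth_gbs * 1e9 / BYTES_PER_HASH / 1e6

# Approximate spec-sheet bandwidths: RX 6800 XT (GDDR6, 512 GB/s)
# vs RTX 3080 (GDDR6X, 760 GB/s).
for name, bw in [("RX 6800 XT", 512), ("RTX 3080", 760)]:
    print(f"{name}: ~{hashrate_ceiling_mhs(bw):.0f} MH/s ceiling")
```

Adding chiplets raises compute throughput but leaves `BYTES_PER_HASH` and the memory bus untouched, which is why the ceiling doesn't move.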
 
Joined
Sep 28, 2012
Messages
979 (0.22/day)
System Name Poor Man's PC
Processor AMD Ryzen 7 7800X3D
Motherboard MSI B650M Mortar WiFi
Cooling Thermalright Phantom Spirit 120 with Arctic P12 Max fan
Memory 32GB GSkill Flare X5 DDR5 6000Mhz
Video Card(s) XFX Merc 310 Radeon RX 7900 XT
Storage XPG Gammix S70 Blade 2TB + 8 TB WD Ultrastar DC HC320
Display(s) Xiaomi G Pro 27i MiniLED
Case Asus A21 Case
Audio Device(s) MPow Air Wireless + Mi Soundbar
Power Supply Enermax Revolution DF 650W Gold
Mouse Logitech MX Anywhere 3
Keyboard Logitech Pro X + Kailh box heavy pale blue switch + Durock stabilizers
VR HMD Meta Quest 2
Benchmark Scores Who need bench when everything already fast?
At first glance I find it quite "challenging" to feed all the cores with data; there will be scenarios where GPU cores could "starve". But there is CPU access in the schematic, maybe as a command prefetcher or just DMA. AMD already has Resizable BAR, so the CPU could play a big part here.

-= edited=-
Reminds me of hUMA; it all makes sense now why they are waiting to bring this to the new AM5 platform with DDR5 RAM.
 
Last edited:
Joined
Feb 3, 2017
Messages
3,732 (1.32/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
Not really, hashing algorithms are memory bound, so unless you increase the memory bandwidth it's not gonna matter how many chiplets there are.
Sure it matters. As long as AMD has a 4+GB caching chiplet it'll be awesome for mining :D
 
Joined
Feb 20, 2019
Messages
8,205 (3.93/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Odyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
Oh, okay. I think I get it.

Infinity Cache is Infinity Fabric for GPUs.

So rather than Infinity Fabric being a unified transport giving all CPU chiplets access to the memory controllers, each GPU chiplet will have a baby/pseudo memory controller that seeds data into a massive shared L3 cache for all GPU chiplets to feed off.

Neat, probably. The move to chiplets will hurt overall IPC and efficiency slightly but it will move away from the single-biggest constraint GPUs have right now - manufacturing difficulties and yields on massive monolithic dies. You only have to look at the fact a 64C/128T Threadripper is available on a consumer/mainstream platform for the masses at $4000, whilst Intel is struggling so hard to get more than 24C in a processor that they'll charge $10-14K for the privilege and sell it only to server integrators as it's too much of a special snowflake to work in any non-proprietary mainstream platform using a regular, unified driver model.

AMD is shitting out 80mm² scalable chiplets at fantastic yields because of the small dies with 8C/16T and craploads of cache, whilst Intel's smallest 8C/16T part is 276mm² with zero scalability and half the cache.

Using the same silicon wafer yield calculator for both, AMD's gets ~696 sellable dies per wafer compared to Intel's ~161 sellable dies per wafer. Four times easier to make and the smaller die size also means that 92% of AMD's product is a flawless 8-core part, whilst around 25% of Intel's output needs to be harvested to make 6-core or worse.

So, if you take that example alone, GPU chiplets can't come soon enough.
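The yield argument above can be rough-checked with the standard dies-per-wafer approximation and a Poisson defect model. The defect density below is an assumed 0.1 defects/cm² (0.001/mm²), not foundry data, and `dies_per_wafer`/`poisson_yield` are illustrative helpers rather than any published calculator:

```python
# Sketch of the chiplet-vs-monolith yield math on a 300 mm wafer:
# small dies fit more candidates per wafer AND a larger fraction of
# them come out defect-free.
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """Classic approximation: gross wafer area minus edge losses."""
    r = wafer_diameter_mm / 2
    return int(math.pi * r**2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2, defects_per_mm2=0.001):
    """Probability a die has zero random defects (Poisson model)."""
    return math.exp(-defects_per_mm2 * die_area_mm2)

for name, area in [("80 mm^2 chiplet", 80), ("276 mm^2 monolith", 276)]:
    n, y = dies_per_wafer(area), poisson_yield(area)
    print(f"{name}: {n} candidates, {y:.0%} flawless, ~{n * y:.0f} good dies")
```

With these assumptions the 80 mm² die lands near the "92% flawless" figure quoted above, and the per-wafer ratio between the two die sizes is of the same order as the ~696 vs ~161 numbers (which presumably came from a calculator with its own defect density).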
 
Last edited:
Joined
Jul 16, 2014
Messages
8,196 (2.18/day)
Location
SE Michigan
System Name Dumbass
Processor AMD Ryzen 7800X3D
Motherboard ASUS TUF gaming B650
Cooling Arctic Liquid Freezer 2 - 420mm
Memory G.Skill Sniper 32gb DDR5 6000
Video Card(s) GreenTeam 4070 ti super 16gb
Storage Samsung EVO 500gb & 1Tb, 2tb HDD, 500gb WD Black
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) - SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse Steelseries Esports Wireless
Keyboard Corsair K100
Software windows 10 H
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
Raevenlord said:
Before you say it, it's a patent application; there's no possibility for an April Fool's joke on this sort of move.

So this is a delayed April Fools' article? j/k :roll: :p

I expect the patent trolls are already digging for that one line of code or whatever so they can sue.

Infinitycache is Infinity Fabric for GPUs
It's not like they can use the same name for something that serves, essentially, the same function.
 
Joined
Feb 20, 2019
Messages
8,205 (3.93/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Odyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
not like they can use the same name, that serves, essentially, the same function.
That's what I was implying though, they're not the same function.
  • Infinity Fabric connects cores to memory controllers, and cores manage their cache.
  • Infinity cache connects cache to memory controllers, and cores manage their memory controllers.
I mean, sure - they both connect things which is the same function - but so do nails, tape, and string - yet those things are allowed to have different names? :p
 
Joined
Mar 30, 2021
Messages
25 (0.02/day)
System Name Dell Alienware Aurora R10
Processor Ryzen 5600x
Motherboard Dell 570 or B550
Cooling Alienware AIO sandwiched between two Corsair ML120 Pro's
Memory G.SKILL Ripjaws V Series 32GB cl16
Video Card(s) Radeon RX 6800 XT
Storage Western Digital WD BLACK SN750 NVMe M.2 2280 2TB
Display(s) GIGABYTE G34WQC 34" 144Hz (plus 2 Dell 19" 1280x1024 to flank it)
Case Alienware Auraor r10
Audio Device(s) onboard
Power Supply Dell 1KW
Mouse Logitech Trackman Marble
Keyboard blue glowy thinhy 104 key KB
So for those of you waiting for AMD to do to nVidia what they did to Intel....

Here it is.

Sounds like RDNA 3 will be an interesting generation for sure!
 
Joined
Dec 23, 2012
Messages
1,715 (0.40/day)
Location
Somewhere Over There!
System Name Gen2
Processor Ryzen R9 5950X
Motherboard Asus ROG Crosshair Viii Hero Wifi
Cooling Lian Li 360 Galahad
Memory G.Skill Trident Z RGB 64gb @ 3600 Mhz CL14-13-13-24 1T @ 1.45V
Video Card(s) Sapphire RX 6900 XT Nitro+
Storage Seagate 520 1TB + Samsung 970 Evo Plus 1TB + lots of HDD's
Display(s) Samsung Odyssey G7
Case Lian Li PC-O11D XL White
Audio Device(s) Onboard
Power Supply Super Flower Leadex SE Platinum 1000W
Mouse Xenics Titan GX Air Wireless
Keyboard Kemove Snowfox 61
Software Main: Gentoo+Arch + Windows 11
Benchmark Scores Have tried but can't beat the leaders :)
So more MH/s?
I don't think so. Look at the 6000 series vs. RTX 3000: the RTX 3000 cards have higher memory bandwidth, which is why they get more MH/s. Miners care about memory speed more than core speed.
 
Joined
Apr 5, 2021
Messages
10 (0.01/day)
Location
Brazil - São Paulo
System Name Windows 10 PRO 21H1 modded by me
Processor AMD A10 7800
Motherboard Gigabyte GA-F2A88XM-D3HP
Cooling Air colled for a while
Memory 2x HyperX 8GB DDR 3
Video Card(s) Radeon RX580 8GB Power Color
Storage CT120BX500SSD1+ ST2000DM008-2FR102
Display(s) 1 Sansung 18,5" LCD + 1 LG 22" LCD
Case Deep Cool Tesseract modified by me
Audio Device(s) Power Amplifier made and developed by me
Power Supply Corsair CX 750M
Mouse Philips wireless
Keyboard Philips wireless
Software severals
Benchmark Scores @SysSoft Sandra 8.9 KPT
Oh, okay. I think I get it.

Infinitycache is Infinity Fabric for GPUs.

[...]

So, if you take that example alone, GPU chiplets can't come soon enough.
hello yes i totally agree with your reasoning
 
Joined
Oct 12, 2005
Messages
703 (0.10/day)
The main issue with multi-core/multi-thread/multi-chip designs is how you get modified data spread across the other chips; this is where the latency comes from. The L3 cache in a CPU is there for that specific role.

Let's say you modify some data. You need the updated data to be available to other execution units. The easy way is to save it to RAM and then read it back, but this adds huge latency.

They use the L3 cache for that instead, which saves a lot of time. But when you have multiple L3 caches, you need a mechanism that detects whether the data is in another L3 cache and then collects it. (Very simplified explanation.)

Having it in the bridge is probably the best solution, as the bridge will be aware of all the other chiplets. Connecting it to each chiplet will still add latency and offer reduced bandwidth, but chip design is all about compromise and making the choice that gives the best performance overall.

We will see
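The coherence problem described above can be sketched as a tiny directory on the bridge that remembers which chiplet last wrote each line, so a reader gets the fresh copy instead of stale memory. This is a deliberately simplified stand-in for real protocols like MESI; all names are invented for the sketch:

```python
# Minimal directory-style coherence sketch: the bridge tracks the
# owner of each dirty line, so a read from another chiplet is
# forwarded the fresh value instead of hitting stale VRAM.

class BridgeDirectory:
    def __init__(self, vram):
        self.vram = vram
        self.owner = {}            # addr -> chiplet holding the dirty line
        self.dirty = {}            # addr -> latest written value

    def write(self, chiplet, addr, value):
        self.owner[addr] = chiplet # record who has the fresh copy
        self.dirty[addr] = value

    def read(self, chiplet, addr):
        if addr in self.owner:     # another chiplet modified it:
            return self.dirty[addr]   # forward, no RAM round trip
        return self.vram[addr]     # clean line: fall back to memory

d = BridgeDirectory(vram={0x40: 0})
d.write("die0", 0x40, 123)         # die0 updates the line
print(d.read("die2", 0x40))        # die2 sees 123, not the stale 0
```

This is exactly the "save to RAM and read it back" detour being skipped: the bridge answers from its own state, which is why putting the coherence point on the bridge rather than in each chiplet's cache keeps the lookup to one hop.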
 
Joined
Apr 30, 2011
Messages
2,700 (0.55/day)
Location
Greece
Processor AMD Ryzen 5 5600@80W
Motherboard MSI B550 Tomahawk
Cooling ZALMAN CNPS9X OPTIMA
Memory 2*8GB PATRIOT PVS416G400C9K@3733MT_C16
Video Card(s) Sapphire Radeon RX 6750 XT Pulse 12GB
Storage Sandisk SSD 128GB, Kingston A2000 NVMe 1TB, Samsung F1 1TB, WD Black 10TB
Display(s) AOC 27G2U/BK IPS 144Hz
Case SHARKOON M25-W 7.1 BLACK
Audio Device(s) Realtek 7.1 onboard
Power Supply Seasonic Core GC 500W
Mouse Sharkoon SHARK Force Black
Keyboard Trust GXT280
Software Win 7 Ultimate 64bit/Win 10 pro 64bit/Manjaro Linux
So more MH/s?
AMD's new cache for RDNA 2 reduced mining performance, and methinks this one isn't going to help that type of workload either...
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
I think AMD is going to leverage Infinity Cache to compete with Nvidia, because they have been behind in the cache bandwidth race since Maxwell.
AMD has been steadily expanding its on-chip resources, but never found the medium to express what they can do unequivocally.
 
Joined
Dec 29, 2010
Messages
3,797 (0.75/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1000 P2
Mouse Logitech G600
Keyboard Corsair K95
I think AMD is going to leverage Infinity Cache to compete with Nvidia, because they have been behind in the cache bandwidth race since Maxwell.
AMD has been steadily expanding its on-chip resources, but never found the medium to express what they can do unequivocally.
Huh? Did you even read the OP? This is about GPU chiplets.
 
Joined
Jan 8, 2017
Messages
9,401 (3.29/day)
System Name Good enough
Processor AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard ASRock B650 Pro RS
Cooling 2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory 32GB - FURY Beast RGB 5600 Mhz
Video Card(s) Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage 1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s) LG UltraGear 32GN650-B + 4K Samsung TV
Case Phanteks NV7
Power Supply GPS-750C
The main issue with multi-core/multi-thread/multi-chip designs is how you get modified data spread across the other chips; this is where the latency comes from. The L3 cache in a CPU is there for that specific role.

Let's say you modify some data. You need the updated data to be available to other execution units. The easy way is to save it to RAM and then read it back, but this adds huge latency.
CPU cores often need to share data; GPU cores do not, since what they execute is usually data-independent.
 
Joined
Dec 29, 2010
Messages
3,797 (0.75/day)
Processor AMD 5900x
Motherboard Asus x570 Strix-E
Cooling Hardware Labs
Memory G.Skill 4000c17 2x16gb
Video Card(s) RTX 3090
Storage Sabrent
Display(s) Samsung G9
Case Phanteks 719
Audio Device(s) Fiio K5 Pro
Power Supply EVGA 1000 P2
Mouse Logitech G600
Keyboard Corsair K95
I'll pretend i understand this and just say "wooo progress!"
The biggest issue with GPU chiplets, as with SLI, is the developers. Thus AMD has to architect a way to do it seamlessly, without relying on devs to make it work. And here we are, one step closer.
 
Joined
Apr 5, 2021
Messages
10 (0.01/day)
Location
Brazil - São Paulo
System Name Windows 10 PRO 21H1 modded by me
Processor AMD A10 7800
Motherboard Gigabyte GA-F2A88XM-D3HP
Cooling Air colled for a while
Memory 2x HyperX 8GB DDR 3
Video Card(s) Radeon RX580 8GB Power Color
Storage CT120BX500SSD1+ ST2000DM008-2FR102
Display(s) 1 Sansung 18,5" LCD + 1 LG 22" LCD
Case Deep Cool Tesseract modified by me
Audio Device(s) Power Amplifier made and developed by me
Power Supply Corsair CX 750M
Mouse Philips wireless
Keyboard Philips wireless
Software severals
Benchmark Scores @SysSoft Sandra 8.9 KPT
The main issue with multi-core/multi-thread/multi-chip designs is how you get modified data spread across the other chips; this is where the latency comes from. The L3 cache in a CPU is there for that specific role.

Let's say you modify some data. You need the updated data to be available to other execution units. The easy way is to save it to RAM and then read it back, but this adds huge latency.

They use the L3 cache for that instead, which saves a lot of time. But when you have multiple L3 caches, you need a mechanism that detects whether the data is in another L3 cache and then collects it. (Very simplified explanation.)

Having it in the bridge is probably the best solution, as the bridge will be aware of all the other chiplets. Connecting it to each chiplet will still add latency and offer reduced bandwidth, but chip design is all about compromise and making the choice that gives the best performance overall.

We will see
Yes, I also agree with you, but in my view this goes back to the first chips: remember that 512 KB or even 1 MB of memory was also very expensive, and I think this won't change soon, unfortunately. On the other hand, that's the price of constant evolution that we have to pay...
 
Joined
Jan 21, 2021
Messages
17 (0.01/day)
Location
Vulcan
On one of the diagrams there's an arrow going from the CPU into the SDF. It appears the CPU will have direct access to the Scalable Data Fabric (which already makes up part of the Infinity Fabric we see on Ryzen, and on Vega-onwards GPUs), which would grant the CPU the ability to read and write data to, from and between GPU chiplets, thus connecting everything together. This MAY allow for more efficient and coherent data transfer between the CPU and GPU chiplets, and between the GPU chiplets themselves. The new (maybe?) interconnect within each GPU chiplet is the GDF (let's call it Graphics Data Fabric), which I don't know anything about yet, but which appears to give all the WorkGroup Processors within the chiplet coherency with the L2 cache. An interesting glimpse into the future.
 
Joined
Oct 12, 2005
Messages
703 (0.10/day)
CPU cores often need to share data; GPU cores do not, since what they execute is usually data-independent.
This is mostly true, although less and less so, as more and more techniques reuse generated data: temporal AA, screen-space reflections, etc. This is also why SLI/CrossFire is dead; the latency to move that data was just way too big.
 
Joined
Jul 13, 2016
Messages
3,258 (1.07/day)
Processor Ryzen 7800X3D
Motherboard ASRock X670E Taichi
Cooling Noctua NH-D15 Chromax
Memory 32GB DDR5 6000 CL30
Video Card(s) MSI RTX 4090 Trio
Storage Too much
Display(s) Acer Predator XB3 27" 240 Hz
Case Thermaltake Core X9
Audio Device(s) Topping DX5, DCA Aeon II
Power Supply Seasonic Prime Titanium 850w
Mouse G305
Keyboard Wooting HE60
VR HMD Valve Index
Software Win 10
Oh, okay. I think I get it.

[...]

Neat, probably. The move to chiplets will hurt overall IPC and efficiency slightly but it will move away from the single-biggest constraint GPUs have right now - manufacturing difficulties and yields on massive monolithic dies.

[...]

So, if you take that example alone, GPU chiplets can't come soon enough.

Yes, bouncing data around the dies will increase latency, but that's easily mitigated by keeping data processing for each job within the die it's being worked on.
 
Joined
Jan 3, 2021
Messages
3,447 (2.46/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
I mean, sure - they both connect things which is the same function - but so do nails, tape, and string - yet those things are allowed to have different names? :p
Kudos for the inverse pun - mentioning nails, tape and string but mysteriously leaving out glue.
 
Joined
Mar 10, 2010
Messages
11,878 (2.22/day)
Location
Manchester uk
System Name RyzenGtEvo/ Asus strix scar II
Processor Amd R5 5900X/ Intel 8750H
Motherboard Crosshair hero8 impact/Asus
Cooling 360EK extreme rad+ 360$EK slim all push, cpu ek suprim Gpu full cover all EK
Memory Corsair Vengeance Rgb pro 3600cas14 16Gb in four sticks./16Gb/16GB
Video Card(s) Powercolour RX7900XT Reference/Rtx 2060
Storage Silicon power 2TB nvme/8Tb external/1Tb samsung Evo nvme 2Tb sata ssd/1Tb nvme
Display(s) Samsung UAE28"850R 4k freesync.dell shiter
Case Lianli 011 dynamic/strix scar2
Audio Device(s) Xfi creative 7.1 on board ,Yamaha dts av setup, corsair void pro headset
Power Supply corsair 1200Hxi/Asus stock
Mouse Roccat Kova/ Logitech G wireless
Keyboard Roccat Aimo 120
VR HMD Oculus rift
Software Win 10 Pro
Benchmark Scores 8726 vega 3dmark timespy/ laptop Timespy 6506
Oh, okay. I think I get it.

[...]

Neat, probably. The move to chiplets will hurt overall IPC and efficiency slightly but it will move away from the single-biggest constraint GPUs have right now - manufacturing difficulties and yields on massive monolithic dies.

[...]
While I agree with most of your points, I do think you're wrong on efficiency and IPC, because people (not AMD, but researchers I can't recall, including some at Nvidia) have already proven that it can be both more efficient and give higher IPC. Forget researchers, even: AMD themselves also proved it with the Zen architecture.
 