Apple Patents Multi-Level Hybrid Memory Subsystem

Joined
May 24, 2007
Messages
5,429 (0.85/day)
Location
Tennessee
System Name AM5
Processor AMD Ryzen R9 7950X
Motherboard Asrock X670E Taichi
Cooling EK AIO Basic 360
Memory Corsair Vengeance DDR5 5600 64 Gb - XMP1 Profile
Video Card(s) AMD Reference 7900 XTX 24 Gb
Storage Crucial Gen 5 1 TB, Samsung Gen 4 980 1 TB / Samsung 8TB SSD
Display(s) Samsung 34" 240hz 4K
Case Fractal Define R7
Power Supply Seasonic PRIME PX-1300, 1300W 80+ Platinum, Full Modular
That lifestyle company can't be besting Intel... ;)
 
Joined
Jul 5, 2013
Messages
27,689 (6.66/day)
Update 21:14 UTC: We have been contacted by Mr. Kerry Creeron, an attorney with the firm of Banner & Witcoff, who provided us with additional insights about the patent. Mr. Creeron has provided us with his personal commentary about it, and you can find his quote below.
I suspect that this patent, if approved, will in short order be contested and invalidated. Memory schemes like this have been in use for decades, and Apple's very minor "spin" on the concept is not enough for a patent to withstand critical scrutiny. This is Apple literally trying to be a patent troll.
 
Joined
Mar 21, 2016
Messages
2,508 (0.79/day)
Just put a 3D-stacked DRAM chip underneath the CPU socket, in the center, tied directly to the socket and CPU.

I suspect that this patent, if approved, will in short order be contested and invalidated. Memory schemes like this have been in use for decades, and Apple's very minor "spin" on the concept is not enough for a patent to withstand critical scrutiny. This is Apple literally trying to be a patent troll.

I agree, this type of patent is bad for a level playing field of open competition. We've already seen how this works with RAMBUS: it doesn't benefit consumers.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,167 (2.81/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
Just put a 3D-stacked DRAM chip underneath the CPU socket, in the center, tied directly to the socket and CPU.
DRAM can't be put on the interposer for the CPU if it's under the CPU. It would have to be mounted to the PCB under the interposer and would make for a very complicated PCB design. I wouldn't envy that engineer's task. It's also not that much closer to the CPU compared to putting it next to it like with M1's system memory. There are a lot of cons and not a lot of benefits.
 
Joined
Mar 21, 2016
Messages
2,508 (0.79/day)
That's probably true and valid, but things are shrinking, so it'll get easier to place it there in due time. Also, it's not meant to replace system memory, but rather to provide a quicker buffer in front of it. I was speaking about wiring it under the motherboard socket, as opposed to underneath the middle of the CPU's PCB. A PCIe-wired microSD card slot would be neat there as well. Consider this: 3D-stacked, with 2 TB of storage and PCIe 4.0 x16 wiring to it. If they could pull that off, it would be rather amazing.
 
Joined
Oct 15, 2011
Messages
2,387 (0.50/day)
Location
Springfield, Vermont
System Name KHR-1
Processor Ryzen 9 5900X
Motherboard ASRock B550 PG Velocita (UEFI-BIOS P3.40)
Memory 32 GB G.Skill RipJawsV F4-3200C16D-32GVR
Video Card(s) Sapphire Nitro+ Radeon RX 6750 XT
Storage Western Digital Black SN850 1 TB NVMe SSD
Display(s) Alienware AW3423DWF OLED-ASRock PG27Q15R2A (backup)
Case Corsair 275R
Audio Device(s) Technics SA-EX140 receiver with Polk VT60 speakers
Power Supply eVGA Supernova G3 750W
Mouse Logitech G Pro (Hero)
Software Windows 11 Pro x64 23H2
I suspect that this patent, if approved, will in short order be contested and invalidated. Memory schemes like this have been in use for decades, and Apple's very minor "spin" on the concept is not enough for a patent to withstand critical scrutiny. This is Apple literally trying to be a patent troll.
So, Apple is being accused of having another baseless patent, likened to round corners?
 
Joined
May 19, 2009
Messages
223 (0.04/day)
I would have thought moving to stacked HBM would be a much better option and would resolve these issues. Apple could easily have 128 GB or more of unified memory on the package at around 4 TB/s of bandwidth (maybe 8 TB/s), then have a PCIe 5.0 interface to ultra-fast SSDs, which could have terabytes of memory accessed faster than your average DDR4!
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,167 (2.81/day)
I would have thought moving to stacked HBM would be a much better option and would resolve these issues. Apple could easily have 128 GB or more of unified memory on the package at around 4 TB/s of bandwidth (maybe 8 TB/s), then have a PCIe 5.0 interface to ultra-fast SSDs, which could have terabytes of memory accessed faster than your average DDR4!
The complication with HBM is that it's a slow but wide interface. Depending on the workload, there is a benefit to fast random access as opposed to relatively slow bulk access. You would need a big cache and a good caching strategy to compensate for it. You also have to consider that most of the data you want probably isn't in the same consecutive 1k, 2k, or 4k region that you're reading from or writing to. So while the maximum theoretical bandwidth is really nice, it's really unlikely that you'd saturate it, because of the nature of the memory requests that CPUs tend to make compared to GPUs. With that said, even a fraction of HBM's speed could keep up with traditional DRAM. So maybe it's not as big of a problem as I think it could be.
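To put rough numbers on the "slow but wide" point, here is a minimal sketch of a toy model, not a measurement. It assumes requests are issued strictly one at a time, so each one pays the full latency before its data streams at peak rate. The function name and the latency and bandwidth figures are all made up for illustration and do not describe any real HBM or DDR part.

```python
def effective_bandwidth_gb_s(request_bytes, latency_ns, peak_gb_s):
    """GB/s achieved when requests are issued one at a time (no overlap)."""
    # 1 GB/s equals 1 byte per nanosecond, so bytes / (GB/s) gives nanoseconds.
    transfer_ns = request_bytes / peak_gb_s
    return request_bytes / (latency_ns + transfer_ns)


# Hypothetical figures, chosen only to illustrate the trade-off.
WIDE_SLOW = {"latency_ns": 100.0, "peak_gb_s": 400.0}   # HBM-like assumption
NARROW_FAST = {"latency_ns": 80.0, "peak_gb_s": 50.0}   # DDR-like assumption

for size in (64, 4096, 1 << 20):
    wide = effective_bandwidth_gb_s(size, **WIDE_SLOW)
    narrow = effective_bandwidth_gb_s(size, **NARROW_FAST)
    print(f"{size:>8} B  wide/slow {wide:8.2f} GB/s  narrow/fast {narrow:8.2f} GB/s")
```

With these assumed figures, a lone 64-byte fetch is dominated by latency on both memories and the lower-latency one comes out slightly ahead, while a 1 MiB bulk read gets close to the wide interface's peak. Real memory controllers overlap many requests across channels, which this model deliberately ignores.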
 
Joined
Jan 3, 2021
Messages
3,484 (2.46/day)
Location
Slovenia
Processor i5-6600K
Motherboard Asus Z170A
Cooling some cheap Cooler Master Hyper 103 or similar
Memory 16GB DDR4-2400
Video Card(s) IGP
Storage Samsung 850 EVO 250GB
Display(s) 2x Oldell 24" 1920x1200
Case Bitfenix Nova white windowless non-mesh
Audio Device(s) E-mu 1212m PCI
Power Supply Seasonic G-360
Mouse Logitech Marble trackball, never had a mouse
Keyboard Key Tronic KT2000, no Win key because 1994
Software Oldwin
I would have thought moving to stacked HBM would be a much better option and would resolve these issues. Apple could easily have 128 GB or more of unified memory on the package at around 4 TB/s of bandwidth (maybe 8 TB/s), then have a PCIe 5.0 interface to ultra-fast SSDs, which could have terabytes of memory accessed faster than your average DDR4!
You have just described the 2024 Mac Pro. Starting at $15,000. If anyone develops HBM4 with much reduced latency/faster random access by then, that is.

The complication with HBM is that it's a slow but wide interface. Depending on the workload, there is a benefit to fast random access as opposed to relatively slow bulk access. You would need a big cache and a good caching strategy to compensate for it. You also have to consider that most of the data you want probably isn't in the same consecutive 1k, 2k, or 4k region that you're reading from or writing to. So while the maximum theoretical bandwidth is really nice, it's really unlikely that you'd saturate it, because of the nature of the memory requests that CPUs tend to make compared to GPUs. With that said, even a fraction of HBM's speed could keep up with traditional DRAM. So maybe it's not as big of a problem as I think it could be.
Yeah, you explained it nicely.
On the other hand, Apple could also make "traditional" DRAM wider. The M1 apparently has all the DRAM in two packages and a more powerful processor could have four or eight close to the processor die.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,167 (2.81/day)
You have just described the 2024 Mac Pro. Starting at $15,000. If anyone develops HBM4 with much reduced latency/faster random access by then, that is.


Yeah, you explained it nicely.
On the other hand, Apple could also make "traditional" DRAM wider. The M1 apparently has all the DRAM in two packages and a more powerful processor could have four or eight close to the processor die.
HBM is efficient because the transistors aren't switched as fast. You lose that advantage if you try to drive it as fast as traditional DRAM. I don't think people really realize how much more power higher switching frequencies require.
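The switching-frequency point can be sketched with the textbook CMOS dynamic power relation, P ≈ α·C·V²·f. This is only a rough illustration: the pin counts, per-pin rates, voltages, and capacitances below are hypothetical, and toggle rate is simply assumed to be proportional to the per-pin data rate. Note that at equal aggregate bandwidth the frequency term cancels against the pin count, so in this toy model the saving comes from the lower voltage and the shorter, lower-capacitance links that slow signaling allows.

```python
def dynamic_power_w(activity, capacitance_f, voltage_v, frequency_hz):
    """Textbook CMOS dynamic power: P = alpha * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def interface_power_w(pins, per_pin_gbps, voltage_v, cap_per_pin_f, activity=0.5):
    """Total switching power of a bus, assuming toggle rate tracks data rate."""
    frequency_hz = per_pin_gbps * 1e9
    return pins * dynamic_power_w(activity, cap_per_pin_f, voltage_v, frequency_hz)


# Two hypothetical buses with the same aggregate bandwidth (2048 Gbit/s).
wide_slow = interface_power_w(pins=1024, per_pin_gbps=2.0,
                              voltage_v=1.2, cap_per_pin_f=0.5e-12)
narrow_fast = interface_power_w(pins=128, per_pin_gbps=16.0,
                                voltage_v=1.35, cap_per_pin_f=2e-12)

print(f"wide/slow   : {wide_slow:.2f} W")
print(f"narrow/fast : {narrow_fast:.2f} W")
```

The voltage and per-pin capacitance assumptions drive the whole result here, so treat the output as a direction rather than a prediction.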
 

bug

Joined
May 22, 2015
Messages
13,755 (3.96/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
The complication with HBM is that it's a slow but wide interface. Depending on the workload, there is a benefit to fast random access as opposed to relatively slow bulk access. You would need a big cache and a good caching strategy to compensate for it. You also have to consider that most of the data you want probably isn't in the same consecutive 1k, 2k, or 4k region that you're reading from or writing to. So while the maximum theoretical bandwidth is really nice, it's really unlikely that you'd saturate it, because of the nature of the memory requests that CPUs tend to make compared to GPUs. With that said, even a fraction of HBM's speed could keep up with traditional DRAM. So maybe it's not as big of a problem as I think it could be.
Exactly. Caches solve the latency problem, not the bandwidth problem. HBM is the opposite of that.
 
Joined
Jan 3, 2021
Messages
3,484 (2.46/day)
Exactly. Caches solve the latency problem, not the bandwidth problem. HBM is the opposite of that.
Apple's "Cache DRAM" can only solve that problem if it's a special low-latency type of dynamic RAM. I don't know if anything like that is available; however, Intel's Crystal Well apparently was such a chip, with a latency of ~30 ns in addition to great bandwidth (measured by Anand).
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,167 (2.81/day)
Exactly. Caches solve the latency problem, not the bandwidth problem. HBM is the opposite of that.
Well, cache solves both the bandwidth and latency problems. It doesn't solve the capacity problem. HBM solves the capacity and bandwidth problems. It's not great on latency, but that's a problem that can be solved, or at the very least, mitigated.

Let me put it another way. HBM2 is the reason why my MacBook Pro is silent with two 5K displays plugged into it. All the GDDR models would have the fan whirring away due to the memory being clocked up to drive them. That's heat and power that you can't afford on a mobile device. It's also how they could cram 40 CUs onto the Radeon Pro 5600M and stay within the 50 W power envelope, all while having almost 400 GB/s of max theoretical bandwidth. You can't tell me that's not an advantage.
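For what it's worth, the "almost 400 GB/s" figure is consistent with the usual peak bandwidth arithmetic: bus width in bits times per-pin data rate, divided by 8. The sketch below assumes a 2048-bit HBM2 bus at 1.54 Gbit/s per pin, which is not stated anywhere in this thread, and compares it with a hypothetical 256-bit GDDR6 setup at 14 Gbit/s per pin.

```python
def peak_bandwidth_gb_s(bus_width_bits, pin_rate_gbps):
    """Theoretical peak in GB/s: bits per transfer * Gbit/s per pin / 8."""
    return bus_width_bits * pin_rate_gbps / 8


# Assumed configurations, not taken from the thread.
hbm2_like = peak_bandwidth_gb_s(bus_width_bits=2048, pin_rate_gbps=1.54)
gddr6_like = peak_bandwidth_gb_s(bus_width_bits=256, pin_rate_gbps=14.0)

print(f"wide HBM2-like bus   : {hbm2_like:.2f} GB/s")
print(f"narrow GDDR6-like bus: {gddr6_like:.2f} GB/s")
```

Under these assumptions the narrow bus needs roughly nine times the per-pin rate to land in the same bandwidth range, which is the clock speed the wide interface avoids.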
 
Joined
Jan 3, 2021
Messages
3,484 (2.46/day)
Well, cache solves both the bandwidth and latency problems. It doesn't solve the capacity problem. HBM solves the capacity and bandwidth problems.
And then Apple solves the HBM cost problem by putting the retail price somewhere in geosynchronous orbit. Voilà, problems gone.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,167 (2.81/day)
And then Apple solves the HBM cost problem by putting the retail price somewhere in geosynchronous orbit. Voilà, problems gone.
Truth. I sold a kidney to afford my MacBook Pro. :laugh:
 

bug

Joined
May 22, 2015
Messages
13,755 (3.96/day)
Well, cache solves both the bandwidth and latency problems. It doesn't solve the capacity problem. HBM solves the capacity and bandwidth problems.
Ok, that's more accurate than what I said.
It's not great on latency, but that's a problem that can be solved, or at the very least, mitigated.
I'm not so sure HBM's latency can be as low as required for usage in a cache system.
Latency doesn't seem to move like at all. At least that's what happened for DDR.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,167 (2.81/day)
I'm not so sure HBM's latency can be as low as required for usage in a cache system.
Latency doesn't seem to move like at all. At least that's what happened for DDR.
Well, HBM does have a latency penalty, but it makes up for that through its ability to burst a lot of data. Because it's split into several channels, you can queue up a lot of memory requests and get stuff back rapid-fire. So while there is overhead involved, it might not actually be that bad, depending on how much data you need to pull at once. Think about it: AMD beefed up the size of its last level of cache with the latest Zen chips. Why would they do that? The answer is simple: an off-die I/O chiplet introduces latency, and you need a way to buffer that latency. Depending on the caching strategy, that last level of cache might get a ton of hits, and the more hits you get, the more insulated you are from the latency cost.

You also have to consider what Apple is doing. This level in the memory hierarchy also has to be able to support a GPU and AI circuitry. HBM is definitely well suited to those sorts of tasks, so all in all, it's probably a wash when it comes to latency. The real advantage comes from the memory bandwidth with relatively low power consumption and a high memory density.
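The "more hits, more insulation" argument above is the standard average memory access time relation: hit time plus miss rate times miss penalty. Here is a minimal sketch, assuming a single cache level in front of a slower memory; the 10 ns hit time and 120 ns miss penalty are hypothetical and do not describe any real Zen, Apple, or HBM part.

```python
def average_access_time_ns(hit_time_ns, miss_penalty_ns, hit_rate):
    """AMAT for one cache level: hit time + miss rate * miss penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns


# Hypothetical latencies, for illustration only.
HIT_NS = 10.0        # assumed last-level cache hit time
PENALTY_NS = 120.0   # assumed extra cost of going out to the slow memory

for hit_rate in (0.50, 0.90, 0.99):
    amat = average_access_time_ns(HIT_NS, PENALTY_NS, hit_rate)
    print(f"hit rate {hit_rate:.0%}: about {amat:.1f} ns on average")
```

The shape is what matters: as the hit rate climbs, the average cost approaches the cache's own hit time and the latency of whatever sits behind it stops dominating.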
 