Thursday, December 31st 2020
Intel Confirms HBM is Supported on Sapphire Rapids Xeons
Intel has just released its "Architecture Instruction Set Extensions and Future Features Programming Reference" manual, which documents Intel's upcoming hardware additions so developers can prepare for them ahead of launch. Today, thanks to @InstLatX64 on Twitter, we have information that Intel is bringing an on-package High Bandwidth Memory (HBM) solution to its next-generation Sapphire Rapids Xeon processors. Specifically, two new error codes are listed: 0220H - HBM command/address parity error and 0221H - HBM data parity error. Both codes exist to report HBM data errors so the CPU can keep operating on correct data.
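Purely as an illustration of how system software might consume such codes, here is a minimal C sketch that matches the model-specific error code field of a generic IA32_MCi_STATUS value against the two new values. The macro names are made up, and whether the HBM codes actually surface in that field of a machine-check status register is our assumption, not something the manual spells out.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names for the codes listed in the reference manual. */
#define HBM_CMD_ADDR_PARITY_ERROR 0x0220
#define HBM_DATA_PARITY_ERROR     0x0221

/* Generic IA32_MCi_STATUS layout: bits 15:0 hold the MCA error code,
 * bits 31:16 the model-specific error code, bit 63 the VAL flag.
 * Assumption: the HBM codes land in the model-specific field. */
static void report_hbm_error(uint64_t mci_status)
{
    if (!(mci_status >> 63))            /* VAL bit clear: nothing logged */
        return;

    uint16_t model_code = (mci_status >> 16) & 0xFFFF;

    switch (model_code) {
    case HBM_CMD_ADDR_PARITY_ERROR:
        puts("HBM command/address parity error");
        break;
    case HBM_DATA_PARITY_ERROR:
        puts("HBM data parity error");
        break;
    default:
        printf("other model-specific error code: 0x%04x\n", model_code);
    }
}

int main(void)
{
    /* Fabricated status value, for illustration only. */
    uint64_t example = (1ULL << 63) | ((uint64_t)HBM_DATA_PARITY_ERROR << 16);
    report_hbm_error(example);
    return 0;
}
```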
The addition of HBM is just one of the many new technologies Sapphire Rapids brings. The platform is also expected to feature an eight-channel DDR5 memory controller enriched with Intel's Data Streaming Accelerator (DSA). To connect to external accelerators, the platform uses the PCIe 5.0 protocol paired with the CXL 1.1 standard to enable cache coherency across the system. As a reminder, this would not be the first time we have seen a server CPU use HBM: Fujitsu's 48-core A64FX processor pairs its cores with HBM and powers today's most powerful supercomputer, Fugaku, showing how much a processor can gain from faster on-package memory. We are waiting to see how Intel plays it out and what ends up on the market when Sapphire Rapids is delivered.
Source:
@InstLatX64 on Twitter
14 Comments on Intel Confirms HBM is Supported on Sapphire Rapids Xeons
Looks like HBM might be planned to be used as a sort of L4 cache, but still.
AMD's new GPUs have that Infinity Cache, which is basically super fast memory, and then have standard GDDR6 next to that.
So why not have this HBM be the Intel version of that and still have main memory next to it, just an extra step, the same way RAM is an in-between for the CPU and storage.
Heck, maybe AMD in the future will have Infinity Cache > HBM > GDDR6.
But you are right though, like I said, I would expect HBM to become L4 cache for now.
Another layer of memory might bring some other possibilities to the table as well. XPoint DIMMs stand out as something that would gain from this.
When you are adding HBM to a CPU via an interposer you are talking about considerably increasing the cost of manufacture and time to produce. AMD learned this with Vega.
Aside from a few professional scenarios, I don't really see how regular consumers would benefit from having both HBM and DDR. If you are having such a problem with cache misses (which neither AMD nor Intel does) that you need another layer of storage between the L4 and main memory, you'd be much wiser to increase the amount of cache your CPU has or tweak what it decides to keep in cache. Cache is still vastly faster and lower latency than HBM. HBM has much more bandwidth than DDR4, but consumer systems don't really need more bandwidth right now. Heck, we are still using dual-channel memory, and you'd be hard-pressed to find a game that actually benefits from quad-channel.
The thing with AMD's L3 Infinity Cache is that it fixes a downside of their choice of memory. It doesn't go searching for a solution to a problem that doesn't exist.
www.anandtech.com/show/16051/3dfabric-the-home-for-tsmc-2-5d-and-3d-stacking-roadmap
In other words: HBM as a Level 4 cache is the logical next step to feed the beast.
This will be expensive though, so I think this is going to be high-end Xeon server and workstation only. Unless Intel somehow figured out some new smart, cost-effective way to implement this... I can't see it though...
From a latency perspective, HBM has roughly the same latency as any other DRAM (including DDR4), so you may win in bandwidth, but without a latency win there's a huge chance you're just slowing things down. Xeon Phi had an HMC + DDR4 version (HMC was a stacked-RAM competitor to HBM), and that kind of architecture is really hard and non-obvious to optimize for. Latency-sensitive code would be better run out of DDR4 (which is cheaper, and therefore physically larger). Bandwidth-sensitive code would prefer HBM.
As a programmer: it's very non-obvious whether you'll be latency-sensitive or bandwidth-sensitive. As a system engineer combining multiple pieces of code, it is even less obvious... so configuring such a system is just too complicated in the real world. A rough sketch of the two access patterns follows below.
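For illustration, here is a minimal C sketch of the two patterns, assuming a working set far larger than the caches: a dependent pointer chase that is bound by DRAM latency (so extra HBM bandwidth buys little) and a streaming sum that is bound by bandwidth (where HBM helps directly). The buffer size and shuffle are arbitrary; this is not a real benchmark.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
#include <time.h>

#define N (64 * 1024 * 1024)   /* elements; large enough to miss in cache */

static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    if (!next)
        return 1;

    /* Latency-bound: each load depends on the previous one (pointer chase),
     * so more bandwidth does not help, only lower latency does. */
    for (size_t i = 0; i < N; i++)
        next[i] = i;
    for (size_t i = N - 1; i > 0; i--) {        /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }
    double t0 = seconds();
    size_t p = 0;
    for (size_t i = 0; i < N; i++)
        p = next[p];
    double chase = seconds() - t0;

    /* Bandwidth-bound: independent sequential loads the prefetchers can
     * stream, so more memory bandwidth (e.g. HBM) speeds this up directly. */
    double t1 = seconds();
    size_t sum = 0;
    for (size_t i = 0; i < N; i++)
        sum += next[i];
    double stream = seconds() - t1;

    printf("pointer chase: %.3f s, streaming sum: %.3f s (sum=%zu, p=%zu)\n",
           chase, stream, sum, p);
    free(next);
    return 0;
}
```

The same binary can land in either bucket depending on which loop dominates, which is exactly why it is hard to say up front whether a workload wants DDR or HBM.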
----------
HBM-only would probably be the way to go. Unless someone figures out how to solve this complexity issue (or better predict latency vs bandwidth sensitive code).
If you do DDR4 -> HBM -> cache, it means you're now incurring two DRAM latencies per read/write instead of one. A more reasonable architecture is DDR4 -> cache plus HBM -> cache, splitting the two up. However, that architecture is very difficult to program (the sketch below gives a flavor of the manual placement involved). As such, the most reasonable option in practice is HBM -> cache (and avoiding the use of DDR4/DDR5).
Unless Intel wants another Xeon Phi I guess...
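For what it's worth, flat-mode MCDRAM on Xeon Phi was exposed as a separate NUMA node, and software had to place bandwidth-hungry buffers there by hand. If Sapphire Rapids exposes its HBM the same way (an assumption; Intel has not said how it will be presented), the programming burden looks roughly like this libnuma sketch, with the HBM node number being hypothetical.

```c
#include <numa.h>      /* link with -lnuma */
#include <stdio.h>
#include <string.h>

#define HBM_NODE 1     /* assumed node number for the on-package HBM;
                          a real system would discover this at runtime */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available\n");
        return 1;
    }

    size_t bytes = 1024UL * 1024 * 1024;   /* 1 GiB scratch buffer */

    /* Bandwidth-hungry buffer: ask for pages on the (assumed) HBM node. */
    char *hbm_buf = numa_alloc_onnode(bytes, HBM_NODE);

    /* Latency-sensitive or bulk data: keep it on the local DDR node. */
    char *ddr_buf = numa_alloc_local(bytes);

    if (!hbm_buf || !ddr_buf) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    memset(hbm_buf, 0, bytes);   /* touch the pages so they actually get placed */
    memset(ddr_buf, 0, bytes);

    numa_free(hbm_buf, bytes);
    numa_free(ddr_buf, bytes);
    return 0;
}
```

Libraries like memkind grew out of exactly this kind of explicit placement on Xeon Phi, which rather supports the complexity argument above.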