Brett Simpson, Partner and Co-Founder of Arete Research, sat down with AMD CTO Mark Papermaster during the former's "Investor Webinar Conference." A transcript of the Arete + AMD question and answer session appeared online last week; the documented fireside chat concentrated mostly on "AI compute market" topics. Papermaster was asked about his company's competitive approach to NVIDIA's very popular A100 and H100 AI GPUs, as well as the recently launched GH200 chip. The CTO did not reveal any specific pricing strategies; instead, he painted a "big picture": "I think what's important when you just step back is to look at total cost of ownership, not just one GPU, one accelerator, but total cost of ownership. But now when you also look at the macro, if there's not competition in the market, you're going to see not only a growth of the price of these devices due to the added content that they have, but you're -- without a check and balance, you're going to see very, very high margins, more than that could be sustained without a competitive environment."
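Papermaster's total cost of ownership framing is easy to make concrete with rough arithmetic: purchase price is only one term alongside lifetime energy and delivered throughput. The sketch below compares two hypothetical accelerators on that basis; every figure in it is an illustrative assumption, not a number from the interview or from AMD.

```cpp
// Illustrative total-cost-of-ownership (TCO) comparison for two hypothetical
// accelerators. All figures are made-up assumptions for the sketch,
// not AMD or NVIDIA data.
#include <cstdio>

struct Accelerator {
    const char* name;
    double unit_price_usd;   // purchase price per device (assumed)
    double board_power_kw;   // average draw under load (assumed)
    double rel_throughput;   // normalized training throughput (assumed)
};

int main() {
    const double years       = 4.0;                   // depreciation window
    const double hours       = years * 365.0 * 24.0;  // powered-on hours
    const double usd_per_kwh = 0.10;                  // electricity price (assumed)

    const Accelerator devices[] = {
        {"Vendor A GPU", 30000.0, 0.70, 1.00},
        {"Vendor B GPU", 22000.0, 0.75, 0.90},
    };

    for (const auto& d : devices) {
        double energy   = d.board_power_kw * hours * usd_per_kwh;
        double tco      = d.unit_price_usd + energy;   // capex + power only
        double per_unit = tco / d.rel_throughput;      // cost per unit of work
        std::printf("%-12s TCO: $%.0f, cost per normalized throughput: $%.0f\n",
                    d.name, tco, per_unit);
    }
    return 0;
}
```

Under assumptions like these, a cheaper device can still lose on cost per unit of work, which is the point of judging accelerators on TCO rather than sticker price.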
Papermaster continued: "And what I think is very key with -- as AMD has brought competition market for these most powerful AI training and inference devices is you will see that check and balance. And we have a very innovative approach. We've been a leader in chiplet design. And so we have the right technology for the right purpose of the AI build-out that we do. We have, of course, a GPU accelerator. But there's many other circuitry associated with being able to scale and build out these large clusters, and we're very, very efficient in our design." Team Red started to ship its flagship accelerator, Instinct MI300X, to important customers at the start of 2024—Arete Research's Simpson asked about the possibility of follow-up models. In response, AMD's CTO referenced some recent history: "Well, I think the first thing that I'll highlight is what we did to arrive at this point, where we are a competitive force. We've been investing for years in building up our GPU road map to compete in both HPC and AI. We had a very, very strong harbor train that we've been on, but we had to build our muscle in the software enablement."
Papermaster went on: "And so, we started years ago, a development of the ROCm software stack. It competes head on with CUDA. We're able to go head on. We're a GPU company just like NVIDIA. We've competed with NVIDIA for a year. So it's not surprising that a lot of the -- even the programming semantics that we use are similar because we've been, frankly, traversing the same journey for decades. And so, that brought us up to December 6 when we announced the MI300. We brought that competition...We're now shipping, we're now ramping. And that's exactly what we wanted...and it allowed us then to create yet a different environment of how we're working with our largest customers. We worked closely with them and got input from them on the MI300." AMD's launch of its latest MI300 products has generated a lot of buzz within the AI industry, so much so that Team Green has adjusted its plans, according to Papermaster: "What you saw play out is, in fact, NVIDIA reacted to our announcement. They've actually accelerated their road map. We're not standing still. We made adjustments to accelerate our road map with both memory configurations around the MI300 family, derivatives of MI300, the generation next." The current lineup of Instinct accelerators relies on HBM3 parts, while NVIDIA and its production partners have already moved on to HBM3E.
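The "similar programming semantics" Papermaster mentions are visible in HIP, the CUDA-like C++ dialect at the heart of the ROCm stack: its runtime calls and kernel-launch syntax mirror their CUDA counterparts almost one for one. Below is a minimal, generic vector-add sketch to illustrate the resemblance; it is not code from the interview or an AMD sample.

```cpp
// Minimal HIP (ROCm) vector add. Aside from the hip* prefixes, the structure
// matches the equivalent CUDA program (cudaMalloc/cudaMemcpy plus a
// triple-chevron kernel launch). Generic illustration only.
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vadd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // same indexing idiom as CUDA
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

    float *da, *db, *dc;
    hipMalloc(reinterpret_cast<void**>(&da), n * sizeof(float));  // cf. cudaMalloc
    hipMalloc(reinterpret_cast<void**>(&db), n * sizeof(float));
    hipMalloc(reinterpret_cast<void**>(&dc), n * sizeof(float));
    hipMemcpy(da, a.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, b.data(), n * sizeof(float), hipMemcpyHostToDevice);

    vadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);  // CUDA-style launch syntax
    hipMemcpy(c.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);

    std::printf("c[0] = %.1f\n", c[0]);  // expect 3.0
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```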
Team Red's CTO appeared unconcerned about that development: "[W]e have been extremely experienced at bringing memory into the GPU compute cluster. We led the way. And what is now CoWoS at TSMC, which is the most widely used silicon substrate connectivity to have the most efficient connection of high-bandwidth memory to compute. And we worked extremely closely with all three memory vendors. So that is why we led with MI300, and we decided to invest more in the HBM complex. So we have a higher bandwidth. And that is fundamental along with the CDNA, which is our name of our IP, that's our GPU computation IP for AI, along with that, it was HBM know-how that allowed us to establish our leadership position in AI inferencing."
AMD seems prepared to upgrade Instinct with an extended high-bandwidth memory standard: "We architected for the future. So we have 8-high stacks. We architected for 12-high stacks. We are shipping with MI300 HBM3. We've architected for HBM3E. So we understand memory. We have the relationship and we have the architectural know-how to really stay on top of the capabilities needed. And because of that deep history that we have not only with the memory vendors, but also with TSMC and the rest of the substrate supplier and OSAT community we've been focused as well on delivery and supply chain."
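The stack heights and memory generations in that quote map directly onto capacity and bandwidth. As a back-of-envelope sketch, with die density, pin rates, and stack count taken as generic HBM3-era assumptions rather than MI300 specifications:

```cpp
// Back-of-envelope HBM capacity/bandwidth arithmetic. Die density, pin speed,
// and stack count are generic HBM3-generation figures used as assumptions,
// not numbers taken from the interview or an AMD datasheet.
#include <cstdio>

int main() {
    const double die_gbit = 16.0;    // per-DRAM-die density in Gbit (assumed)
    const double bus_bits = 1024.0;  // interface width per HBM stack
    const int    stacks   = 8;       // stacks around the GPU (assumed)

    const int    heights[]  = {8, 12};     // 8-high vs 12-high stacks
    const double pin_gbps[] = {6.4, 9.2};  // HBM3 vs HBM3E-class rates (assumed)

    for (int h : heights) {
        double gb = stacks * h * die_gbit / 8.0;  // total capacity in GB
        std::printf("%2d-high x %d stacks: %6.0f GB\n", h, stacks, gb);
    }
    for (double r : pin_gbps) {
        double tbs = stacks * bus_bits * r / 8.0 / 1000.0;  // theoretical peak TB/s
        std::printf("%.1f Gbps/pin x %d stacks: %.1f TB/s peak\n", r, stacks, tbs);
    }
    return 0;
}
```

Under those assumptions, going from 8-high to 12-high stacks lifts capacity by half, and an HBM3E-class pin rate lifts theoretical peak bandwidth by a similar margin, which is consistent with the emphasis Papermaster places on the HBM complex.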
View at TechPowerUp Main Site | Source
Papermaster continued: "And what I think is very key with -- as AMD has brought competition market for these most powerful AI training and inference devices is you will see that check and balance. And we have a very innovative approach. We've been a leader in chiplet design. And so we have the right technology for the right purpose of the AI build-out that we do. We have, of course, a GPU accelerator. But there's many other circuitry associated with being able to scale and build out these large clusters, and we're very, very efficient in our design." Team Red started to ship its flagship accelerator, Instinct MI300X, to important customers at the start of 2024—Arete Research's Simpson asked about the possibility of follow-up models. In response, AMD's CTO referenced some recent history: "Well, I think the first thing that I'll highlight is what we did to arrive at this point, where we are a competitive force. We've been investing for years in building up our GPU road map to compete in both HPC and AI. We had a very, very strong harbor train that we've been on, but we had to build our muscle in the software enablement."
Papermaster proceeded with: "And so, we started years ago, a development of the ROCm software stack. It competes head on with CUDA. We're able to go head on. We're a GPU company just like NVIDIA. We've competed with NVIDIA for a year. So it's not surprising that a lot of the -- even the programming semantics that we use are similar because we've been, frankly, traversing the same journey for decades. And so, that brought us up to December 6 when we announced the MI300. We brought that competition...We're now shipping, we're now ramping. And that's exactly what we wanted...and it allowed us then to create yet a different environment of how we're working with our largest customers. We worked closely with them and got input from them on the MI300." AMD's launch of its latest MI300 products has generated a lot of buzz within AI industries—so much so that Team Green has adjusted its plans, according to Papermaster: "What you saw play out is, in fact, NVIDIA reacted to our announcement. They've actually accelerated their road map. We're not standing still. We made adjustments to accelerate our road map with both memory configurations around the MI300 family, derivatives of MI300, the generation next." The current lineup of Instinct accelerators relies on HMB3 parts, while NVIDIA and its production partners has already moved into HMB3E.
Team Red's CTO appeared not too concerned about that development: "we have been extremely experienced at bringing memory into the GPU compute cluster. We led the way. And what is now CoWoS at TSMC, which is the most widely used silicon substrate connectivity to have the most efficient connection of high-bandwidth memory to compute. And we worked extremely closely with all three memory vendors. So that is why we led with MI300, and we decided to invest more in the HBM complex. So we have a higher bandwidth. And that is fundamental along with the CDNA, which is our name of our IP, that's our GPU computation IP for AI, along with that, it was HBM know-how that allowed us to establish our leadership position in AI inferencing."
AMD is seemingly prepared to get Instinct upgraded with an extended high bandwidth memory standard: "We architected for the future. So we have 8-high stacks. We architected for 12-high stacks. We are shipping with MI300 HBM3. We've architected for HBM3E. So we understand memory. We have the relationship and we have the architectural know-how to really stay on top of the capabilities needed. And because of that deep history that we have not only with the memory vendors, but also with TSMC and the rest of the substrate supplier and OSAT community we've been focused as well on delivery and supply chain."
View at TechPowerUp Main Site | Source