Thursday, June 15th 2023

AMD Confirms that Instinct MI300X GPU Can Consume 750 W

AMD revealed its Instinct MI300X GPU at its Data Center and AI Technology Premiere event on Tuesday (June 13). The keynote presentation did not provide any details about the new accelerator model's power consumption, but that did not stop tipster Hoang Anh Phu from obtaining this information from Team Red's post-event footnotes. His comparative observation: "MI300X (192 GB HBM3, OAM Module) TBP is 750 W, compared to last gen, MI250X TBP is only 500-560 W." A leaked Giga Computing roadmap from last month already anticipated server-grade GPUs hitting the 700 W mark.

NVIDIA's Hopper H100, with its maximum demand of 700 W, held the crown as the most power-hungry data center enterprise GPU until now. The MI300X's OCP Accelerator Module-based design surpasses Team Green's flagship with a slightly higher rating. AMD's new "leadership generative AI accelerator" sports 304 CDNA 3 compute units, a clear upgrade over the MI250X's 220 (CDNA 2) CUs. Engineers have also introduced new 24 GB HBM3 stacks, so the MI300X can be specced with up to 192 GB of memory, whereas the MI250X is limited to 128 GB with its slower HBM2E stacks. We hope to see sample units producing benchmark results very soon, with the MI300X pitted against the H100.
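For reference, board power on accelerators like these can be read back from the vendor command-line tools. The following is a minimal sketch, assuming nvidia-smi and rocm-smi are installed and on the PATH; the exact fields and formatting they print vary by driver and ROCm release, and nothing here comes from AMD's footnotes.

import subprocess

def read_power(cmd):
    # Run a vendor SMI tool and return its raw output, or None if it is not available.
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None

# NVIDIA: current draw and enforced limit per GPU, reported in watts.
nv = read_power(["nvidia-smi", "--query-gpu=power.draw,power.limit", "--format=csv,noheader"])
# AMD: average package power as reported by the ROCm SMI tool.
amd = read_power(["rocm-smi", "--showpower"])

for label, out in (("NVIDIA", nv), ("AMD", amd)):
    print(label + ":", out if out else "no supported GPU or tool found")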
Sources: VideoCardz, AnhPhuH Tweet

43 Comments on AMD Confirms that Instinct MI300X GPU Can Consume 750 W

#2
zo0lykas
192GB HBM3, future is here :)
Posted on Reply
#3
Denver
On paper this trumps the H100, how is the battle in real workloads?
Posted on Reply
#4
ZoneDymo
what CPUs save in power consumption, GPUs will make up for...
Posted on Reply
#6
fancucker
This is an immense feat of engineering and hopefully something that helps AMD garner well-deserved market share, but issues abound:

1 - HBM3 is costly and may negate the cost advantage AMD might have with the MI300. Nvidia is likely to ship HBM3 products in the same timeframe or earlier
2 - There is no apparent equivalent to the Transformer Engine, which can triple performance in LLM scenarios on Nvidia counterparts
3 - Nvidia's H100 is shipping in full volume today, with more researcher and technical support in its superior ecosystem
4 - AMD has yet to disclose benchmarks

I hope AMD can resolve or alleviate some of these issues because it seems like an excellent product overall.
Posted on Reply
#7
TumbleGeorge
All data here is old and partially incorrect. Must be updated.
Posted on Reply
#8
Tomgang
Oof, that's a lot of power going through a PCB and GPU.

And I think the 570 W peak I have seen on my RTX 4090 with OC is bad enough. Yes, I am aware this is meant for other things than gaming, but still, 700 W or more is a lot of power for one GPU.
Posted on Reply
#9
kapone32
Tomgang: Oof, that's a lot of power going through a PCB and GPU.

And I think the 570 W peak I have seen on my RTX 4090 with OC is bad enough. Yes, I am aware this is meant for other things than gaming, but still, 700 W or more is a lot of power for one GPU.
This monster has about 5 to 10 times the transistors of a 4090, and 192 GB of HBM is no joke, but it is actually not bad considering.
Posted on Reply
#10
R0H1T
Denver: On paper this trumps the H100, how is the battle in real workloads?
Depends on the software stack; this, alongside the MI300A, will beat nearly any combination of AI/CPU/GPU power Nvidia can muster, but then again there's CUDA, so!
Posted on Reply
#11
dgianstefani
TPU Proofreader
R0H1T: Depends on the software stack; this, alongside the MI300A, will beat nearly any combination of AI/CPU/GPU power Nvidia can muster, but then again there's CUDA, so!
The entire GPU ecosystem is based around software; the hardware is just a checkbox.

NVIDIA is a software company.
Posted on Reply
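The software-stack point can be made concrete. AMD's answer to CUDA is ROCm/HIP, and on ROCm builds of PyTorch AMD GPUs are driven through the same torch.cuda code path that CUDA hardware uses, so how well existing code runs is largely a question of how complete that layer is. A minimal sketch, assuming a PyTorch build with either CUDA or ROCm support installed:

import torch

# On ROCm builds of PyTorch, AMD GPUs answer through the torch.cuda namespace;
# torch.version.hip is set only on ROCm builds, so it distinguishes the backends.
if torch.cuda.is_available():
    backend = "ROCm/HIP" if getattr(torch.version, "hip", None) else "CUDA"
    device = torch.device("cuda")
    print("Running on", torch.cuda.get_device_name(0), "via", backend)
else:
    device = torch.device("cpu")
    print("No supported GPU found, falling back to CPU")

# The workload below is identical regardless of vendor.
x = torch.randn(4096, 4096, device=device)
y = x @ x
print(y.shape)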
#12
Wirko
Tomgang: Oof, that's a lot of power going through a PCB and GPU.

And I think the 570 W peak I have seen on my RTX 4090 with OC is bad enough. Yes, I am aware this is meant for other things than gaming, but still, 700 W or more is a lot of power for one GPU.
If the trend continues, those accelerators are going to consume one half-Xeon in a couple of years.
Posted on Reply
#13
R0H1T
dgianstefani: The entire GPU ecosystem is based around software; the hardware is just a checkbox.

NVIDIA is a software company.
AMD is getting there; if they can string a few consistent wins in a row, they should be able to challenge Nvidia, kinda like they did with Intel.
Posted on Reply
#14
kapone32
dgianstefani: The entire GPU ecosystem is based around software; the hardware is just a checkbox.

NVIDIA is a software company.
Well, a comparison of the Adrenalin software vs Nvidia's offerings would belie what you are saying. Ray tracing and DLSS do not apply in this space, so.
Posted on Reply
#15
dgianstefani
TPU Proofreader
kapone32: Well, a comparison of the Adrenalin software vs Nvidia's offerings would belie what you are saying. Ray tracing and DLSS do not apply in this space, so.
Maybe by the time AMD has figured out drivers that can be installed on any of their consumer cards (see: the several-month gap between the RDNA3 release and an RDNA2 driver update, or, more recently, the exclusive driver for the RX 7600 only), they can start to figure out the software and partner support they need to succeed in datacentre GPUs.

By then NVIDIA will have built its 1:1 virtual/real-world model in the Omniverse, which every major manufacturer has already signed onto, just as with CUDA dominance for the past decade.

This image is from 15 months ago (3/22).

www.nvidia.com/en-us/omniverse/ecosystem/ shows the current ecosystem.

Where is AMD?
R0H1T: AMD is getting there; if they can string a few consistent wins in a row, they should be able to challenge Nvidia, kinda like they did with Intel.
I hope so; a monopoly isn't a great situation, but the product needs to deliver. Zen did, but that was a hardware success, not a software success. The software (AGESA) for Zen has been buggy in each iteration for years now, which they hope to fix eventually with the open-source openSIL.
Posted on Reply
#16
kapone32
dgianstefani: Maybe by the time AMD has figured out drivers that can be installed on any of their consumer cards (see: the several-month gap between the RDNA3 release and an RDNA2 driver update, or, more recently, the exclusive driver for the RX 7600 only), they can start to figure out the software and partner support they need to succeed in datacentre GPUs.

By then NVIDIA will have built its 1:1 virtual/real-world model in the Omniverse, which every major manufacturer has already signed onto, just as with CUDA dominance for the past decade.

This image is from 15 months ago (3/22).

www.nvidia.com/en-us/omniverse/ecosystem/ shows the current ecosystem.

Where is AMD?
Let me see: I have had AMD since the original 6800, and yes, that was a Gigabyte board and it was a gremlin. Then I went with Sapphire and nothing since. Did you know that Sapphire had upscaling in their software package before FSR or DLSS were even conversation pieces? I could paste the AMD stack as well, and as much as people complained about the 3 months that AMD did not give them driver updates for cards that were working fine, that is also nothing but the narrative. There is also the fact that, while you and I can debate it, AMD will be selling plenty of these, as the companies on that placard are in some ways optimizing for AMD or depending on AWS and Microsoft for their network, but we will go on. This card is no joke at all, so as much as you give little import to the hardware, 192 GB of HBM3 (a spec we don't know yet) is nothing to sneeze at, and the transistor count is also crazy.
Posted on Reply
#17
dgianstefani
TPU Proofreader
kapone32: Let me see: I have had AMD since the original 6800, and yes, that was a Gigabyte board and it was a gremlin. Then I went with Sapphire and nothing since. Did you know that Sapphire had upscaling in their software package before FSR or DLSS were even conversation pieces? I could paste the AMD stack as well, and as much as people complained about the 3 months that AMD did not give them driver updates for cards that were working fine, that is also nothing but the narrative. There is also the fact that, while you and I can debate it, AMD will be selling plenty of these, as the companies on that placard are in some ways optimizing for AMD or depending on AWS and Microsoft for their network, but we will go on. This card is no joke at all, so as much as you give little import to the hardware, 192 GB of HBM3 (a spec we don't know yet) is nothing to sneeze at, and the transistor count is also crazy.
Hardware is nice, sure.

"Narrative" aside, we'll see how they do in 2023 for enterprise GPU, in 2021 they couldn't breach 9% and the trend isn't changing, that percentage went down in 2022.

www.nasdaq.com/articles/better-buy:-nvidia-vs.-amd-2
Posted on Reply
#18
kapone32
dgianstefani: Hardware is nice, sure.

"Narrative" aside, we'll see how they do in 2023 for enterprise GPUs; in 2021 they couldn't breach 9%, and the trend isn't changing, as that percentage went down in 2022.

www.nasdaq.com/articles/better-buy:-nvidia-vs.-amd-2
Why do you think they released this card? I am aware of AMD's position in the Data Centre when it comes to GPUs. My argument was strictly about software, and again, this is a monster of a GPU that could disrupt the stack.
Posted on Reply
#19
TheoneandonlyMrK
dgianstefani: Maybe by the time AMD has figured out drivers that can be installed on any of their consumer cards (see: the several-month gap between the RDNA3 release and an RDNA2 driver update, or, more recently, the exclusive driver for the RX 7600 only), they can start to figure out the software and partner support they need to succeed in datacentre GPUs.

By then NVIDIA will have built its 1:1 virtual/real-world model in the Omniverse, which every major manufacturer has already signed onto, just as with CUDA dominance for the past decade.

This image is from 15 months ago (3/22).

www.nvidia.com/en-us/omniverse/ecosystem/ shows the current ecosystem.

Where is AMD?


I hope so; a monopoly isn't a great situation, but the product needs to deliver. Zen did, but that was a hardware success, not a software success. The software (AGESA) for Zen has been buggy in each iteration for years now, which they hope to fix eventually with the open-source openSIL.
My word.

AMD drivers again? I have no issues, and I don't see / have not seen the masses abnormally arrayed against AMD drivers, just the odd hyperbolic statement.
Posted on Reply
#20
Tek-Check
natr0n: Pretty amazing. They remove the air and basically create a vacuum. The cold ocean water is like the ambient temp for the container.
Yes, but you can't cheat physics. Imagine hundreds of those warming up the local sea and disrupting flora and fauna.
Posted on Reply
#21
R0H1T
Yes & it's not like whatever they're putting them in wouldn't cause at least some sort of (chemical) reaction at the depths used for them. We're just dumping more of our problems in the oceans if it's not plastic it's excess heat :shadedshu:

If you really want to make them almost completely eco friendly just launch them into space!
Posted on Reply
#22
dgianstefani
TPU Proofreader
R0H1TYes & it's not like whatever they're putting them in wouldn't cause at least some sort of (chemical) reaction at the depths used for them. We're just dumping more of our problems in the oceans if it's not plastic it's excess heat :shadedshu:

If you really want to make them almost completely eco friendly just launch them into space!
Cooler chips consume less energy, due to lower voltage leakage.

If they're going to be run, they may as well be run more efficiently.
Posted on Reply
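As a rough illustration of why die temperature matters for power draw, a common rule of thumb is that static (leakage) power roughly doubles for every ~10 °C rise in silicon temperature. The sketch below only plays with that rule of thumb; the 60 W at 60 °C baseline and the 10 °C doubling interval are assumptions for illustration, not measured figures for any real accelerator.

def leakage_power(base_leakage_w, base_temp_c, temp_c, doubling_interval_c=10.0):
    # Scale a baseline leakage figure by the rule-of-thumb doubling per temperature interval.
    return base_leakage_w * 2 ** ((temp_c - base_temp_c) / doubling_interval_c)

# Hypothetical accelerator leaking 60 W of its budget at a 60 °C die temperature:
for t in (40, 60, 80, 95):
    print(f"{t:>3} °C -> ~{leakage_power(60, 60, t):.0f} W static power")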
#23
R0H1T
But they are disrupting the ecology of that place, even if a little bit. I'd rather they use the power from solar panels on a satellite & radiate whatever heat they're generating mostly outside our atmosphere. While this sounded like a fun experiment, I'm not sure how practical it is longer term, especially with oceans getting warmer every day; we're arguably accelerating this process with something like it. The data they're crunching/processing would need to travel longer distances, for instance, so is it really more (energy) efficient overall?
Posted on Reply
#24
dgianstefani
TPU Proofreader
R0H1T: But they are disrupting the ecology of that place, even if a little bit. I'd rather they use the power from solar panels on a satellite & radiate whatever heat they're generating mostly outside our atmosphere. While this sounded like a fun experiment, I'm not sure how practical it is longer term, especially with oceans getting warmer every day; we're arguably accelerating this process with something like it. The data they're crunching/processing would need to travel longer distances, for instance, so is it really more (energy) efficient overall?
Heat from server farms won't even register compared to the effects of atmospheric changes and pollution from rivers, fuel-oil container ships, manufacturing waste, farming run-off, reduced reflective white from shrinking ice surface area, etc.; the list goes on. I doubt you could even calculate the difference submerged computers would make.

Average ocean temperature is 0-20 °C; even if that doubled (projections are a couple of °C of increase over several centuries), it would still be an effective cooling medium.

I wouldn't be surprised to see local ecology find some way to benefit from the heat source, as with coral/bacteria ecosystems near underwater volcanoes or vents, etc.
Posted on Reply
#25
R-T-B
R0H1T: If you really want to make them almost completely eco-friendly, just launch them into space!
Space in low Earth orbit is way, way hotter than under the sea, at least in blackbody temperature. The vacuum acts more like an insulator there anyway.
Posted on Reply