Thursday, June 15th 2023
AMD Confirms that Instinct MI300X GPU Can Consume 750 W
AMD recently revealed its Instinct MI300X GPU at its Data Center and AI Technology Premiere event on Tuesday (June 13). The keynote presentation did not provide any details about the new accelerator model's power consumption, but that did not stop one tipster - Hoang Anh Phu - from obtaining this information from Team Red's post-event footnotes. A comparative observation was made: "MI300X (192 GB HBM3, OAM Module) TBP is 750 W, compared to last gen, MI250X TBP is only 500-560 W." A leaked Giga Computing roadmap from last month anticipated server-grade GPUs hitting the 700 W mark.
NVIDIA's Hopper H100 held the crown - with its demand for a maximum of 700 W - as the most power-hungry data center enterprise GPU until now. The MI300X's OCP Accelerator Module-based design now surpasses Team Green's flagship with a slightly higher rating. AMD's new "leadership generative AI accelerator" sports 304 CDNA 3 compute units, a clear upgrade over the MI250X's 220 (CDNA 2) CUs. Engineers have also introduced new 24 GB HBM3 stacks, so the MI300X can be specced with up to 192 GB of memory, while the MI250X is limited to 128 GB with its slower HBM2E stacks. We hope to see sample units producing benchmark results very soon, with the MI300X pitted against the H100.
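As a quick back-of-envelope check of the figures above (note the eight-stack count is only inferred from the quoted capacities, not an AMD-confirmed package layout), here is a short Python sketch:

```python
# Back-of-envelope check of the quoted specs; the eight-stack figure is
# inferred from 192 GB / 24 GB per stack, not confirmed by AMD.
stack_gb = 24                      # new HBM3 stack capacity
mi300x_mem_gb = 192                # quoted maximum
print(f"Implied stacks: {mi300x_mem_gb // stack_gb} x {stack_gb} GB = {mi300x_mem_gb} GB")

mi300x_tbp, mi250x_tbp = 750, 560  # W; MI250X at its upper configuration
print(f"TBP increase vs. MI250X: {(mi300x_tbp - mi250x_tbp) / mi250x_tbp:.0%}")
```

This works out to roughly a one-third jump in board power over the MI250X's highest-rated configuration.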
Sources:
VideoCardz, AnhPhuH Tweet
43 Comments on AMD Confirms that Instinct MI300X GPU Can Consume 750 W
The balance must come from stable long term investments like nuclear.
They're getting that power dense (server farms) that cities have businesses waiting years for the capacity to be available, if at all, and due to locality the heat could be put to work IMHO.
Like car engines, the most efficient way to run even servers is at their optimal load, not idle - i.e., in reality near max 24/7, or off.
It's like people heard of global warming and set about showing them bitches who thinks they know how to global warm.
That's right, throw waste nuclear engines in the deep ocean, throw some servers near the shore, and we might get St Annes or Blackpool up to Med temperatures. :p I jest, mostly :D.
After reading this news, I (figuratively) just-now managed to pick my jaw up off the floor before typing this. :laugh: 'muh drivers!' hyperbole is especially complex (and amusing).
I've had issues in the past w/ both brands' drivers but, (almost-always) a clean driver install fixed things. When it didn't, a rollback worked.
(Note: Navi24, Vega, and Kepler-Maxwell are my 'newest' GPUs)
Now, with Intel 'in the mix' and well-behind on Driver Development, it's kinda hilarious to see Green v. Red still going "yer drivers suck" at each other. Right: 'noticing things'. Wrong direction, IMHO.
Think: Heat Pumps. In thermal wattage, there's recoverable waste heat available to attach to other industries/processes (or recover power for back-ups).
-with less environmental-disruption, too. But, that used to be the corporate-equivalent of telling kids there's veggies in their dinner, don't you know? You're supposed to be cancelled. /s :p
I disagree that 'currently available' Nuclear power is a panacea, or even a decent band-aid.
Thankfully, there are advanced, safe, and low/no-waste nuclear power solutions (both on paper and 'shelved'). There's also good progress on, and proofs of concept for, various potentially viable non-fission nuclear power generation technologies.
The issue(s) come(s) back to economic forces disinterested in employing/deploying and investing in those technologies.
Which, I do not believe TPU is an appropriate place to discuss (excepting strict adherence to the technology side). Big Brain idea: use Mars' poles.
Terraform Mars while having the most bitchin' "VERRRRY remote" rendering farms! Quick, someone tell Elon! :roll:
Which, (some) nations are putting money and effort into, apparently:
www.arabnews.com/node/1952741
www.thenationalnews.com/uae/science/uae-looks-to-regulate-asteroid-mining-as-it-aims-to-lure-private-space-sector-1.943028
ssa.gov.sa/en/home/
www.loc.gov/item/global-legal-monitor/2021-09-15/japan-space-resources-act-enacted/
Okay, sorry for the slide...
SPACE=COLD COLD=GOOD FOR COMPUTE
there, point made; and done.
on topic:
Was there mention of submersion or 'block and plumbing' liquid cooling? I am curious how datacenters are realistically going to react.
Does the increase in perf/watt warrant a 'shrink' in used rackspace? Or will cooling be expanded? I doubt this is an AMD-specific 'trend'.
Semianalysis has a great write up of MI300 and how it compares to H100 and why it is critical for AMD that it succeeds.
MI300 Analysis
ROCm is coming to Windows eventually... Also, GitHub refuses to acknowledge the hole I found... They authenticate documentation after loading... so if you slow-roll the page load with traffic shaping in your router...
The support tables show RDNA2 and newer supported on Windows for ROCm stuff, and Linux only for Instinct cards, which is frankly expected (a quick runtime check is sketched below).
This is for ROCm 5.6; the currently released version is 5.5.1.
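Since support differs across Windows/Linux and consumer vs. Instinct parts, a minimal sketch (assuming a ROCm build of PyTorch is installed; this is not from the support tables themselves) to confirm what the runtime actually sees:

```python
# Minimal check of what a ROCm build of PyTorch reports; assumes torch was
# installed from a ROCm wheel (on CUDA builds, torch.version.hip is None).
import torch

print("HIP/ROCm runtime:", torch.version.hip)       # e.g. "5.6.x" or None
print("GPU visible:", torch.cuda.is_available())    # ROCm reuses the cuda API
if torch.cuda.is_available():
    print("Device 0:", torch.cuda.get_device_name(0))
```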
The current OAM power limit is 800 W; they might use more than 750 W, or they might use less. Technically the MI250X can use 560 W, but in the Frontier supercomputer they run them at 500 W for peak scaling performance.
Just like AMD's EPYC. These pack up to 192 threads (96 cores) into one chip while consuming up to 350 W. This might sound like a lot, but you can effectively replace half a rack's worth of server hardware with that and run things far more efficiently.
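As a rough illustration of that consolidation claim (the legacy-node core count and power draw below are hypothetical assumptions, not figures from the comment or any vendor spec):

```python
# Hypothetical consolidation estimate; the legacy-node figures are assumed
# for illustration only.
new_cores, new_power_w = 96, 350            # one 96-core EPYC, as quoted above
old_cores_per_node, old_node_w = 16, 300    # assumed older dual-socket server

nodes_replaced = new_cores / old_cores_per_node
legacy_draw_w = nodes_replaced * old_node_w
print(f"One chip covers ~{nodes_replaced:.0f} older nodes "
      f"(~{legacy_draw_w:.0f} W of legacy draw vs. {new_power_w} W).")
```

Under those assumed numbers, one modern socket stands in for about six older nodes at a fraction of the combined power draw.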
I would be a sad panda, but every month I see those checks come rolling on in (and of course let's not forget the quarterlies as well), and I have a wide grin on my face.
And I smile, just like J.K. Rowling. :peace:
But as many have commented, I fully believe that there are ways of keeping tech cool without being so invasive to the environment.
We need hot water everywhere; many servers around the world are joined to heating buildings or water.
You can either waste energy just heating water to over 60 °C to kill the dreaded Legionella, or crunch data and kill it either way (rough numbers below).
Heat is a resource humans need either way.
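For a sense of scale, a quick estimate (the tank volume and inlet temperature are illustrative assumptions) of how much of a 750 W accelerator's waste heat it takes to bring a tank of water up to that 60 °C mark:

```python
# Rough energy estimate for heating a water tank to 60 °C; tank volume and
# inlet temperature are illustrative assumptions, not figures from the thread.
SPECIFIC_HEAT = 4186            # J/(kg*K), water
tank_liters = 1000              # ~1 m^3 tank; 1 L of water is ~1 kg
delta_t = 60 - 10               # heat a 10 °C inlet up to 60 °C

energy_kwh = tank_liters * SPECIFIC_HEAT * delta_t / 3.6e6
hours_at_750_w = energy_kwh / 0.75   # one 750 W card's heat output, no losses
print(f"~{energy_kwh:.0f} kWh needed, ~{hours_at_750_w:.0f} h of one card's waste heat")
```

So a single card's waste heat covers a large tank every few days; a rack of them, continuously recovered, is genuinely useful district-heating-scale energy.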
Hey, I see the topic has become very ecological. Apparently there was no problem burning gigawatts to "mine" crypto when we were the ones making money. I'm sure it's maybe not exactly the discussants here, but there are colleagues on this forum who had whole "farms" of video cards, or at least a few rigs. And now they are outraged by a device that can be used to do useful work for humanity, not just for the individual. /end off
The summer Antarctic sea ice is at its lowest level ever recorded, and the last 8 years have been record lows. Winter ice area fluctuates and did have a record year in 2014, but summer ice coverage is on a downward trend, which exacerbates warming and reinforces sea ice melt. A one-off small ice coverage summer is not cause for concern, but continued downward trends are very worrying. Apart from that, you conveniently ignore the massive thinning of the huge ice shelves such as in the Ross Sea, and the very real fact that water temperatures under the Antarctic have increased several degrees in the last 50 years and are undercutting and melting the ice shelves from below. This is due to excess energy being absorbed by the oceans.
I mean if you are going to trot out the tired and debunked climate denial BS at least try looking at the data yourself and coming up with a fresh argument.