Monday, February 10th 2020

AMD Radeon Instinct MI100 "Arcturus" Hits the Radar, We Have its BIOS
AMD's upcoming large post-Navi graphics chip, codenamed "Arcturus," will debut as the "Radeon Instinct MI100," an AI-ML accelerator under the Radeon Instinct brand, which AMD markets as "Server Accelerators." TechPowerUp has accessed its BIOS, which is now up on our VGA BIOS database. The card carries the device ID "0x1002 0x738C," confirming both AMD and "Arcturus." The BIOS also confirms a massive 32 GB of HBM2 memory clocked at 1000 MHz real (possibly 1 TB/s of bandwidth, if the memory bus is 4096 bits wide).
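As a quick sanity check on that 1 TB/s figure, here is the bandwidth arithmetic as a short Python sketch; the 4096-bit bus width (four HBM2 stacks) is our assumption, not something the BIOS states.

    # Rough HBM2 bandwidth estimate from the BIOS-reported memory clock.
    # Assumption: 4096-bit bus (4 stacks x 1024 bits) with double-data-rate signalling.
    memory_clock_mhz = 1000                              # "1000m" in the BIOS ID string
    bus_width_bits = 4096                                # assumed, not confirmed by the BIOS

    data_rate_gbps = memory_clock_mhz * 2 / 1000         # DDR -> 2.0 Gbps per pin
    bandwidth_gbs = data_rate_gbps * bus_width_bits / 8  # bits -> bytes

    print(f"{data_rate_gbps:.1f} Gbps per pin, ~{bandwidth_gbs:.0f} GB/s")  # ~1024 GB/s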
Both Samsung (KHA884901X) and Hynix (H5VR64ESA8H) memory are supported, which is an important flexibility for AMD's supply chain. From the ID string "MI100 D34303 A1 XL 200W 32GB 1000m" we can derive that the TDP limit is set to a surprisingly low 200 W, especially considering this is a 128 CU / 8,192-shader design. For comparison, Vega 64 and the Radeon Instinct MI60 have roughly 300 W power budgets with 4,096 shaders, and the RX 5700 XT has 225 W with 2,560 shaders. So either AMD achieved some monumental efficiency improvements with Arcturus, or the design is intentionally running constrained so that AMD doesn't reveal its hand to the partners doing early testing of the card.
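For a rough sense of scale, here is a crude shaders-per-rated-watt comparison using just the numbers quoted above; board power is not a measured efficiency figure, so treat this as a sketch only.

    # Crude shaders-per-watt ratio from the board power figures quoted above.
    # This is not a benchmark, just a way to visualise how unusual 200 W is.
    cards = {
        "Radeon Instinct MI100 (per BIOS)": (8192, 200),
        "Radeon Instinct MI60 / Vega 64":   (4096, 300),
        "Radeon RX 5700 XT":                (2560, 225),
    }
    for name, (shaders, watts) in cards.items():
        print(f"{name}: {shaders / watts:.1f} shaders per watt")
    # The MI100 comes out at roughly 3x the ratio of MI60/Vega 64, which is why
    # the 200 W figure looks either revolutionary or deliberately constrained.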
-- images removed --
Looking through the BIOS, I also found what look like several clock tables, topping out at 1334 MHz, 1091 MHz, and 1000 MHz. AMD's engineers typically list clocks in the following order: GPU clock, SOC clock, memory clock. This suggests the GPU will tick at up to 1334 MHz, well below what Navi and Vega were able to achieve; perhaps that is done to operate the chip more power-efficiently. The 1000 MHz memory clock matches the "1000m" in the BIOS ID string and falls within the 2.0 - 2.4 Gbps range Samsung specifies for its HBM2 memory chips.
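Reading those tables with AMD's usual ordering (again, the domain assignment is an assumption based on that convention, not something the BIOS labels), the numbers line up like this:

    # Mapping the three clock tables to domains, assuming AMD's usual ordering
    # of GPU clock, SOC clock, memory clock.
    clock_tables_mhz = {"GPU clock": 1334, "SOC clock": 1091, "Memory clock": 1000}
    for domain, mhz in clock_tables_mhz.items():
        print(f"{domain}: {mhz} MHz")
    # 1000 MHz real memory clock -> 2.0 Gbps per pin with DDR signalling,
    # the low end of Samsung's 2.0 - 2.4 Gbps HBM2 spec.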
Arcturus' debut as a Radeon Instinct product follows AMD's pattern of debuting big new GPUs as low-volume, high-margin AI-ML accelerators first, followed by Radeon Pro and finally Radeon client graphics products. Arcturus is not "big Navi"; it appears to be much closer to Vega than to Navi, which makes perfect sense given its target market. AMD's Linux sources mention "It's because Arcturus has not 3D engine," which could hint at what AMD did with this chip: take Vega and remove all 3D raster graphics capability, shaving a few billion transistors off the silicon and freeing up space for more CUs. For gamers, AMD is planning a new line of Navi 20-series chips leveraging 7 nm EUV for launch throughout 2020. Various higher-ups at AMD, including its CEO, have publicly hinted that a big client-segment GPU is in the works, and that the company is very much interested in taking another swing at premium 4K UHD gaming.
Sources:
Arcturus Linux Patches
76 Comments on AMD Radeon Instinct MI100 "Arcturus" Hits the Radar, We Have its BIOS
Also, AMD already has an MI200 in the works. The MI100 looks like it's gonna be pretty powerful, but the MI200 will be even better, and is probably what they're gonna deploy in Frontier.
Out of all of those, FP64 units have become indispensable. It used to be that they were very expensive power- and area-wise, which is why GPUs of the past skimped on them, but any real compute accelerator nowadays needs strong FP64 performance. 64-bit floating point is the de facto precision for simulations and similar workloads; you can do without tensor cores or INT8/INT4/FP16, but not without FP64 in a data center environment. That's why there have been no large Turing-based Teslas: nobody would have wanted them, given their poor FP64 performance.
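To see why FP64 is non-negotiable for simulation work, here is a minimal, generic sketch (not tied to any particular HPC code) of how single-precision accumulation drifts over a long dependent chain while double precision stays essentially exact.

    import numpy as np

    # Toy accumulation: add 0.1 a million times, a stand-in for the long
    # dependent operation chains of a time-stepped simulation.
    n = 1_000_000
    exact = n / 10  # 100000.0

    total32 = np.float32(0.0)
    total64 = np.float64(0.0)
    for _ in range(n):
        total32 += np.float32(0.1)
        total64 += np.float64(0.1)

    print(f"exact  : {exact}")
    print(f"float32: {total32}  (error ~{abs(float(total32) - exact):.0f})")
    print(f"float64: {total64}  (error ~{abs(float(total64) - exact):.1e})")
    # float32 ends up off by several hundred; float64 by about a millionth of a unit.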
I think there is a way for the software to read the specifications and put them into the database. No way, I smell much lower prices; after all, AMD has to regain some mindshare and lost positions. Probably a fake limit of only 429 mm². That would mean bye-bye, enthusiast video cards.
Except the extra battery and motor wouldn't be used for a faster 0-60 and higher top speed, but rather for hauling more stuff, so think Tesla Semi or pickup (not for gaming, but workstation only).
Some things you get after reading this thread:
It's not a gaming card at all. It's a workstation card through and through. But people would like a gaming card with these specs.
People don't understand that, with the raster stuff added back in, this thing would become MASSIVE.
AMD also has brain-dead haters for their products who don't even understand what they're hating.
My personal take is that this is a good development. I've long argued that AMD should have different architectures for different markets instead of a "jack of all trades, master of none." Now that the CPU business is making them money, I hope they spend that money cleverly on R&D for RTG.
A ~700 mm² die, at 1 GHz-plus, and an interposer to hold it and the HBM? I imagine the yield loss is huge, but maybe they have perfected it so it's actually profitable. Now, when do we get to see actual performance numbers?
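On the yield question, a textbook Poisson yield estimate gives a feel for the numbers; the ~700 mm² die size is the guess from the post above and the defect densities are purely illustrative, not TSMC figures.

    import math

    def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
        """Textbook Poisson yield model: Y = exp(-A * D0)."""
        return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

    die_area = 700.0                      # mm^2, guessed above, not confirmed
    for d0 in (0.1, 0.2, 0.5):            # defects per cm^2, illustrative only
        print(f"D0={d0}/cm^2 -> ~{poisson_yield(die_area, d0):.0%} defect-free dice")
    # Even at optimistic defect densities a die this size loses many candidates,
    # which is why big GPUs are often sold with some units disabled as salvage parts.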
420 mm² is impossible with these specs.
Have you got any source that they can manufacture dies larger than 429 mm² on N7+, and any source that this particular chip is on N7+ and not on N7?
Obviously, this card can render. It's built around a normal GPU. It just can't provide a video signal: there are no outputs and no logic dedicated to this task.
It can be used in any scenario that can utilize GPGPU (including AI, obviously).
This is NOT similar to Tesla T4.
In the green camp you have the V100, which is an all-mighty, all-round, dual-slot accelerator. The MI100 (like the MI60 now) will compete in this segment.
Nvidia also makes the Tesla T4, which has half of the V100's Tensor (AI) potential but just 5% of its double-precision performance. The T4 is single-slot, a quarter of the V100's price, and uses 75 W (the V100 is up to 300 W).
Which means that if you need the V100's double-precision (all-round) performance, you buy a V100. You can't go wrong with this card.
But if you don't need it, you buy two Tesla T4s: you get pretty much the same performance in stuff like deep learning (e.g. image recognition) for half the money and half the power.
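Putting the ratios from that post into numbers (V100 normalized to 1.0 for tensor throughput, FP64 throughput, and price; T4 at half the tensor rate, ~5% of the FP64, a quarter of the price, 75 W vs 300 W), the trade-off looks like this. These are the post's ratios, not official spec figures.

    # 1x V100 vs 2x T4, using only the ratios quoted in the posts above
    # (normalized to the V100), not official NVIDIA spec numbers.
    v100 = {"tensor": 1.00, "fp64": 1.00, "price": 1.00, "power_w": 300}
    t4   = {"tensor": 0.50, "fp64": 0.05, "price": 0.25, "power_w": 75}

    two_t4 = {key: 2 * value for key, value in t4.items()}
    for key in ("tensor", "fp64", "price", "power_w"):
        print(f"{key:7s}: 2x T4 = {two_t4[key]:6.2f}   1x V100 = {v100[key]:6.2f}")
    # Same tensor throughput at half the price and power, but only ~10% of the
    # FP64 throughput: fine for deep learning, a non-starter for FP64 simulation.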