Tuesday, June 25th 2024

AI Startup Etched Unveils Transformer ASIC Claiming 20x Speed-up Over NVIDIA H100

A new startup emerged from stealth mode today with the aim of powering the next generation of generative AI. Etched makes an application-specific integrated circuit (ASIC) built to process "Transformers." The transformer is a deep learning architecture developed by Google, and it is now the powerhouse behind models like OpenAI's GPT-4o in ChatGPT, Anthropic's Claude, Google's Gemini, and Meta's Llama family. Etched set out to build an ASIC that runs transformer models and nothing else, and the result is a chip called Sohu. The company claims that Sohu outperforms NVIDIA's latest and greatest by an entire order of magnitude. Where a server with eight NVIDIA H100 GPUs pushes Llama-3 70B at 25,000 tokens per second, and a server with eight of the latest B200 "Blackwell" GPUs pushes 43,000 tokens per second, an eight-chip Sohu server is claimed to output 500,000 tokens per second.
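
The speed-up multiples follow directly from the throughput figures quoted above. As a minimal sketch, the ratios can be checked as below. The dictionary keys and the `speedup` helper are illustrative names, and the numbers are vendor claims, not independent measurements.

```python
# Throughput claims quoted in the article (tokens per second, Llama-3 70B,
# eight-accelerator server). These are vendor claims, not measured results.
CLAIMED_TOKENS_PER_S = {
    "8x H100": 25_000,
    "8x B200": 43_000,
    "8x Sohu": 500_000,
}


def speedup(candidate: str, baseline: str) -> float:
    """Return how many times faster `candidate` is than `baseline`."""
    return CLAIMED_TOKENS_PER_S[candidate] / CLAIMED_TOKENS_PER_S[baseline]


if __name__ == "__main__":
    for baseline in ("8x H100", "8x B200"):
        ratio = speedup("8x Sohu", baseline)
        print(f"Sohu vs {baseline}: {ratio:.1f}x")
```

On these figures the Hopper comparison works out to exactly 20x, while the Blackwell comparison is closer to 11.6x, so "10x" is a rounded-down headline number.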

Why is this important? Not only is the ASIC claimed to outperform Hopper by 20x and Blackwell by roughly 10x, but it also serves so many tokens per second that it enables an entirely new class of AI applications requiring real-time output. Etched says the Sohu architecture is so efficient that 90% of its FLOPS can be used, while traditional GPUs manage only a 30-40% FLOP utilization rate. That gap translates into inefficiency and wasted power, which Etched hopes to solve by building an accelerator dedicated to powering transformers (the "T" in GPT) at massive scales. Given that frontier model development costs more than one billion US dollars, and hardware costs are measured in tens of billions of US dollars, having an accelerator dedicated to a specific application can help advance AI faster. AI researchers often say that "scale is all you need" (echoing the legendary "Attention Is All You Need" paper), and Etched wants to build on that.

However, there are some doubts going forward. While it is generally believed that transformers are the "future" of AI development, an ASIC solves the problem only until the underlying operations change. This is reminiscent of the crypto mining craze, which brought a few cycles of crypto ASIC miners that are now worthless pieces of sand. Ethereum is the clearest example: ASICs built to mine ETH under proof of work became worthless once the network transitioned to proof of stake.

Nonetheless, Etched wants the success formula to be simple: run transformer-based models on the Sohu ASIC with an open-source software ecosystem and scale it to massive sizes. While details are scarce, we know that the ASIC is paired with 144 GB of HBM3E memory and that the chip is manufactured on TSMC's 4 nm process. Etched says this enables AI models with 100 trillion parameters, more than 55x bigger than GPT-4's reported 1.8 trillion parameter design.
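
To put the 100-trillion-parameter figure in context, here is a rough back-of-the-envelope sketch. The parameter counts and the 144 GB capacity come from the article. The bytes-per-parameter value and the decimal definition of a gigabyte are assumptions made here for illustration, not anything Etched has stated, and the estimate covers weights only (no activations or KV cache).

```python
# Figures quoted in the article.
TARGET_PARAMS = 100 * 10**12   # 100 trillion parameters
GPT4_PARAMS = 18 * 10**11      # 1.8 trillion parameters (as quoted)

# Assumption: 144 GB is taken as 144 * 10^9 bytes.
HBM_PER_CHIP_BYTES = 144 * 10**9

# Assumption, not from the article: 1 byte per parameter (8-bit weights).
ASSUMED_BYTES_PER_PARAM = 1


def size_ratio(params: int, reference: int) -> float:
    """Return how many times larger `params` is than `reference`."""
    return params / reference


def min_chips_for_weights(
    params: int,
    bytes_per_param: int = ASSUMED_BYTES_PER_PARAM,
    hbm_bytes: int = HBM_PER_CHIP_BYTES,
) -> int:
    """Lower bound on the chips needed just to hold the weights in HBM."""
    total_bytes = params * bytes_per_param
    return -(-total_bytes // hbm_bytes)  # ceiling division on integers


if __name__ == "__main__":
    print(f"Size ratio: {size_ratio(TARGET_PARAMS, GPT4_PARAMS):.1f}x")
    print(f"Chips for weights alone: {min_chips_for_weights(TARGET_PARAMS)}")
```

Under these assumptions the ratio is about 55.6x, and the weights alone would need at least 695 chips' worth of HBM, which suggests the 100-trillion figure describes a large multi-chip system and not a single accelerator.
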
Source: Etched

37 Comments on AI Startup Etched Unveils Transformer ASIC Claiming 20x Speed-up Over NVIDIA H100

#26
Vya Domus
dragontamer5788 said: but as usual the question is if AMD's software can keep up.
Is there any actual example of an open source piece of software that runs much faster on equivalent Nvidia hardware simply because it's using CUDA? HIP is basically a straight-up copy of CUDA. It doesn't have some features like dynamic parallelism, but there is no obvious reason that I can see for software just being intrinsically worse all of the time.
#28
dragontamer5788
Vya Domus said: Is there any actual example of an open source piece of software that runs much faster on equivalent Nvidia hardware simply because it's using CUDA? HIP is basically a straight-up copy of CUDA. It doesn't have some features like dynamic parallelism, but there is no obvious reason that I can see for software just being intrinsically worse all of the time.
It's not about "running faster", it's about "running at all".

AMD only officially supports HIP / ROCm on their most expensive MI GPUs, which are in the $5,000+ tier. There used to be cards like the Rx 580 that worked for a while, but then the latest HIP / ROCm drops support and bugs start to creep in. So now what? You throw away the Rx 580 and upgrade to Vega (or whatever HIP supports), only to find out that the Vega64 loses support and it's time to upgrade to the Rx 7800. Etc. etc.

NVidia's software support simply lasts long enough for your projects to actually work. Case in point: try to run Blender's HIP on an Rx 580 or Vega.

Then, try CUDA on an NVidia 1080 Ti. Which still works.

----------

Even MI level chips, like the MI60, lose support faster than NVidia chips.
#29
Hakker
Specialized ASICs will always take over. The easiest way to look at it is Bitcoin. First it was all CPU, then GPUs came into play for great gains, until ASICs came along and absolutely shattered GPU mining. The same goes for the current A.I. rage. Full ASIC solutions will take it over. At the end of the day Nvidia's products are still general purpose solutions.
#30
Firedrops
AI hardware is where battery tech was 5 years ago - every 2 weeks someone will claim a 50 bajillion percent increase coming Soon™.
#31
Vya Domus
dragontamer5788 said: It's not about "running faster", it's about "running at all".
But this isn't a matter of software being worse, it's a matter of software simply not being developed because Nvidia has a monopoly in these industries.
dragontamer5788 said: AMD only officially supports HIP / ROCm on their most expensive MI GPUs, which are in the $5,000+ tier.
We weren't talking about consumer GPUs here.
#32
ScaLibBDP
>>...ASIC outperform Hopper by 20x and Blackwell by 10x...

During the last couple of years I have heard news like this from time to time. It is very impressive; however, experts always look at benchmarks, and in most cases companies do Not release benchmarks, since they would show the Real Performance (!) of the hardware, which could be very different from internal in-house evaluations!

Next, if you're interested in seeing more Hardware News like this, take a look at The Linley Group YouTube channel:

www.youtube.com/@LinleygroupVideos/videos

The channel has 109 videos ( I watched All of them! ), it is in a frozen state ( last video was uploaded 3 years ago ), and almost every second company was making statements that "...We Made It Better Than NVIDIA!..".

I didn't follow these companies since it would be a waste of time for me but I think that most of them do Not do well. At the same time NVIDIA made $22.6 billion in the last quarter ( ended on April 28th of 2024 ), and NVIDIA makes more and more money.
#33
dragontamer5788
Vya Domus said: But this isn't a matter of software being worse, it's a matter of software simply not being developed because Nvidia has a monopoly in these industries.
Vya Domus said: We weren't talking about consumer GPUs here.
NVidia is ahead in things like Blender rendering because NVidia has the money to pay for software devs.

As anyone knows in any computing project: it's the software that's expensive. The hardware is whatever today. Even at NVidia prices, the software is where the bulk of the costs are going.

AMD has always made fine hardware. It's just the software support that's lacking. And yes, making sure that MI60 works for more than 5 years is important.

In this thread, people are talking about how T100 or other older NVidia cards can be used instead of H100 or other more recent chips.

Do you see anyone, anywhere, ever saying the same about MI25? MI60? MI100?

I get that AMD doesn't have the resources to keep software support on all of their GPUs. But... This crap is important to the people spending $100,000,000+ on software development on GPU platforms. You can't just cut support like AMD does every few years and expect a community to grow.

Eventually, AMD will make enough money to make a stable software platform for its GPUs. ROCm is better but people are still nervous about getting burned by previous losses of software support.
#34
Vya Domus
dragontamer5788 said: And yes, making sure that MI60 works for more than 5 years is important.
The first GPUs with CUDA support were dropped pretty soon as well, and no one is using MI60s at this point in time anyway, so no, I really don't think it's important at all.
dragontamer5788 said: NVidia is ahead in things like Blender rendering because NVidia has the money to pay for software devs.
I don't know why Blender is faster on Nvidia GPUs and neither do you or anyone else. The CUDA backend isn't open source as far as I know, so we don't know why it's faster; it could be hardware related. This is the point that I am making: you keep saying the software is worse, but I can't see any conclusive evidence that this really is the case. Software isn't being developed on the AMD side of things for obvious market share reasons, but this isn't proof the software is inferior.
#35
dragontamer5788
Vya Domus said: The first GPUs with CUDA support were dropped pretty soon as well, and no one is using MI60s at this point in time anyway, so no, I really don't think it's important at all.
Uhhhh... MI60 was released in 2018. That's the same time as the P100 and the NVidia 1080, both of which have support for CUDA today (and even Blender rendering).

It says a lot about AMD's software support that AMD cannot support a professional level $5000+ card like MI60 as long as Nvidia can support a consumer card like the GTX 1080.
Vya Domus said: I don't know why Blender is faster on Nvidia GPUs and neither do you or anyone else. The CUDA backend isn't open source as far as I know, so we don't know why it's faster; it could be hardware related. This is the point that I am making: you keep saying the software is worse, but I can't see any conclusive evidence that this really is the case. Software isn't being developed on the AMD side of things for obvious market share reasons, but this isn't proof the software is inferior.
blender/intern/cycles/kernel/device/gpu/kernel.h at main · blender/blender · GitHub

This is the CUDA / HIP / OneAPI source code to the Blender Cycles renderer kernel. You can see that it's largely the same code between AMD, NVidia and Intel.

Note: AMD's original contributions to Blender were the OpenCL kernels, which, if you haven't noticed, had been completely thrown away by the Blender team by Blender 4.0. The NVidia CUDA code, in contrast, has remained the same, and that CUDA code serves as the basis for HIP and OneAPI today.

I dare you to pretend that I don't know much about Blender's kernels or GPU code. I'm not a professional or anything, but I did spend some time studying this code to learn my hobby GPU abilities. I've been reading this code and following its development for years at this point (albeit at a hobby level, but it's seriously one of the best demonstrations of how GPU code evolves over time in a real project). The good, the bad, the lessons learned... The Blender team has experienced it, and they've experienced it in public.

AMD's code and optimizations were thrown out with OpenCL. That's the problem, the code reached a dead end and couldn't be built on top of anymore. Blender's CUDA in contrast has over a decade of growth and stability.
#36
Vya Domus
dragontamer5788 said: MI60 was released in 2018. That's the same time as the P100 and the NVidia 1080, both of which have support for CUDA today (and even Blender rendering).
That's not the point. It's unrealistic to expect that a piece of software would continue to support hardware released so early in its lifecycle; CUDA had already been a thing for like 8 years when the P100 was released. Plus I am pretty sure you can still compile code that would run on an MI60 even if it's not officially supported anymore.
dragontamer5788 said: This is the CUDA / HIP / OneAPI source code to the Blender Cycles renderer kernel. You can see that it's largely the same code between AMD, NVidia and Intel.
If this is the case, it doesn't support your argument. It means the software side is fine, seeing as the code is the same everywhere; it's the hardware that makes the difference after all.
#37
KLMR
A render and a chart, that's what you need to "emerge".