A new startup emerged out of stealth mode today to power the next generation of generative AI. Etched makes an application-specific integrated circuit (ASIC) designed to process transformers. The transformer is a deep learning model architecture developed by Google, and it is now the powerhouse behind models like OpenAI's GPT-4o in ChatGPT, Anthropic's Claude, Google Gemini, and Meta's Llama family. Etched set out to build an ASIC that runs only transformer models, and the result is a chip called Sohu. The company claims Sohu outperforms NVIDIA's latest and greatest by an entire order of magnitude. Where an eight-GPU NVIDIA H100 server pushes Llama-3 70B at 25,000 tokens per second, and the latest eight-GPU B200 "Blackwell" server pushes 43,000 tokens/s, an eight-chip Sohu server reportedly outputs 500,000 tokens per second.
Why is this important? Not only does the ASIC, by these figures, outperform Hopper by 20x and Blackwell by more than 10x, but it also serves so many tokens per second that it could enable an entirely new class of AI applications that require real-time output. Etched says the Sohu architecture is efficient enough to use 90% of its FLOPS, while traditional GPUs typically reach only 30-40% FLOP utilization. That gap means wasted compute and wasted power, which Etched hopes to eliminate with an accelerator dedicated to running transformers (the "T" in GPT) at massive scale. Given that developing a frontier model costs more than one billion US dollars, and hardware spending is measured in tens of billions of US dollars, an accelerator dedicated to a single workload could help AI advance faster. AI researchers often say that "scale is all you need" (a nod to the legendary "Attention Is All You Need" paper that introduced the transformer), and Etched wants to build on that.
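For readers who want to check the headline multipliers, the short sketch below recomputes them from the throughput and utilization figures quoted in this article. It assumes those vendor-stated numbers at face value; the dictionary keys and the helper names `speedup` and `utilization_gain` are illustrative only and not part of any Etched or NVIDIA tool.

```python
# Claimed Llama-3 70B throughput for eight-accelerator servers, as quoted in the article.
# These are vendor/press claims, not independent measurements.
THROUGHPUT_TOKENS_PER_S = {
    "8x H100": 25_000,
    "8x B200": 43_000,
    "8x Sohu": 500_000,
}


def speedup(new: str, baseline: str) -> float:
    """Ratio of claimed tokens/s for `new` over `baseline` (hypothetical helper)."""
    return THROUGHPUT_TOKENS_PER_S[new] / THROUGHPUT_TOKENS_PER_S[baseline]


def utilization_gain(chip_util: float, gpu_util: float) -> float:
    """How much more of its peak FLOPS one chip uses than another (hypothetical helper)."""
    return chip_util / gpu_util


if __name__ == "__main__":
    for base in ("8x H100", "8x B200"):
        print(f"Sohu vs {base}: {speedup('8x Sohu', base):.1f}x")
    # Etched's 90% utilization claim against the 30-40% typical GPU range.
    for gpu_util in (0.30, 0.40):
        print(f"90% vs {gpu_util:.0%} utilization: {utilization_gain(0.90, gpu_util):.2f}x")
```

Running it prints 20.0x against H100 and 11.6x against B200, so the Blackwell comparison lands somewhat above 10x on these numbers. The utilization claim alone accounts for a 2.25x to 3x gain in usable compute; the rest of the claimed advantage would have to come from the chip's raw throughput.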
However, there are some doubts going forward. While transformers are widely believed to be the "future" of AI development, an ASIC only solves the problem until the underlying operations change. This is reminiscent of the crypto mining craze, which brought a few cycles of crypto ASIC miners that are now worthless pieces of sand. Ethereum mining ASICs, for example, were built to mine ETH under proof of work, and now that Ethereum has transitioned to proof of stake, they are worthless.
Nonetheless, Etched wants the success formula to be simple: run transformer-based models on the Sohu ASIC with an open-source software ecosystem and scale it to massive sizes. While details are scarce, we know that the ASIC carries 144 GB of HBM3E memory and is manufactured on TSMC's 4 nm process. Etched says the platform can scale to AI models with 100 trillion parameters, more than 55x larger than the 1.8 trillion parameters reported for GPT-4.
View at TechPowerUp Main Site | Source