Today at Hot Chips 2024, FuriosaAI is pulling back the curtain on RNGD (pronounced "Renegade"), our new AI accelerator designed for high-performance, highly efficient large language model (LLM) and multimodal model inference in data centers. As part of his Hot Chips presentation, Furiosa co-founder and CEO June Paik is sharing technical details and providing the first hands-on look at the fully functioning RNGD card.
With a TDP of 150 watts, a novel chip architecture, and advanced memory technology like HBM3, RNGD is optimized for inference with demanding LLMs and multimodal models. It's built to deliver high performance, power efficiency, and programmability all in a single product - a trifecta that the industry has struggled to achieve in GPUs and other AI chips.
A key milestone for RNGD
As industry experts will know, the process of squeezing every drop of performance from a chip takes many steps. Furiosa achieved a full bring-up of RNGD just weeks after obtaining the first silicon samples - an exceptionally rapid timeline in the chip industry. TSMC delivered the first RNGD chips in May, we booted the hardware less than a week later, and we were running industry-standard Llama 3.1 models in early June.
We started delivering the first RNGD silicon to early access customers in July and showed our first private demo last week. There's much more work to do before RNGD is running in data centers around the world, but we've reached an exciting milestone and we're pleased to be able to share these updates on our progress.
With more updates to come
Our priority now is refining our software stack as we ramp up RNGD production. This roadmap builds on our successful track record with Furiosa's first-generation chip, introduced in 2021.
With our first-gen product, which targeted computer vision applications in data centers and edge server deployments, Furiosa submitted our first MLPerf benchmark results three weeks after receiving first silicon. We then used compiler enhancements to achieve a 113% performance increase in the next MLPerf submission six months later.
This is a typical path for new silicon. For example, six months after launching its powerful H100 chip and submitting it to MLPerf, NVIDIA announced a 2.4x performance gain achieved entirely through software improvements.
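For readers comparing these figures, note that the two examples use different conventions: a percentage increase (our 113% MLPerf improvement) versus a multiplier (NVIDIA's 2.4x). The short Python sketch below is just generic arithmetic to convert between the two framings; it is not tied to any vendor tooling.

```python
# Convert between "percent increase" and "speedup factor" for throughput numbers.
# Generic arithmetic helpers for comparing benchmark claims, nothing vendor-specific.

def percent_increase_to_speedup(percent: float) -> float:
    """A 113% increase means the new throughput is 2.13x the old one."""
    return 1.0 + percent / 100.0

def speedup_to_percent_increase(speedup: float) -> float:
    """A 2.4x speedup corresponds to a 140% increase over the baseline."""
    return (speedup - 1.0) * 100.0

print(percent_increase_to_speedup(113))   # -> 2.13
print(speedup_to_percent_increase(2.4))   # -> 140.0 (percent)
```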
The process will be similar with RNGD. Right now, a single RNGD card generates about 12 queries per second running the GPT-J 6B model, but we expect that number to increase as we refine our software stack over the coming weeks and months. We're also sharing RNGD target performance numbers for several LLMs.
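For context on what a "queries per second" figure means in practice, the sketch below shows one common way an offline throughput number is derived: process a fixed batch of prompts and divide the query count by the elapsed wall-clock time. The `run_query` function here is a hypothetical stand-in, not Furiosa's SDK or any real serving API.

```python
import time

# Minimal sketch of how an offline "queries per second" (QPS) figure is derived:
# run a fixed batch of queries, then divide the count by elapsed wall-clock time.
# run_query is a stand-in, NOT Furiosa's SDK or any real inference client.

def run_query(prompt: str) -> str:
    time.sleep(0.083)          # stand-in for real inference (~83 ms/query ≈ 12 QPS)
    return "generated text"

def measure_qps(prompts: list[str]) -> float:
    start = time.perf_counter()
    for p in prompts:
        run_query(p)           # one query = one prompt processed end to end
    elapsed = time.perf_counter() - start
    return len(prompts) / elapsed

if __name__ == "__main__":
    batch = ["What is an NPU?"] * 50
    print(f"{measure_qps(batch):.1f} queries/second")   # ~12 with the stand-in delay above
```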
Furiosa has deliberately kept a low profile until now, because we know the industry doesn't need more hype and bold promises about things that don't yet exist. (Also, Furiosa is 95% engineers, so marketing hasn't exactly been top of mind.)
Stay informed on the latest RNGD news
But Hot Chips is an exciting turning point for Furiosa and RNGD. If you come by our Hot Chips booth this week, you'll see we've brought a large engineering team to talk with anyone who is interested in our work. We're eager to hear what the AI community thinks of RNGD, what questions you have, and what you want to hear from us as we work to make the chip widely available in early 2025. We'll also showcase the first live demo of RNGD.
Stay tuned for more benchmark results, availability details, and other updates in the coming weeks and months.
View at TechPowerUp Main Site | Source