Meta Announces New MTIA AI Accelerator with Improved Performance to Ease NVIDIA's Grip

AleksandarK · Apr 11, 2024

Meta has announced the next generation of its Meta Training and Inference Accelerator (MTIA) chip, which is designed for AI training and inference at scale. The newest MTIA chip is a second-generation design of Meta's custom silicon for AI, and it is being built on TSMC's 5 nm technology. Running at 1.35 GHz, the new chip raises TDP to 90 Watts per package, compared to just 25 Watts for the first-generation design. Basic Linear Algebra Subprograms (BLAS) processing is where the chip shines, and it includes matrix multiplication and vector/SIMD processing. For GEMM matrix processing, each chip delivers 708 TeraFLOPS at INT8 (the spec presumably meant FP8) with sparsity and 354 TeraFLOPS without; at FP16/BF16, it delivers 354 TeraFLOPS with sparsity and 177 TeraFLOPS without.
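For a rough sense of efficiency, the per-chip figures above can be turned into a few ratios. This is only a sketch: the names below are illustrative rather than anything from Meta's spec sheet, and TDP is an upper bound on power rather than measured draw, so the per-watt values are peak throughput per TDP watt at best.

```python
# Ratios derived from the per-chip figures quoted in the article.
# All names are illustrative; TDP is used as a stand-in for power draw.

TDP_GEN1_W = 25.0  # first-generation MTIA, per package
TDP_GEN2_W = 90.0  # second-generation MTIA, per package

# (precision, mode) -> peak GEMM throughput per chip, in TFLOPS
GEMM_TFLOPS = {
    ("INT8 (FP8)", "sparse"): 708.0,
    ("INT8 (FP8)", "dense"): 354.0,
    ("FP16/BF16", "sparse"): 354.0,
    ("FP16/BF16", "dense"): 177.0,
}


def tflops_per_watt(tflops: float, tdp_watts: float = TDP_GEN2_W) -> float:
    """Peak throughput divided by TDP (an optimistic efficiency figure)."""
    return tflops / tdp_watts


# How much the power budget grew between generations.
power_ratio = TDP_GEN2_W / TDP_GEN1_W

if __name__ == "__main__":
    print(f"TDP increase: {power_ratio:.1f}x")
    for (precision, mode), tflops in GEMM_TFLOPS.items():
        print(f"{precision:>10} {mode:>6}: {tflops_per_watt(tflops):.2f} TFLOPS/W")
```

The TDP ratio works out to 3.6x, and the best-case GEMM figure to roughly 7.87 TFLOPS per TDP watt.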

Classical vector processing is a bit slower at 11.06 TeraFLOPS at INT8 (FP8), 5.53 TeraFLOPS at FP16/BF16, and 2.76 TeraFLOPS at single-precision FP32. The MTIA chip is specifically designed to run AI training and inference on Meta's PyTorch AI framework, with an open-source Triton backend that produces compiler code for optimal performance. Meta uses this for all its Llama models, and with Llama3 just around the corner, it could be trained on these chips. To package it into a system, Meta puts two of these chips onto a board and pairs them with 128 GB of LPDDR5 memory. The board is connected via PCIe Gen 5 to a system where 12 boards are stacked densely. Six of these systems go into a single rack, for 72 boards and 144 chips and a total of 101.95 PetaFLOPS, assuming linear scaling at INT8 (FP8) precision with sparsity. Of course, perfectly linear scaling is rarely achievable in scale-out systems, which could bring the real figure down to under 100 PetaFLOPS per rack.
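The rack-level number is plain multiplication and can be checked with a short sketch. The constant and function names are illustrative, and `scaling_efficiency` is a hypothetical knob added here to show what sub-linear scaling would do; only the chip counts and the 708 TeraFLOPS figure come from the article.

```python
# Back-of-the-envelope check of the rack figure quoted in the article.
# Names are illustrative; the inputs are the numbers from the article text.

CHIPS_PER_BOARD = 2
BOARDS_PER_SYSTEM = 12
SYSTEMS_PER_RACK = 6
GEMM_TFLOPS_SPARSE = 708.0  # per chip, INT8 (FP8) with sparsity


def chips_per_rack() -> int:
    """Total number of MTIA chips in one rack."""
    return CHIPS_PER_BOARD * BOARDS_PER_SYSTEM * SYSTEMS_PER_RACK


def rack_petaflops(per_chip_tflops: float, scaling_efficiency: float = 1.0) -> float:
    """Aggregate per-chip peak throughput over one rack, in PetaFLOPS.

    scaling_efficiency is a hypothetical factor: 1.0 means perfectly
    linear scaling, lower values model scale-out losses.
    """
    return chips_per_rack() * per_chip_tflops * scaling_efficiency / 1000.0


if __name__ == "__main__":
    print(f"chips per rack: {chips_per_rack()}")
    print(f"linear scaling: {rack_petaflops(GEMM_TFLOPS_SPARSE):.2f} PFLOPS")
    print(f"at 98% efficiency: {rack_petaflops(GEMM_TFLOPS_SPARSE, 0.98):.2f} PFLOPS")
```

With linear scaling this gives 144 chips and 101.95 PetaFLOPS. The 98% value is an arbitrary illustration, not a measured figure; anything below about 98.1% efficiency drops the rack under 100 PetaFLOPS.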

Below, you can see images of the chip floorplan, specifications compared to the prior version, as well as the system.

Wirko · Apr 11, 2024

Thanks for correcting Meta's "TFLOPS/s" from their AI-generated specifications list to TFLOPS.

AleksandarK · Apr 11, 2024

Wirko said:
Thanks for correcting Meta's "TFLOPS/s" from their AI-generated specifications list to TFLOPS.

Yeah, I noticed that as well. The S literally stands for Second, so I'm not sure why they'd do it again. ¯\_(ツ)_/¯

Daven · Apr 11, 2024

AleksandarK said:
Yeah, I noticed that as well. The S literally stands for Second, so I'm not sure why they'd do it again. ¯\_(ツ)_/¯

Marketing team ‘gone wild’ I guess.

not_my_real_name · Apr 11, 2024

AleksandarK said:
Yeah, I noticed that as well. The S literally stands for Second, so I'm not sure why they'd do it again. ¯\_(ツ)_/¯

AI acceleration, you know...

Daven · Apr 11, 2024

not_my_real_name said:
AI acceleration, you know...

Lol, I get it, nice joke. Meters per second per second and all that.

tomo82 · Apr 11, 2024

Anyone notice from spec sheet: nearly 4x power for 'only' 2x performance of 1st Gen?

Edit: Was looking at the Instances flops, maybe that is a bad comparison ¯\_(ツ)_/¯

Wirko · Apr 11, 2024

Daven said:
Lol, I get it, nice joke. Meters per second per second and all that.

But if they are actually right... then we better run... and run fast!

konga · Apr 11, 2024

tomo82 said:
Anyone notice from spec sheet: nearly 4x power for 'only' 2x performance of 1st Gen?

Edit: Was looking at the Instances flops, maybe that is a bad comparison ¯\_(ツ)_/¯

I don't know much about AI compute specs, but it's definitely over 3x faster in several metrics in their spec sheet. This is a card designed primarily for their own use, so it only really needs to be faster in the ways that matter to them, anyway.

Steevo · Apr 11, 2024

I too RPMs at that.

ThrashZone · Apr 11, 2024

Hi,
More worrying is Meta's grip, a known bad actor hehe

the54thvoid · Apr 11, 2024

ThrashZone said:
Hi,
More worrying is Meta's grip, a known bad actor hehe

They're a bad actor for all sides - that makes them chaotic evil, I believe.

And with a bit of PR, they think they can swipe away Nvidia's marketshare? Don't think it works that way.

Wirko · Apr 11, 2024

konga said:
This is a card designed primarily for their own use, so it only really needs to be faster in the ways that matter to them, anyway.

They have published detailed specs, so it looks like they are going to sell the card to others too.

Onasi · Apr 11, 2024

the54thvoid said:
And with a bit of PR, they think they can swipe away Nvidia's marketshare? Don't think it works that way.

It’s Meta, the company who apparently thought that saying “metaverse” enough times and even rebranding themselves as it would inevitably lead to said stupid idea becoming a reality and making them bank. They are Chaotic Evil alright, also in the sense that any rational thought had left the building a while ago.

Wirko · Apr 11, 2024

ThrashZone said:
Hi,
More worrying is Meta's grip, a known bad actor hehe

At least they've released some interesting stuff as open source - maybe they'll release the software ecosystem for this chip.

persondb · Apr 11, 2024

Wirko said:
Thanks for correcting Meta's "TFLOPS/s" from their AI-generated specifications list to TFLOPS.

It's even funnier because they put it as

Classical vector and processing is a bit slower at 11.06 TeraFLOPS at INT8 (...)

INT8 isn't 'FLOPS', as FLOPS are 'FLoating point OPerations per Second'
There is no floating point in int8...

AleksandarK · Apr 11, 2024

persondb said:
It's even funnier because they put it as

INT8 isn't 'FLOPS', as FLOPS are 'FLoating point OPerations per Second'
There is no floating point in int8...

Which is true. I assume they meant FP8, which is the hot new low-precision format everyone is pushing.

ToTTenTranz · Apr 11, 2024

Unless Meta is going to be selling these as AIB products, it's not really going to ease Nvidia's grip.

Nvidia has plenty of competition in cloud services regardless of the hardware. What the market is lacking is competition in hardware that can be bought to run in the clients' installations.

Solaris17 · Apr 12, 2024

ToTTenTranz said:
Unless Meta is going to be selling these as AIB products, it's not really going to ease Nvidia's grip.

It eases the grip Nvidia has on them.

Minus Infinity · Apr 15, 2024

Honestly, if it came to a choice, I would choose Nvidia over Meta any day of the week. Fcukerberg is one of the three biggest scum on the planet. Huang is an amateur compared to this clown. I put Meta and Google in the same category. Nvidia is next tier down.
