T0@st
News Editor
Inference software enables a shift to NVIDIA A100 Tensor Core GPUs, delivering 7x throughput for the search giant. Jiusheng Chen's team just got accelerated. They're delivering personalized ads to users of Microsoft Bing with 7x throughput at reduced cost, thanks to NVIDIA Triton Inference Server running on NVIDIA A100 Tensor Core GPUs. It's an amazing achievement for the principal software engineering manager and his crew.
Tuning a Complex System
Bing's ad service uses hundreds of models that are constantly evolving. Each must respond to a request within as little as 10 milliseconds, about 10x faster than the blink of an eye. The latest speedup got its start with two innovations the team delivered to make AI models run faster: Bang and EL-Attention. Together, they apply sophisticated techniques to do more work in less time with less computer memory. Model training ran on Azure Machine Learning for efficiency.
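For context, 10 ms is a per-request latency budget, so what matters in practice is tail latency across many requests. Below is a minimal, hypothetical sketch of checking a 99th-percentile latency against such a budget; the `infer` callable and the numbers are illustrative assumptions, not Bing's actual serving stack.

```python
import time

LATENCY_BUDGET_MS = 10.0  # the per-request budget cited in the article

def tail_latency_ms(infer, requests, percentile=0.99):
    """Time each call to `infer` (a placeholder for a real model-serving call)
    and return the chosen percentile of observed latencies in milliseconds."""
    samples = []
    for req in requests:
        start = time.perf_counter()
        infer(req)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[int(percentile * (len(samples) - 1))]

# Dummy workload: a fake model that takes roughly 2 ms per request.
if __name__ == "__main__":
    p99 = tail_latency_ms(lambda _req: time.sleep(0.002), range(200))
    print(f"p99 latency: {p99:.2f} ms (budget {LATENCY_BUDGET_MS} ms)")
```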
Flying With NVIDIA A100 MIG
Next, the team upgraded the ad service from NVIDIA T4 to A100 GPUs. The latter's Multi-Instance GPU (MIG) feature lets users split one GPU into several instances. Chen's team maxed out the MIG feature, transforming one physical A100 into seven independent GPU instances. That let the team reap 7x the throughput per GPU while keeping inference responses within 10 ms.
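As a rough illustration of how such a partitioned GPU looks to software, the sketch below uses the `pynvml` bindings to the NVIDIA Management Library to check MIG mode and list the MIG instances on the first GPU. This is a generic enumeration example rather than the team's actual tooling, and it assumes MIG has already been enabled and the A100 sliced into instances (for example, seven 1g.5gb slices).

```python
# A minimal sketch using pynvml (pip install nvidia-ml-py) to inspect MIG
# partitioning on the first GPU. Assumes MIG mode is already enabled and the
# GPU has been sliced into instances (e.g. seven 1g.5gb slices on an A100).
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

name = pynvml.nvmlDeviceGetName(gpu)
if isinstance(name, bytes):  # older pynvml versions return bytes
    name = name.decode()

current_mode, _pending_mode = pynvml.nvmlDeviceGetMigMode(gpu)
print(f"{name}: MIG enabled = {current_mode == pynvml.NVML_DEVICE_MIG_ENABLE}")

# Walk the MIG device slots and report each instance that exists.
for slot in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
    try:
        mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, slot)
    except pynvml.NVMLError:
        continue  # empty slot
    mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
    print(f"  MIG instance {slot}: {mem.total / 2**30:.1f} GiB memory")

pynvml.nvmlShutdown()
```

Each MIG instance gets its own compute and memory slice, which is why seven instances answering requests in parallel can add up to roughly 7x the per-GPU throughput at the same 10 ms response time.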
Flexible, Easy, Open Software
Triton enabled the shift, in part, because it lets users simultaneously run different runtime software, frameworks and AI models on isolated instances of a single GPU. The inference software comes in a software container, so it's easy to deploy. And open-source Triton, also available with enterprise-grade security and support through NVIDIA AI Enterprise, is backed by a community that makes the software better over time.
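To give a flavour of how clients talk to a Triton server, the sketch below sends one request over Triton's HTTP/REST API using the open-source `tritonclient` Python package. The endpoint, model name and tensor names are placeholder assumptions, since the article does not describe Bing's actual models.

```python
# A minimal sketch of a Triton HTTP client call (pip install "tritonclient[http]").
# The endpoint, model name and tensor names are placeholders, not the models
# Bing actually serves.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build one request: a single FP32 input tensor for a hypothetical model.
data = np.random.rand(1, 16).astype(np.float32)
inputs = [httpclient.InferInput("INPUT__0", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)
outputs = [httpclient.InferRequestedOutput("OUTPUT__0")]

result = client.infer(model_name="example_model", inputs=inputs, outputs=outputs)
print(result.as_numpy("OUTPUT__0"))
```

Because the server exposes the same HTTP/gRPC interface regardless of which framework produced the model, the same client code works whether the instance behind it is a whole GPU or a single MIG slice.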
Accelerating Bing's ad system with Triton on A100 GPUs is one example of what Chen likes about his job. He gets to witness breakthroughs with AI.
While the scenarios often change, the team's goal remains the same - creating a win for its users and advertisers.
View at TechPowerUp Main Site | Source