NVIDIA Announces up to 5x Faster TensorRT-LLM for Windows, and ChatGPT API-like Interface

btarunr · Nov 15, 2023

Even as CPU vendors are working to mainstream accelerated AI for client PCs, and Microsoft setting the pace for more AI in everyday applications with Windows 11 23H2 Update; NVIDIA is out there reminding you that every GeForce RTX GPU is an AI accelerator. This is thanks to its Tensor cores, and the SIMD muscle of the ubiquitous CUDA cores. NVIDIA has been making these for over 5 years now, and has an install base of over 100 million. The company is hence focusing on bring generative AI acceleration to more client- and enthusiast relevant use-cases, such as large language models.

NVIDIA at the Microsoft Ignite event announced new optimizations, models, and resources to bring accelerated AI to everyone with an NVIDIA GPU that meets the hardware requirements. To begin with, the company introduced an update to TensorRT-LLM for Windows, a library that leverages NVIDIA RTX architecture for accelerating large language models (LLMs). The new TensorRT-LLM version 0.6.0 will release later this month, and improve LLM inference performance by up to 5 times in terms of tokens per second, when compared to the initial release of TensorRT-LLM from October 2023. In addition, TensorRT-LLM 0.6.0 will introduce support for popular LLMs, including Mistral 7B and Nemtron-3 8B. Accelerating these two will require a GeForce RTX 30-series "Ampere" or 40-series "Ada" GPU with at least 8 GB of main memory.

OpenAI's ChatGPT is the hottest consumer application since Google, but it is a cloud-based service, which entails transmitting information over the Internet, and is limited in the size of data-sets; making it impractical for enterprises or organizations that require foolproof data privacy and limitless scaling for data-sets, which only a localized AI can provide. NVIDIA will soon be enabling TensorRT-LLM for Windows to support a similar interface to ChatAPI through a new wrapper. NVIDIA says that for those developing applications around ChatAPI, it takes changing just 2 lines of code to benefit from local AI. The new wrapper will work with any LLM that's optimized for TensorRT-LLM, such as Llama 2, Mistral, and NV LLM. The company plans to release this as a reference project on GitHub.

NVIDIA is working with Microsoft to accelerate Llama 2 and Stable Diffusion on RTX via new optimizations the DirectML API. Developers can experience these by downloading the latest ONNX runtime along with a new upcoming version of NVIDIA GeForce drivers that the company will release on November 21, 2023.

View at TechPowerUp Main Site

neatfeatguy · Nov 15, 2023

Of course it is, but only if you enable DLSS 3.0.

Legacy-ZA · Nov 15, 2023

Just enable Frame-Generation on RTX2000/3000 series, don't believe any of your excuses nVidia.

lexluthermiester · Nov 15, 2023

Legacy-ZA said:
Just enable Frame-Generation on RTX2000/3000 series, don't believe any of your excuses nVidia.

Right there with you.

the54thvoid · Nov 16, 2023

Worringly, it's another pitch to try and sell volume to business, and not the 'professional' cards either. So, with LLM's on the ascendancy (for now), NV can try and flog more RTX graphics cards to inflate prices and keep pushing gaming cards out of the reach of most gamers.

It'd be better if they pushed the professional line for that, as businesses can spend more and do legitimate tax write offs to recoup costs.

SOAREVERSOR · Nov 16, 2023

Legacy-ZA said:
Just enable Frame-Generation on RTX2000/3000 series, don't believe any of your excuses nVidia.

And the tech companies looked at the PC gaming peons and said, LOL NO! And they were right. And all was correct in the world again.

wolf · Nov 17, 2023

A worrying amount of straight up off topic posts here...

A good mate with a 3080 is playing around with stable diffusion, I'll see what he makes of all this too.

JAB Creations · Nov 17, 2023

Translation: they knowing put out terrible GPUs with ridiculously low VRAM, the AI boom is already cooling and they want to manipulate people to try to sell pallets that are just sitting in warehouses.

System Name	RBMK-1000
Processor	AMD Ryzen 7 5700G
Motherboard	Gigabyte B550 AORUS Elite V2
Cooling	DeepCool Gammax L240 V2
Memory	2x 16GB DDR4-3200
Video Card(s)	Galax RTX 4070 Ti EX
Storage	Samsung 990 1TB
Display(s)	BenQ 1440p 60 Hz 27-inch
Case	Corsair Carbide 100R
Audio Device(s)	ASUS SupremeFX S1220A
Power Supply	Cooler Master MWE Gold 650W
Mouse	ASUS ROG Strix Impact
Keyboard	Gamdias Hermes E2
Software	Windows 11 Pro

System Name	Personal / HTPC
Processor	Ryzen 5900x / Ryzen 5600X3D
Motherboard	Asrock x570 Phantom Gaming 4 /ASRock B550 Phantom Gaming
Cooling	Corsair H100i / bequiet! Pure Rock Slim 2
Memory	32GB DDR4 3200 / 16GB DDR4 3200
Video Card(s)	EVGA XC3 Ultra RTX 3080Ti / EVGA RTX 3060 XC
Storage	500GB Pro 970, 250 GB SSD, 1TB & 500GB Western Digital / lots
Display(s)	Dell - S3220DGF & S3222DGM 32"
Case	Titan Silent 2 / CM HAF XB Evo
Audio Device(s)	Logitech G35 headset
Power Supply	850W SeaSonic X Series / 750W SeaSonic X Series
Mouse	Logitech G502
Keyboard	Black Microsoft Natural Elite Keyboard
Software	Windows 10 Pro 64 / Windows 10 Pro 64

Processor	AMD Ryzen 9 5900X
Motherboard	ASUS ROG STRIX B550-F GAMING (WI-FI)
Cooling	Noctua NH-D15 G2
Memory	32GB G.Skill DDR4 3600Mhz CL18
Video Card(s)	ASUS RTX 5070Ti OC TUF
Storage	SAMSUNG 990 PRO 2TB
Display(s)	Dell S3220DGF
Case	Corsair iCUE 4000X
Audio Device(s)	ASUS Xonar D2X
Power Supply	Corsair AX760 Platinum
Mouse	Razer DeathAdder V2 - Wireless
Keyboard	Corsair K70 PRO - OPX Linear Switches
Software	Microsoft Windows 11 - Enterprise (64-bit)

Processor	Ryzen 7800X3D
Motherboard	MSI MAG Mortar B650 (wifi)
Cooling	be quiet! Dark Rock Pro 4
Memory	32GB Kingston Fury
Video Card(s)	MSI RTX 5080 Vanguard SOC
Storage	Seagate FireCuda 530 M.2 1TB / Samsumg 960 Pro M.2 512Gb
Display(s)	LG 32" 165Hz 1440p GSYNC
Case	Asus Prime AP201
Audio Device(s)	On Board
Power Supply	be quiet! Pure POwer M12 850w Gold (ATX3.0)
Software	W10

System Name	MightyX
Processor	Ryzen 9800X3D
Motherboard	Gigabyte B650I AX
Cooling	Scythe Fuma 2
Memory	32GB DDR5 6000 CL30 tuned
Video Card(s)	Palit Gamerock RTX 5080 oc
Storage	WD Black SN850X 2TB
Display(s)	LG 42C2 4K OLED
Case	Coolermaster NR200P
Audio Device(s)	LG SN5Y / Focal Clear
Power Supply	Corsair SF750 Platinum
Mouse	Corsair Dark Core RBG Pro SE
Keyboard	Glorious GMMK Compact w/pudding
VR HMD	Meta Quest 3
Software	case populated with Artic P12's
Benchmark Scores	4k120 OLED Gsync bliss

NVIDIA Announces up to 5x Faster TensorRT-LLM for Windows, and ChatGPT API-like Interface

btarunr

Editor & Senior Moderator

neatfeatguy

Legacy-ZA

lexluthermiester

the54thvoid

Super Intoxicated Moderator

SOAREVERSOR

wolf

Better Than Native

JAB Creations

Similar threads

Processor	AMD Ryzen 3800X / AMD 8350
Motherboard	ASRock X570 Phantom Gaming X / Gigabyte 990FXA-UD5 Revision 3.0
Cooling	Stock / Corsair H100
Memory	32GB / 24GB
Video Card(s)	Sapphire RX 6800 / AMD Radeon 290X (Toggling until 6950XT)
Storage	C:\ 1TB SSD, D:\ RAID-1 1TB SSD, 2x4TB-RAID-1
Display(s)	Samsung U32E850R
Case	be quiet! Dark Base Pro 900 Black rev. 2 / Fractal Design
Audio Device(s)	Creative Sound Blaster X-Fi
Power Supply	EVGA Supernova 1300G2 / EVGA Supernova 850G+
Mouse	Logitech M-U0007
Keyboard	Logitech G110 / Logitech G110