• Welcome to TechPowerUp Forums, Guest! Please check out our forum guidelines for info related to our community.

NVIDIA Announces up to 5x Faster TensorRT-LLM for Windows, and ChatGPT API-like Interface

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,230 (7.55/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
Even as CPU vendors are working to mainstream accelerated AI for client PCs, and Microsoft setting the pace for more AI in everyday applications with Windows 11 23H2 Update; NVIDIA is out there reminding you that every GeForce RTX GPU is an AI accelerator. This is thanks to its Tensor cores, and the SIMD muscle of the ubiquitous CUDA cores. NVIDIA has been making these for over 5 years now, and has an install base of over 100 million. The company is hence focusing on bring generative AI acceleration to more client- and enthusiast relevant use-cases, such as large language models.

NVIDIA at the Microsoft Ignite event announced new optimizations, models, and resources to bring accelerated AI to everyone with an NVIDIA GPU that meets the hardware requirements. To begin with, the company introduced an update to TensorRT-LLM for Windows, a library that leverages NVIDIA RTX architecture for accelerating large language models (LLMs). The new TensorRT-LLM version 0.6.0 will release later this month, and improve LLM inference performance by up to 5 times in terms of tokens per second, when compared to the initial release of TensorRT-LLM from October 2023. In addition, TensorRT-LLM 0.6.0 will introduce support for popular LLMs, including Mistral 7B and Nemtron-3 8B. Accelerating these two will require a GeForce RTX 30-series "Ampere" or 40-series "Ada" GPU with at least 8 GB of main memory.



OpenAI's ChatGPT is the hottest consumer application since Google, but it is a cloud-based service, which entails transmitting information over the Internet, and is limited in the size of data-sets; making it impractical for enterprises or organizations that require foolproof data privacy and limitless scaling for data-sets, which only a localized AI can provide. NVIDIA will soon be enabling TensorRT-LLM for Windows to support a similar interface to ChatAPI through a new wrapper. NVIDIA says that for those developing applications around ChatAPI, it takes changing just 2 lines of code to benefit from local AI. The new wrapper will work with any LLM that's optimized for TensorRT-LLM, such as Llama 2, Mistral, and NV LLM. The company plans to release this as a reference project on GitHub.

NVIDIA is working with Microsoft to accelerate Llama 2 and Stable Diffusion on RTX via new optimizations the DirectML API. Developers can experience these by downloading the latest ONNX runtime along with a new upcoming version of NVIDIA GeForce drivers that the company will release on November 21, 2023.

View at TechPowerUp Main Site
 
Joined
May 18, 2009
Messages
2,950 (0.52/day)
Location
MN
System Name Personal / HTPC
Processor Ryzen 5900x / Ryzen 5600X3D
Motherboard Asrock x570 Phantom Gaming 4 /ASRock B550 Phantom Gaming
Cooling Corsair H100i / bequiet! Pure Rock Slim 2
Memory 32GB DDR4 3200 / 16GB DDR4 3200
Video Card(s) EVGA XC3 Ultra RTX 3080Ti / EVGA RTX 3060 XC
Storage 500GB Pro 970, 250 GB SSD, 1TB & 500GB Western Digital / lots
Display(s) Dell - S3220DGF & S3222DGM 32"
Case CoolerMaster HAF XB Evo / CM HAF XB Evo
Audio Device(s) Logitech G35 headset
Power Supply 850W SeaSonic X Series / 750W SeaSonic X Series
Mouse Logitech G502
Keyboard Black Microsoft Natural Elite Keyboard
Software Windows 10 Pro 64 / Windows 10 Pro 64
Of course it is, but only if you enable DLSS 3.0.
 
Joined
Dec 14, 2011
Messages
1,031 (0.22/day)
Location
South-Africa
Processor AMD Ryzen 9 5900X
Motherboard ASUS ROG STRIX B550-F GAMING (WI-FI)
Cooling Corsair iCUE H115i Elite Capellix 280mm
Memory 32GB G.Skill DDR4 3600Mhz CL18
Video Card(s) ASUS GTX 1650 TUF
Storage Sabrent Rocket 1TB M.2
Display(s) Dell S3220DGF
Case Corsair iCUE 4000X
Audio Device(s) ASUS Xonar D2X
Power Supply Corsair AX760 Platinum
Mouse Razer DeathAdder V2 - Wireless
Keyboard Redragon K618 RGB PRO
Software Microsoft Windows 11 - Enterprise (64-bit)
Just enable Frame-Generation on RTX2000/3000 series, don't believe any of your excuses nVidia.
 

the54thvoid

Super Intoxicated Moderator
Staff member
Joined
Dec 14, 2009
Messages
13,047 (2.39/day)
Location
Glasgow - home of formal profanity
Processor Ryzen 7800X3D
Motherboard MSI MAG Mortar B650 (wifi)
Cooling be quiet! Dark Rock Pro 4
Memory 32GB Kingston Fury
Video Card(s) Gainward RTX4070ti
Storage Seagate FireCuda 530 M.2 1TB / Samsumg 960 Pro M.2 512Gb
Display(s) LG 32" 165Hz 1440p GSYNC
Case Asus Prime AP201
Audio Device(s) On Board
Power Supply be quiet! Pure POwer M12 850w Gold (ATX3.0)
Software W10
Worringly, it's another pitch to try and sell volume to business, and not the 'professional' cards either. So, with LLM's on the ascendancy (for now), NV can try and flog more RTX graphics cards to inflate prices and keep pushing gaming cards out of the reach of most gamers.

It'd be better if they pushed the professional line for that, as businesses can spend more and do legitimate tax write offs to recoup costs.
 
Joined
Apr 13, 2022
Messages
1,174 (1.23/day)
Just enable Frame-Generation on RTX2000/3000 series, don't believe any of your excuses nVidia.
And the tech companies looked at the PC gaming peons and said, LOL NO! And they were right. And all was correct in the world again.
 

wolf

Better Than Native
Joined
May 7, 2007
Messages
8,168 (1.27/day)
System Name MightyX
Processor Ryzen 5800X3D
Motherboard Gigabyte X570 I Aorus Pro WiFi
Cooling Scythe Fuma 2
Memory 32GB DDR4 3600 CL16
Video Card(s) Asus TUF RTX3080 Deshrouded
Storage WD Black SN850X 2TB
Display(s) LG 42C2 4K OLED
Case Coolermaster NR200P
Audio Device(s) LG SN5Y / Focal Clear
Power Supply Corsair SF750 Platinum
Mouse Corsair Dark Core RBG Pro SE
Keyboard Glorious GMMK Compact w/pudding
VR HMD Meta Quest 3
Software case populated with Artic P12's
Benchmark Scores 4k120 OLED Gsync bliss
A worrying amount of straight up off topic posts here...

A good mate with a 3080 is playing around with stable diffusion, I'll see what he makes of all this too.
 
Joined
May 13, 2015
Messages
632 (0.18/day)
Processor AMD Ryzen 3800X / AMD 8350
Motherboard ASRock X570 Phantom Gaming X / Gigabyte 990FXA-UD5 Revision 3.0
Cooling Stock / Corsair H100
Memory 32GB / 24GB
Video Card(s) Sapphire RX 6800 / AMD Radeon 290X (Toggling until 6950XT)
Storage C:\ 1TB SSD, D:\ RAID-1 1TB SSD, 2x4TB-RAID-1
Display(s) Samsung U32E850R
Case be quiet! Dark Base Pro 900 Black rev. 2 / Fractal Design
Audio Device(s) Creative Sound Blaster X-Fi
Power Supply EVGA Supernova 1300G2 / EVGA Supernova 850G+
Mouse Logitech M-U0007
Keyboard Logitech G110 / Logitech G110
Translation: they knowing put out terrible GPUs with ridiculously low VRAM, the AI boom is already cooling and they want to manipulate people to try to sell pallets that are just sitting in warehouses.
 
Top