Wednesday, November 15th 2023

NVIDIA Announces up to 5x Faster TensorRT-LLM for Windows, and ChatGPT API-like Interface

Even as CPU vendors are working to mainstream accelerated AI for client PCs, and Microsoft setting the pace for more AI in everyday applications with Windows 11 23H2 Update; NVIDIA is out there reminding you that every GeForce RTX GPU is an AI accelerator. This is thanks to its Tensor cores, and the SIMD muscle of the ubiquitous CUDA cores. NVIDIA has been making these for over 5 years now, and has an install base of over 100 million. The company is hence focusing on bring generative AI acceleration to more client- and enthusiast relevant use-cases, such as large language models.

NVIDIA at the Microsoft Ignite event announced new optimizations, models, and resources to bring accelerated AI to everyone with an NVIDIA GPU that meets the hardware requirements. To begin with, the company introduced an update to TensorRT-LLM for Windows, a library that leverages NVIDIA RTX architecture for accelerating large language models (LLMs). The new TensorRT-LLM version 0.6.0 will release later this month, and improve LLM inference performance by up to 5 times in terms of tokens per second, when compared to the initial release of TensorRT-LLM from October 2023. In addition, TensorRT-LLM 0.6.0 will introduce support for popular LLMs, including Mistral 7B and Nemtron-3 8B. Accelerating these two will require a GeForce RTX 30-series "Ampere" or 40-series "Ada" GPU with at least 8 GB of main memory.
OpenAI's ChatGPT is the hottest consumer application since Google, but it is a cloud-based service, which entails transmitting information over the Internet, and is limited in the size of data-sets; making it impractical for enterprises or organizations that require foolproof data privacy and limitless scaling for data-sets, which only a localized AI can provide. NVIDIA will soon be enabling TensorRT-LLM for Windows to support a similar interface to ChatAPI through a new wrapper. NVIDIA says that for those developing applications around ChatAPI, it takes changing just 2 lines of code to benefit from local AI. The new wrapper will work with any LLM that's optimized for TensorRT-LLM, such as Llama 2, Mistral, and NV LLM. The company plans to release this as a reference project on GitHub.

NVIDIA is working with Microsoft to accelerate Llama 2 and Stable Diffusion on RTX via new optimizations the DirectML API. Developers can experience these by downloading the latest ONNX runtime along with a new upcoming version of NVIDIA GeForce drivers that the company will release on November 21, 2023.
Add your own comment

7 Comments on NVIDIA Announces up to 5x Faster TensorRT-LLM for Windows, and ChatGPT API-like Interface

#1
neatfeatguy
Of course it is, but only if you enable DLSS 3.0.
Posted on Reply
#2
Legacy-ZA
Just enable Frame-Generation on RTX2000/3000 series, don't believe any of your excuses nVidia.
Posted on Reply
#3
lexluthermiester
Legacy-ZAJust enable Frame-Generation on RTX2000/3000 series, don't believe any of your excuses nVidia.
Right there with you.
Posted on Reply
#4
the54thvoid
Super Intoxicated Moderator
Worringly, it's another pitch to try and sell volume to business, and not the 'professional' cards either. So, with LLM's on the ascendancy (for now), NV can try and flog more RTX graphics cards to inflate prices and keep pushing gaming cards out of the reach of most gamers.

It'd be better if they pushed the professional line for that, as businesses can spend more and do legitimate tax write offs to recoup costs.
Posted on Reply
#5
SOAREVERSOR
Legacy-ZAJust enable Frame-Generation on RTX2000/3000 series, don't believe any of your excuses nVidia.
And the tech companies looked at the PC gaming peons and said, LOL NO! And they were right. And all was correct in the world again.
Posted on Reply
#6
wolf
Better Than Native
A worrying amount of straight up off topic posts here...

A good mate with a 3080 is playing around with stable diffusion, I'll see what he makes of all this too.
Posted on Reply
#7
JAB Creations
Translation: they knowing put out terrible GPUs with ridiculously low VRAM, the AI boom is already cooling and they want to manipulate people to try to sell pallets that are just sitting in warehouses.
Posted on Reply
Nov 21st, 2024 11:18 EST change timezone

New Forum Posts

Popular Reviews

Controversial News Posts