
NVIDIA Outlines Cost Benefits of Inference Platform

T0@st

News Editor
Joined
Mar 7, 2023
Messages
2,289 (3.29/day)
Location
South East, UK
Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform—a full stack comprising world-class silicon, systems and software—is the key to delivering high-throughput, low-latency inference and great user experiences while lowering cost. NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models while optimizing total cost of ownership. The Hopper platform also delivers up to 15x more energy efficiency for inference workloads compared with previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience. But the underlying goal is simple: generate more tokens at a lower cost. Tokens are the units of text a large language model (LLM) processes and generates—and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investment and energy used per task. Full-stack software optimization is the key to improving AI inference performance and achieving this goal.
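As a rough illustration of the per-million-token billing model described above (the price used here is hypothetical, not an NVIDIA or cloud-provider rate), the cost arithmetic works out like this:

```python
# Hypothetical per-token billing: inference services typically price
# per million tokens generated.
def inference_cost(tokens_generated: int, price_per_million: float) -> float:
    """Return the serving cost for a number of generated tokens."""
    return tokens_generated / 1_000_000 * price_per_million

# Example: 250k generated tokens at a hypothetical $2.00 per million tokens
cost = inference_cost(250_000, 2.00)
print(f"${cost:.2f}")  # → $0.50
```

Under this model, doubling throughput at fixed hardware cost directly halves the effective cost per token, which is why the article frames software optimization as the lever.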




Cost-Effective User Throughput
Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users' needs:
  • NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure—cloud, data centers, edge or workstations.
  • NVIDIA Triton Inference Server, one of the company's most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.
  • NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.
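Triton serves models over the standard KServe v2 HTTP/gRPC inference protocol (POST to `/v2/models/<model>/infer`). As a minimal sketch of what a client sends—the input name, shape and data here are hypothetical and must match the actual model configuration—a request body can be built like this:

```python
import json

def make_infer_request(values: list[float], input_name: str = "INPUT0") -> str:
    """Build a KServe v2 inference request body, as accepted by Triton's
    HTTP endpoint (POST /v2/models/<model>/infer)."""
    body = {
        "inputs": [
            {
                "name": input_name,            # must match the model config
                "shape": [1, len(values)],     # batch of 1
                "datatype": "FP32",
                "data": values,
            }
        ]
    }
    return json.dumps(body)

payload = make_infer_request([0.1, 0.2, 0.3, 0.4])
```

In practice the `tritonclient` Python package wraps this protocol, but the wire format above is what any HTTP client would POST to a running Triton endpoint.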

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on development, infrastructure and setup costs while improving productivity. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference
To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:
  • Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service
  • Google Cloud's Vertex AI, Google Kubernetes Engine
  • Microsoft Azure AI Foundry (coming soon), Azure Kubernetes Service
  • Oracle Cloud Infrastructure's data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through Azure Machine Learning Studio or full-code deployment with the Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and offers NVIDIA Triton in its AWS Deep Learning Containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also supports popular communication protocols, such as HTTP/REST and gRPC, for delivering AI predictions, automatically scaling to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA's AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

The full article can be found here.

Learn more about how NVIDIA is delivering breakthrough inference performance results and stay up to date with the latest AI inference performance updates.

View at TechPowerUp Main Site | Source
 
Joined
Dec 12, 2016
Messages
2,153 (0.72/day)
I'm guessing this press release is due to the DeepSeek debacle, but is Nvidia really trying to sell us on lower TCO after saying things like "The more you buy, the more you save"? This company truly represents the absolute worst of our humanity. Nvidia has been raking in 75-80% margins by charging $45,000 for data center GPUs that cost around $9,000 to make. Greed can really come back and bite you in the ass.
 
Joined
May 10, 2023
Messages
553 (0.88/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Too bad they still haven't come up with a proper solution for GPU sharing inside of k8s. Having to deal with MPS is far from ideal, and stuff like vGPUs and MIG are not that flexible :(
 
Joined
Aug 12, 2010
Messages
164 (0.03/day)
Location
Brazil
Processor Ryzen 7 7800X3D
Motherboard ASRock B650M PG Riptide
Cooling AMD Wraith Max + 2x Noctua Redux NF-P14r + 2x NF-P12
Memory 2x16GB ADATA XPG Lancer Blade DDR5-6000
Video Card(s) Powercolor RX 7800 XT Fighter OC
Storage ADATA Legend 970 2TB PCIe 5.0
Display(s) Dell 32" S3222DGM - 1440P 165Hz + P2422H
Case HYTE Y40
Audio Device(s) Microsoft Xbox TLL-00008
Power Supply Cooler Master MWE 750 V2
Mouse Alienware AW320M
Keyboard Alienware AW510K
Software Windows 11 Pro
This paid PR sounds very much like desperation.
 
Joined
Oct 28, 2012
Messages
1,260 (0.28/day)
Processor AMD Ryzen 3700x
Motherboard asus ROG Strix B-350I Gaming
Cooling Deepcool LS520 SE
Memory crucial ballistix 32Gb DDR4
Video Card(s) RTX 3070 FE
Storage WD sn550 1To/WD ssd sata 1To /WD black sn750 1To/Seagate 2To/WD book 4 To back-up
Display(s) LG GL850
Case Dan A4 H2O
Audio Device(s) sennheiser HD58X
Power Supply Corsair SF600
Mouse MX master 3
Keyboard Master Key Mx
Software win 11 pro
I'm guessing this press release is due to the DeepSeek debacle, but is Nvidia really trying to sell us on lower TCO after saying things like "The more you buy, the more you save"? This company truly represents the absolute worst of our humanity. Nvidia has been raking in 75-80% margins by charging $45,000 for data center GPUs that cost around $9,000 to make. Greed can really come back and bite you in the ass.
Things could have been worse. There's that one guy who increased the price of a drug by 11,000%. 75-80% is a common margin for big pharmaceutical companies, and big pharma happens to be one of NVIDIA's customers.
 
Joined
Nov 4, 2005
Messages
12,083 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
Joined
Apr 13, 2022
Messages
1,265 (1.24/day)
I'm guessing this press release is due to the DeepSeek debacle but is Nvidia really trying to sell us on lower TCO after saying things like "The more you buy, the more you save"? This company truly represents the absolute worse of our humanity. Nvidia has been raking in 75-80% margins by charging $45,000 for their data center GPUs which means the cost is $9,000. Greed can really come back and bite you in the ass.
The issue is this isn't what's happening.

Point blank, most "AI" is not using professional-grade stuff, and those who do have so much money to set on fire that the cost doesn't matter. In most "AI" cases since the GeForce 8800 GTX, people have just been buying up racks of EVGA (dead), ASUS, Founders Edition, Gigabyte and MSI cards and kit-bashing them into a system, as CUDA does not care.

But that's the crux of it. What sells Nvidia is not the GPU but CUDA. If someone can take a crack at Nvidia's entire ecosystem, it comes crashing down. This won't translate into cheaper gaming GPUs; it will mean that high-end GPUs are no longer worth the effort to develop, outside of flex AI cases. That's Nvidia's entire house of cards: GPUs for gaming are not worth the investment at the high end, and AI is only worth it if you have an ecosystem lock. Kill that lock and goodbye.
 
Joined
Jan 11, 2022
Messages
1,058 (0.95/day)
The issue is this isn't what's happening.

Point blank, most "AI" is not using professional-grade stuff, and those who do have so much money to set on fire that the cost doesn't matter. In most "AI" cases since the GeForce 8800 GTX, people have just been buying up racks of EVGA (dead), ASUS, Founders Edition, Gigabyte and MSI cards and kit-bashing them into a system, as CUDA does not care.

But that's the crux of it. What sells Nvidia is not the GPU but CUDA. If someone can take a crack at Nvidia's entire ecosystem, it comes crashing down. This won't translate into cheaper gaming GPUs; it will mean that high-end GPUs are no longer worth the effort to develop, outside of flex AI cases. That's Nvidia's entire house of cards: GPUs for gaming are not worth the investment at the high end, and AI is only worth it if you have an ecosystem lock. Kill that lock and goodbye.
I doubt CUDA is all that important to big users like OpenAI and Meta, and I'm fairly certain those are working bare metal anyway.
 
Joined
May 10, 2023
Messages
553 (0.88/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
I doubt CUDA is all that important to big users like OpenAI and Meta, and I'm fairly certain those are working bare metal anyway.
Even if they're running bare metal, they're still using CUDA.
CUDA is the API that allows them to do GPGPU on Nvidia GPUs, and it's the main API supported by all the big machine learning frameworks.
AMD is trying to play catch-up with ROCm and is getting some traction, but it's still way behind Nvidia.
 