News Posts matching #TensorRT

Return to Keyword Browsing

NVIDIA TensorRT Boosts Stable Diffusion 3.5 Performance on NVIDIA GeForce RTX and RTX PRO GPUs

Press Release by

Jun 12th, 2025 08:08 Discuss (10 Comments)

Generative AI has reshaped how people create, imagine and interact with digital content. As AI models continue to grow in capability and complexity, they require more VRAM, or video random access memory. The base Stable Diffusion 3.5 Large model, for example, uses over 18 GB of VRAM - limiting the number of systems that can run it well. By applying quantization to the model, noncritical layers can be removed or run with lower precision. NVIDIA GeForce RTX 40 Series and the Ada Lovelace generation of NVIDIA RTX PRO GPUs support FP8 quantization to help run these quantized models, and the latest-generation NVIDIA Blackwell GPUs also add support for FP4.

NVIDIA collaborated with Stability AI to quantize its latest model, Stable Diffusion (SD) 3.5 Large, to FP8 - reducing VRAM consumption by 40%. Further optimizations to SD3.5 Large and Medium with the NVIDIA TensorRT software development kit (SDK) double performance. In addition, TensorRT has been reimagined for RTX AI PCs, combining its industry-leading performance with just-in-time (JIT), on-device engine building and an 8x smaller package size for seamless AI deployment to more than 100 million RTX AI PCs. TensorRT for RTX is now available as a standalone SDK for developers.

Read full story

NVIDIA Blackwell Delivers Breakthrough Performance in Latest MLPerf Training Results

Press Release by

Jun 6th, 2025 07:50 Discuss (0 Comments)

NVIDIA is working with companies worldwide to build out AI factories—speeding the training and deployment of next-generation AI applications that use the latest advancements in training and inference. The NVIDIA Blackwell architecture is built to meet the heightened performance requirements of these new applications. In the latest round of MLPerf Training—the 12th since the benchmark's introduction in 2018—the NVIDIA AI platform delivered the highest performance at scale on every benchmark and powered every result submitted on the benchmark's toughest large language model (LLM)-focused test: Llama 3.1 405B pretraining.

The NVIDIA platform was the only one that submitted results on every MLPerf Training v5.0 benchmark—underscoring its exceptional performance and versatility across a wide array of AI workloads, spanning LLMs, recommendation systems, multimodal LLMs, object detection and graph neural networks. The at-scale submissions used two AI supercomputers powered by the NVIDIA Blackwell platform: Tyche, built using NVIDIA GB200 NVL72 rack-scale systems, and Nyx, based on NVIDIA DGX B200 systems. In addition, NVIDIA collaborated with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 NVIDIA Grace CPUs.

Read full story

NVIDIA Outlines Cost Benefits of Inference Platform

Press Release by

Jan 27th, 2025 07:54 Discuss (8 Comments)

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform—a full stack comprising world-class silicon, systems and software—is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost. NVIDIA's advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience. But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system—and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task. Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Read full story

AAEON Releases NVIDIA Jetson Orin-powered BOXER-8654AI-KIT

Press Release by

Jan 19th, 2025 23:22 Discuss (0 Comments)

AAEON (stock code: 6579), a leading provider of edge AI platforms, has announced the release of the BOXER-8654AI-KIT, a development kit equipped with either an 8 GB or 16 GB NVIDIA Jetson Orin NX module. While AAEON's BOXER series has become synonymous with NVIDIA-powered embedded AI systems, the BOXER-8654AI-KIT provides its customers with an avenue to develop embedded solutions from the ground up.

The BOXER-8654AI-KIT's module is accompanied by an advanced carrier board featuring a full-function I/O boasting six USB 3.2 Gen 2 ports, an Out-of-Band (OOB) management box header, and four LAN ports with optional power boards for PoE functionality. For display, the kit's carrier board offers one physical HDMI Type-A port, as well as two MIPI CSI connectors, each equipped with four lanes to allow for two MIPI cameras to be installed per interface. Other physical I/O features that stand out are its two DB-9 ports, one of which offers an RS-232 (RX/TX/GND)/422/485 signal, with the other providing a CANBus interface. Additionally, an NVIDIA Jetson-compatible 40-pin header offers I2C, SPI, UART, and other serial interfaces.

Read full story

NVIDIA and Microsoft Showcase Blackwell Preview, Omniverse Industrial AI and RTX AI PCs at Microsoft Ignite

Press Release by

Nov 20th, 2024 02:32 Discuss (10 Comments)

NVIDIA and Microsoft today unveiled product integrations designed to advance full-stack NVIDIA AI development on Microsoft platforms and applications. At Microsoft Ignite, Microsoft announced the launch of the first cloud private preview of the Azure ND GB200 V6 VM series, based on the NVIDIA Blackwell platform. The Azure ND GB200 v6 will be a new AI-optimized virtual machine (VM) series and combines the NVIDIA GB200 NVL72 rack design with NVIDIA Quantum InfiniBand networking.

In addition, Microsoft revealed that Azure Container Apps now supports NVIDIA GPUs, enabling simplified and scalable AI deployment. Plus, the NVIDIA AI platform on Azure includes new reference workflows for industrial AI and an NVIDIA Omniverse Blueprint for creating immersive, AI-powered visuals. At Ignite, NVIDIA also announced multimodal small language models (SLMs) for RTX AI PCs and workstations, enhancing digital human interactions and virtual assistants with greater realism.

Read full story

Advantech Launches AIR-310, Ultra-Low-Profile Scalable AI Inference

Press Release by

Sep 26th, 2024 15:15 Discuss (0 Comments)

Advantech, a leading provider of edge computing solutions, introduces the AIR-310, a compact edge AI inference system featuring an MXM GPU card. Powered by 12th/13th/14th Gen Intel Core 65 W desktop processors, the AIR-310 delivers up to 12.99 TFLOPS of scalable AI performance via the NVIDIA Quadro 2000A GPU card in a 1.5U chassis (215 x 225 x 55 mm). Despite its compact size, it offers versatile connectivity with three LAN ports and four USB 3.0 ports, enabling seamless integration of sensors and cameras for vision AI applications.

The system includes smart fan management, operates in temperatures from 0 to 50°C (32 to 122°F), and is shock-resistant, capable of withstanding 3G vibration and 30G shock. Bundled with Intel Arc A370 and NVIDIA A2000 GPUs, it is certified to IEC 61000-6-2, IEC 61000-6-4, and CB/UL standards, ensuring stable 24/7 operation in harsh environments, including space-constrained or mobile equipment. The AIR-310 supports Windows 11, Linux Ubuntu 24.04, and the Edge AI SDK, enabling accelerated inference deployment for applications such as factory inspections, real-time video surveillance, GenAI/LLM, and medical imaging.

Read full story

Stability AI Outs Stable Diffusion 3 Medium, Company's Most Advanced Image Generation Model

by

Jun 13th, 2024 02:30 Discuss (12 Comments)

Stability AI, a maker of various generative AI models and the company behind text-to-image Stable Diffusion models, has released its latest Stable Diffusion 3 (SD3) Medium AI model. Running on two billion dense parameters, the SD3 Medium is the company's most advanced text-to-image model to date. It boasts features like generating highly realistic and detailed images across a wide range of styles and compositions. It demonstrates capabilities in handling intricate prompts that involve spatial reasoning, actions, and diverse artistic directions. The model's innovative architecture, including the 16-channel variational autoencoder (VAE), allows it to overcome common challenges faced by other models, such as accurately rendering realistic human faces and hands.

Additionally, it achieves exceptional text quality, with precise letter formation, kerning, and spacing, thanks to the Diffusion Transformer architecture. Notably, the model is resource-efficient, capable of running smoothly on consumer-grade GPUs without compromising performance due to its low VRAM footprint. Furthermore, it exhibits impressive fine-tuning abilities, allowing it to absorb and replicate nuanced details from small datasets, making it highly customizable for specific use cases that users may have. Being an open-weight model, it is available for download on HuggingFace, and it has libraries optimized for both NVIDIA's TensorRT (all modern NVIDIA GPUs) and AMD Radeon/Instinct GPUs.

AAEON Announces Support for NVIDIA Jetpack 6.0 on BOXER-8640AI

Computex Press Release by

Jun 4th, 2024 00:57 Discuss (0 Comments)

AAEON, a leading designer and manufacturer of AI Edge solutions, has today confirmed that its Fanless Embedded AI System, the BOXER-8640AI, will support the newly released NVIDIA Jetpack 6.0 software development kit.

The NVIDIA Jetpack 6.0 SDK is equipped with a range of new features conducive to accelerating AI application development, such as Jetson Platform Services, a modular, API-driven suite of pre-built and cloud-native software services.

Read full story

NVIDIA's Arm-based AI PC Processor Could Leverage Arm Cortex X5 CPU Cores and Blackwell Graphics

by

May 27th, 2024 03:11 Discuss (25 Comments)

Last week, we got confirmation from the highest levels of Dell and NVIDIA that the latter is making a client PC processor for the Windows on Arm (WoA) AI PC ecosystem that only has one player in it currently, Qualcomm. Michael Dell hinted that this NVIDIA AI PC processor would be ready in 2025. Since then, speculation has been rife about the various IP blocks NVIDIA could use in the development of this chip, the two key areas of debate have been the CPU cores and the process node.

Given that NVIDIA is gunning toward a 2025 launch of its AI PC processor, the company could implement reference Arm IP CPU cores, such as the Arm Cortex X5 "Blackhawk," and not venture out toward developing its own CPU cores on the Arm machine architecture, unlike Apple. Depending on how the market recieves its chips, NVIDIA could eventually develop its own cores. Next up, the company could use the most advanced 3 nm-class foundry node available in 2025 for its chip, such as the TSMC N3P. Given that even Apple and Qualcomm will build their contemporary notebook chips on this node, it would be a logical choice of node for NVIDIA. Then there's graphics and AI acceleration hardware.

Read full story

Qualcomm's Success with Windows AI PC Drawing NVIDIA Back to the Client SoC Business

by

May 22nd, 2024 07:14 Discuss (34 Comments)

NVIDIA is eying a comeback to the client processor business, reveals a Bloomberg interview with the CEOs of NVIDIA and Dell. For NVIDIA, all it takes is a simple driver update that exposes every GeForce GPU with tensor cores as an NPU to Windows 11, with translation layers to get popular client AI apps to work with TensorRT. But that would need you to have a discrete NVIDIA GPU. What about the vast market of Windows AI PCs powered by the likes of Qualcomm, Intel, and AMD, who each sell 15 W-class processors with integrated NPUs capable of 50 AI TOPS, which is all that Copilot+ needs? NVIDIA held an Arm license for decades now, and makes Arm-based CPUs to this day, with the NVIDIA Grace, however, that is a large server processor meant for its AI GPU servers.

NVIDIA already made client processors under the Tegra brand targeting smartphones, which it winded down last decade. It's since been making Drive PX processors for its automotive self-driving hardware division; and of course there's Grace. NVIDIA hinted that it might have a client CPU for the AI PC market in 2025. In the interview Bloomberg asked NVIDIA CEO Jensen Huang a pointed question on whether NVIDIA has a place in the AI PC market. Dell CEO Michael Dell, who was also in the interview, interjected "come back next year," to which Jensen affirmed "exactly." Dell would be in a front-and-center position to know if NVIDIA is working on a new PC processor for launch in 2025, and Jensen's nod almost confirms this

Read full story

New Performance Optimizations Supercharge NVIDIA RTX AI PCs for Gamers, Creators and Developers

Press Release by

May 21st, 2024 14:39 Discuss (0 Comments)

NVIDIA today announced at Microsoft Build new AI performance optimizations and integrations for Windows that help deliver maximum performance on NVIDIA GeForce RTX AI PCs and NVIDIA RTX workstations. Large language models (LLMs) power some of the most exciting new use cases in generative AI and now run up to 3x faster with ONNX Runtime (ORT) and DirectML using the new NVIDIA R555 Game Ready Driver. ORT and DirectML are high-performance tools used to run AI models locally on Windows PCs.

WebNN, an application programming interface for web developers to deploy AI models, is now accelerated with RTX via DirectML, enabling web apps to incorporate fast, AI-powered capabilities. And PyTorch will support DirectML execution backends, enabling Windows developers to train and infer complex AI models on Windows natively. NVIDIA and Microsoft are collaborating to scale performance on RTX GPUs. These advancements build on NVIDIA's world-leading AI platform, which accelerates more than 500 applications and games on over 100 million RTX AI PCs and workstations worldwide.

Read full story

NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf

Press Release by

Mar 27th, 2024 11:47 Discuss (15 Comments)

It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI. In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM—software that speeds and simplifies the complex job of inference on large language models—boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago. The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI. Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM—a set of inference microservices that includes inferencing engines like TensorRT-LLM—makes it easier than ever for businesses to deploy NVIDIA's inference platform.

Raising the Bar in Generative AI
TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs—the latest, memory-enhanced Hopper GPUs—delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date. The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the GPT-J LLM first used in the September benchmarks. The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark. The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.

Read full story

UL Announces the Procyon AI Image Generation Benchmark Based on Stable Diffusion

Press Release by

Mar 26th, 2024 05:25 Discuss (7 Comments)

We're excited to announce we're expanding our AI Inference benchmark offerings with the UL Procyon AI Image Generation Benchmark, coming Monday, 25th March. AI has the potential to be one of the most significant new technologies hitting the mainstream this decade, and many industry leaders are competing to deliver the best AI Inference performance through their hardware. Last year, we launched the first of our Procyon AI Inference Benchmarks for Windows, which measured AI Inference performance with a workload using Computer Vision.

The upcoming UL Procyon AI Image Generation Benchmark provides a consistent, accurate and understandable workload for measuring the AI performance of high-end hardware, built with input from members of the industry to ensure fair and comparable results across all supported hardware.

Read full story

Microsoft and NVIDIA Announce Major Integrations to Accelerate Generative AI for Enterprises Everywhere

Press Release by

Mar 18th, 2024 16:40 Discuss (1 Comment)

At GTC on Monday, Microsoft Corp. and NVIDIA expanded their longstanding collaboration with powerful new integrations that leverage the latest NVIDIA generative AI and Omniverse technologies across Microsoft Azure, Azure AI services, Microsoft Fabric and Microsoft 365.

"Together with NVIDIA, we are making the promise of AI real, helping to drive new benefits and productivity gains for people and organizations everywhere," said Satya Nadella, Chairman and CEO, Microsoft. "From bringing the GB200 Grace Blackwell processor to Azure, to new integrations between DGX Cloud and Microsoft Fabric, the announcements we are making today will ensure customers have the most comprehensive platforms and tools across every layer of the Copilot stack, from silicon to software, to build their own breakthrough AI capability."

"AI is transforming our daily lives - opening up a world of new opportunities," said Jensen Huang, founder and CEO of NVIDIA. "Through our collaboration with Microsoft, we're building a future that unlocks the promise of AI for customers, helping them deliver innovative solutions to the world."

Read full story

Intel Gaudi2 Accelerator Beats NVIDIA H100 at Stable Diffusion 3 by 55%

by

Mar 12th, 2024 00:44 Discuss (8 Comments)

Stability AI, the developers behind the popular Stable Diffusion generative AI model, have run some first-party performance benchmarks for Stable Diffusion 3 using popular data-center AI GPUs, including the NVIDIA H100 "Hopper" 80 GB, A100 "Ampere" 80 GB, and Intel's Gaudi2 96 GB accelerator. Unlike the H100, which is a super-scalar CUDA+Tensor core GPU; the Gaudi2 is purpose-built to accelerate generative AI and LLMs. Stability AI published its performance findings in a blog post, which reveals that the Intel Gaudi2 96 GB is posting a roughly 56% higher performance than the H100 80 GB.

With 2 nodes, 16 accelerators, and a constant batch size of 16 per accelerator (256 in all), the Intel Gaudi2 array is able to generate 927 images per second, compared to 595 images for the H100 array, and 381 images per second for the A100 array, keeping accelerator and node counts constant. Scaling things up a notch to 32 nodes, and 256 accelerators or a batch size of 16 per accelerator (total batch size of 4,096), the Gaudi2 array is posting 12,654 images per second; or 49.4 images per-second per-device; compared to 3,992 images per second or 15.6 images per-second per-device for the older-gen A100 "Ampere" array.

Read full story

NVIDIA Introduces NVIDIA RTX 2000 Ada Generation GPU

Press Release by

Feb 12th, 2024 11:31 Discuss (23 Comments)

Generative AI is driving change across industries—and to take advantage of its benefits, businesses must select the right hardware to power their workflows. The new NVIDIA RTX 2000 Ada Generation GPU delivers the latest AI, graphics and compute technology to compact workstations, offering up to 1.5x the performance of the previous-generation RTX A2000 12 GB in professional workflows. From crafting stunning 3D environments to streamlining complex design reviews to refining industrial designs, the card's capabilities pave the way for an AI-accelerated future, empowering professionals to achieve more without compromising on performance or capabilities. Modern multi-application workflows, such as AI-powered tools, multi-display setups and high-resolution content, put significant demands on GPU memory. With 16 GB of memory in the RTX 2000 Ada, professionals can tap the latest technologies and tools to work faster and better with their data.

Powered by NVIDIA RTX technology, the new GPU delivers impressive realism in graphics with NVIDIA DLSS, delivering ultra-high-quality, photorealistic ray-traced images more than 3x faster than before. In addition, the RTX 2000 Ada enables an immersive experience for enterprise virtual-reality workflows, such as for product design and engineering design reviews. With its blend of performance, versatility and AI capabilities, the RTX 2000 Ada helps professionals across industries achieve efficiencies. Architects and urban planners can use it to accelerate visualization workflows and structural analysis, enhancing design precision. Product designers and engineers using industrial PCs can iterate rapidly on product designs with fast, photorealistic rendering and AI-powered generative design. Content creators can edit high-resolution videos and images seamlessly, and use AI for realistic visual effects and content creation assistance. And in vital embedded applications and edge computing, the RTX 2000 Ada can power real-time data processing for medical devices, optimize manufacturing processes with predictive maintenance and enable AI-driven intelligence in retail environments.

Read full story

Dropbox and NVIDIA Team to Bring Personalized Generative AI to Millions of Customers

Press Release by

Nov 17th, 2023 08:32 Discuss (21 Comments)

Today, Dropbox, Inc. and NVIDIA announced a collaboration to supercharge knowledge work and improve productivity for millions of Dropbox customers through the power of AI. The companies' collaboration will expand Dropbox's extensive AI functionality with new uses for personalized generative AI to improve search accuracy, provide better organization, and simplify workflows for its customers across their cloud content.

Dropbox plans to leverage NVIDIA's AI foundry consisting of NVIDIA AI Foundation Models, NVIDIA AI Enterprise software and NVIDIA accelerated computing to enhance its latest AI-powered product experiences. These include Dropbox Dash, universal search that connects apps, tools, and content in a single search bar to help customers find what they need; Dropbox AI, a tool that allows customers to ask questions and get summaries on large files across their entire Dropbox; among other AI capabilities in Dropbox.

Read full story

NVIDIA Announces up to 5x Faster TensorRT-LLM for Windows, and ChatGPT API-like Interface

by

Nov 15th, 2023 10:00 Discuss (7 Comments)

Even as CPU vendors are working to mainstream accelerated AI for client PCs, and Microsoft setting the pace for more AI in everyday applications with Windows 11 23H2 Update; NVIDIA is out there reminding you that every GeForce RTX GPU is an AI accelerator. This is thanks to its Tensor cores, and the SIMD muscle of the ubiquitous CUDA cores. NVIDIA has been making these for over 5 years now, and has an install base of over 100 million. The company is hence focusing on bring generative AI acceleration to more client- and enthusiast relevant use-cases, such as large language models.

NVIDIA at the Microsoft Ignite event announced new optimizations, models, and resources to bring accelerated AI to everyone with an NVIDIA GPU that meets the hardware requirements. To begin with, the company introduced an update to TensorRT-LLM for Windows, a library that leverages NVIDIA RTX architecture for accelerating large language models (LLMs). The new TensorRT-LLM version 0.6.0 will release later this month, and improve LLM inference performance by up to 5 times in terms of tokens per second, when compared to the initial release of TensorRT-LLM from October 2023. In addition, TensorRT-LLM 0.6.0 will introduce support for popular LLMs, including Mistral 7B and Nemtron-3 8B. Accelerating these two will require a GeForce RTX 30-series "Ampere" or 40-series "Ada" GPU with at least 8 GB of main memory.

Read full story

Return to Keyword Browsing

Jun 30th, 2025 21:27 CDT change timezone

Latest GPU Drivers

New Forum Posts

20:33 by Logic
Laptop overclocking adventures (1238)
19:49 by iameatingjam
[INTEL]-How To Update Your Microcode for Intel HX 13/14th Gen. CPUs Laptops/Mobile Easily. (172)
19:40 by Tyler-98-W68
Will you buy a RTX 5090? (584)
19:24 by AusWolf
The TPU UK Clubhouse (26530)
19:24 by Logic
Optane and "enable write caching " (27)
18:51 by A Computer Guy
Question about Intel Optane SSDs (87)
18:45 by lexluthermiester
Do you use Linux? (664)
18:45 by Gold Leader
Remember Fermi? Well here's my EVGA GTX 480 that I picked up for just 19 Euros! (9)
17:56 by HermannSW
Vega owners club (587)
17:38 by mclaren85
Can you guess Which game it is? (194)

Popular Reviews

Jun 30th, 2025 ASUS ROG Crosshair X870E Extreme Review
Jun 20th, 2025 Sapphire Radeon RX 9060 XT Pulse OC 16 GB Review - Samsung Memory Tested
Jun 27th, 2025 AVerMedia CamStream 4K Review
Jun 26th, 2025 Lexar NQ780 4 TB Review
Nov 6th, 2024 AMD Ryzen 7 9800X3D Review - The Best Gaming Processor
May 13th, 2025 Upcoming Hardware Launches 2025 (Updated May 2025)
Mar 5th, 2025 Sapphire Radeon RX 9070 XT Nitro+ Review - Beating NVIDIA
Mar 11th, 2025 AMD Ryzen 9 9950X3D Review - Great for Gaming and Productivity
May 27th, 2025 NVIDIA GeForce RTX 5060 8 GB Review
Jun 25th, 2025 ASRock Phantom Gaming Z890 Riptide Wi-Fi Review

TPU on YouTube

Controversial News Posts