News Posts matching #Tensor Cores


PNY Announces Support for the New NVIDIA RTX PRO Blackwell Graphics Card Family

PNY announced today it is adding the new NVIDIA RTX PRO 6000 Blackwell Workstation Edition, RTX PRO 6000 Blackwell Max-Q Workstation Edition, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, and RTX PRO 4000 Blackwell graphics cards to its lineup of NVIDIA RTX PRO GPU offerings for professionals.

Built for professionals and the future of work, NVIDIA RTX PRO Desktop GPUs based on the revolutionary NVIDIA Blackwell architecture deliver exceptional performance for AI-enhanced creative, design, and engineering workflows. Featuring the latest generation Tensor Cores, RT Cores, and up to 96 GB of ultra-high-speed GDDR7 memory, they enable groundbreaking advancements in AI, ray tracing, and neural graphics technology. Supercharge workstations for the next era of AI-driven workflows with the ultimate tools for professionals.

ASUS Introduces Ascent GX10 AI Supercomputer Powered by NVIDIA GB10 Grace Blackwell Superchip

ASUS today announces its groundbreaking AI supercomputer, ASUS Ascent GX10, powered by the state-of-the-art NVIDIA GB10 Grace Blackwell Superchip. This revolutionary device places the formidable capabilities of a petaFLOP-scale AI supercomputer directly onto the desks of developers, AI researchers and data scientists around the globe.

As the size and complexity of generative AI models grow, local development efforts face increasing challenges. Prototyping, tuning and inferencing large models require substantial memory and compute performance. To address these needs, Ascent GX10 is designed to provide developers with a powerful, economical desktop solution for AI development.

NVIDIA & Partners Will Discuss Supercharging of AI Development at GTC 2025

Generative AI is redefining computing, unlocking new ways to build, train and optimize AI models on PCs and workstations. From content creation and large and small language models to software development, AI-powered PCs and workstations are transforming workflows and enhancing productivity. At GTC 2025, running March 17-21 in the San Jose Convention Center, experts from across the AI ecosystem will share insights on deploying AI locally, optimizing models and harnessing cutting-edge hardware and software to enhance AI workloads—highlighting key advancements in RTX AI PCs and workstations.

Develop and Deploy on RTX
RTX GPUs are built with specialized AI hardware called Tensor Cores that provide the compute performance needed to run the latest and most demanding AI models. These high-performance GPUs can help build digital humans, chatbots, AI-generated podcasts and more. With more than 100 million GeForce RTX and NVIDIA RTX GPU users, developers have a large audience to target when new AI apps and features are deployed. In the session "Build Digital Humans, Chatbots, and AI-Generated Podcasts for RTX PCs and Workstations," Annamalai Chockalingam, senior product manager at NVIDIA, will showcase the end-to-end suite of tools developers can use to streamline development and deploy incredibly fast AI-enabled applications.

NVIDIA Recommends GeForce RTX 5070 Ti GPU to AI Content Creators

The NVIDIA GeForce RTX 5070 Ti graphics cards—built on the NVIDIA Blackwell architecture—are out now, ready to power generative AI content creation and accelerate creative performance. GeForce RTX 5070 Ti GPUs feature fifth-generation Tensor Cores with support for FP4, doubling performance and reducing VRAM requirements to run generative AI models.

In addition, the GPU comes equipped with two ninth-generation encoders and a sixth-generation decoder that add support for the 4:2:2 pro-grade color format and increase encoding quality for HEVC and AV1. This combination accelerates video editing workflows, reducing export times by 8x compared with single-encoder GPUs without 4:2:2 support, such as the GeForce RTX 3090. The GeForce RTX 5070 Ti GPU also includes 16 GB of fast GDDR7 memory and 896 GB/sec of total memory bandwidth—a 78% increase over the GeForce RTX 4070 Ti GPU.
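The quoted bandwidth figures follow directly from the memory-interface arithmetic. A minimal sketch, assuming the RTX 5070 Ti's 256-bit bus with 28 Gbps GDDR7 and the RTX 4070 Ti's 192-bit bus with 21 Gbps GDDR6X (spec values from NVIDIA's published figures, not this article):

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth: per-pin data rate (Gbps) * bus width (bits) / 8 bits-per-byte."""
    return data_rate_gbps * bus_width_bits / 8

# RTX 5070 Ti: 28 Gbps GDDR7 on a 256-bit bus
print(memory_bandwidth_gb_s(28, 256))  # -> 896.0 GB/s, matching the article
# RTX 4070 Ti: 21 Gbps GDDR6X on a 192-bit bus -> 504 GB/s; 896/504 gives the quoted ~78% uplift
print(round(memory_bandwidth_gb_s(28, 256) / memory_bandwidth_gb_s(21, 192), 2))  # -> 1.78
```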

DeepSeek-R1 Goes Live on NVIDIA NIM

DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, conducting chain-of-thought, consensus and search methods to generate the best answer. Performing this sequence of inference passes—using reason to arrive at the best answer—is known as test-time scaling. DeepSeek-R1 is a perfect example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference.

As models are allowed to iteratively "think" through the problem, they create more output tokens and longer generation cycles, so model quality continues to scale. Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments. R1 delivers leading accuracy for tasks demanding logical inference, reasoning, math, coding and language understanding while also delivering high inference efficiency.
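One of the test-time scaling methods mentioned above, consensus, can be illustrated with a short sketch: draw several independent reasoning passes for the same query and keep the majority answer. This is a generic self-consistency illustration, not DeepSeek-R1's actual inference pipeline; the stub "model" below is purely hypothetical.

```python
from collections import Counter
from itertools import cycle

def self_consistency(sample_answer, query: str, n_samples: int = 8) -> str:
    """Test-time scaling via consensus: run several independent chain-of-thought
    passes over the same query and return the majority final answer.
    `sample_answer` is any callable that runs one reasoning pass."""
    answers = [sample_answer(query) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Deterministic stub standing in for a reasoning model: 3 of every 4 passes agree
_stub_answers = cycle(["42", "42", "42", "41"])
stub = lambda query: next(_stub_answers)
print(self_consistency(stub, "What is 6 * 7?"))  # -> 42
```

Spending more compute (larger `n_samples`) buys a more reliable answer, which is exactly the quality-versus-compute trade-off the test-time scaling law describes.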

New NVIDIA Broadcast AI Features Now Streaming With GeForce RTX 50 Series GPUs

New GeForce RTX 5090 and RTX 5080 GPUs - built on the NVIDIA Blackwell architecture - are now available to power generative AI content creation and accelerate creative performance. GeForce RTX 5090 and RTX 5080 GPUs feature fifth-generation Tensor Cores with support for FP4, reducing the VRAM requirements to run generative AI models while doubling performance. For example, Black Forest Labs' FLUX models - available on Hugging Face this week - at FP4 precision require less than 10 GB of VRAM, compared with over 23 GB at FP16. With a GeForce RTX 5090 GPU, the FLUX.1 [dev] model can generate images in just over five seconds, compared with 15 seconds on FP16 or 10 seconds on FP8 on a GeForce RTX 4090 GPU.
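The VRAM savings from FP4 follow from simple weight-storage arithmetic. A back-of-the-envelope sketch, assuming FLUX.1 [dev]'s roughly 12 billion parameters (an approximate public figure, not stated in the article) and ignoring activations and framework overhead:

```python
def weight_vram_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate VRAM needed just to hold the model weights
    (ignores activations, caches and framework overhead)."""
    return n_params * bits_per_weight / 8 / 1e9

params = 12e9  # FLUX.1 [dev], approximate parameter count
print(round(weight_vram_gb(params, 16), 1))  # FP16 -> 24.0 GB, in line with "over 23 GB"
print(round(weight_vram_gb(params, 4), 1))   # FP4  -> 6.0 GB, comfortably under the quoted 10 GB
```

Halving the bits per weight twice (FP16 to FP8 to FP4) quarters the weight footprint, which is why the FP4 figure fits on mainstream cards.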

GeForce RTX 50 Series GPUs also come equipped with ninth-generation encoders and sixth-generation decoders that add support for 4:2:2 and increase encoding quality for HEVC and AV1. Fourth-generation RT Cores paired with DLSS 4 provide creators with super-smooth 3D rendering viewports. The GeForce RTX 5090 GPU includes 32 GB of ultra-fast GDDR7 memory and 1,792 GB/sec of total memory bandwidth - a 77% bandwidth increase over the GeForce RTX 4090 GPU. It also includes three encoders and two decoders, reducing export times by a third compared with the prior generation.

NVIDIA's Frame Generation Technology Could Come to GeForce RTX 30 Series

NVIDIA's deep learning super sampling (DLSS) has gone through many iterations on its way to the current version 4 with its transformer model, introducing technologies such as DLSS Multi Frame Generation, which generates multiple AI-predicted frames for each rendered frame to increase frame output per second. However, not every NVIDIA GPU generation supports these newer DLSS technologies. In an interview with Digital Foundry, Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, commented on trickling down some DLSS technologies to older GPU generations. For example, DLSS Ray Reconstruction, Super Resolution, and Deep Learning Anti-Aliasing (DLAA) work on NVIDIA GeForce RTX 20/30/40/50 series GPUs, but DLSS Frame Generation is exclusive to the RTX 40 series, and DLSS Multi Frame Generation is exclusive to the newest RTX 50 series.

However, there is hope for older hardware. "I think this is primarily a question of optimization and also engineering and then the ultimate user experience. We're launching this Frame Generation, the best Multi Frame Generation technology, with the 50 Series, and we'll see what we're able to squeeze out of older hardware in the future." So, frame generation may well arrive on the older RTX 30 series, with a slight possibility of the RTX 20 series getting DLSS Frame Generation as well. Multi Frame Generation, however, will most likely remain an RTX 50 series exclusive, since those GPUs have the raw compute budget the technology requires.

NVIDIA Puts Grace Blackwell on Every Desk and at Every AI Developer's Fingertips

NVIDIA today unveiled NVIDIA Project DIGITS, a personal AI supercomputer that provides AI researchers, data scientists and students worldwide with access to the power of the NVIDIA Grace Blackwell platform. Project DIGITS features the new NVIDIA GB10 Grace Blackwell Superchip, offering a petaflop of AI computing performance for prototyping, fine-tuning and running large AI models.

With Project DIGITS, users can develop and run inference on models using their own desktop system, then seamlessly deploy the models on accelerated cloud or data center infrastructure. "AI will be mainstream in every application for every industry. With Project DIGITS, the Grace Blackwell Superchip comes to millions of developers," said Jensen Huang, founder and CEO of NVIDIA. "Placing an AI supercomputer on the desks of every data scientist, AI researcher and student empowers them to engage and shape the age of AI."

NVIDIA Blackwell GeForce RTX 50 Series Opens New World of AI Computer Graphics

NVIDIA today unveiled the most advanced consumer GPUs for gamers, creators and developers—the GeForce RTX 50 Series Desktop and Laptop GPUs. Powered by the NVIDIA Blackwell architecture, fifth-generation Tensor Cores and fourth-generation RT Cores, the GeForce RTX 50 Series delivers breakthroughs in AI-driven rendering, including neural shaders, digital human technologies, geometry and lighting.

"Blackwell, the engine of AI, has arrived for PC gamers, developers and creatives," said Jensen Huang, founder and CEO of NVIDIA. "Fusing AI-driven neural rendering and ray tracing, Blackwell is the most significant computer graphics innovation since we introduced programmable shading 25 years ago." The GeForce RTX 5090 GPU—the fastest GeForce RTX GPU to date—features 92 billion transistors, providing over 3,352 trillion AI operations per second (TOPS) of computing power. Blackwell architecture innovations and DLSS 4 mean the GeForce RTX 5090 GPU outperforms the GeForce RTX 4090 GPU by up to 2x.

VeriSilicon Unveils Next-Gen Vitality Architecture GPU IP Series

VeriSilicon today announced the launch of its latest Vitality architecture Graphics Processing Unit (GPU) IP series, designed to deliver high-performance computing across a wide range of applications, including cloud gaming, AI PC, and both discrete and integrated graphics cards.

VeriSilicon's new generation Vitality GPU architecture delivers exceptional advancements in computational performance with scalability. It incorporates advanced features such as a configurable Tensor Core AI accelerator and a 32 MB to 64 MB Level 3 (L3) cache, offering both powerful processing power and superior energy efficiency. Additionally, the Vitality architecture supports up to 128 channels of cloud gaming per core, addressing the needs of high concurrency and high image quality cloud-based entertainment, while enabling large-scale desktop gaming and applications on Windows systems. With robust support for Microsoft DirectX 12 APIs and AI acceleration libraries, this architecture is ideally suited for a wide range of performance-intensive applications and complex computing workloads.

NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Benchmark

As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category - including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token. MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they're capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They're also more efficient since they only activate a few experts per inference - meaning they deliver results much faster than dense models of a similar size.
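The efficiency claim for MoE models comes down to the ratio of active to total parameters. A quick sketch using the Mixtral 8x7B figures quoted above:

```python
# Mixtral 8x7B figures from the article: 46.7B total parameters, 12.9B active per token
total_params = 46.7e9
active_params = 12.9e9

# Only the routed experts (plus shared layers) run for each token, so per-token
# compute and weight-streaming cost tracks the active-parameter count, not the total
active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of weights active per token")  # -> 27.6%
```

Roughly a quarter of the network does the work for any given token, which is why an MoE model can respond much faster than a dense model of similar total size.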

Supermicro Launches Plug-and-Play SuperCluster for NVIDIA Omniverse

Supermicro, Inc., a Total IT Solution Provider for AI, Cloud, Storage, and 5G/Edge, is announcing a new addition to its SuperCluster portfolio of plug-and-play AI infrastructure solutions for the NVIDIA Omniverse platform to deliver the high-performance generative AI-enhanced 3D workflows at enterprise scale. This new SuperCluster features the latest Supermicro NVIDIA OVX systems and allows enterprises to easily scale as workloads increase.

"Supermicro has led the industry in developing GPU-optimized products, traditionally for 3D graphics and application acceleration, and now for AI," said Charles Liang, president and CEO of Supermicro. "With the rise of AI, enterprises are seeking computing infrastructure that combines all these capabilities into a single package. Supermicro's SuperCluster features fully interconnected 4U PCIe GPU NVIDIA-Certified Systems for NVIDIA Omniverse, with up to 256 NVIDIA L40S PCIe GPUs per scalable unit. The system helps deliver high performance across the Omniverse platform, including generative AI integrations. By developing this SuperCluster for Omniverse, we're not just offering a product; we're providing a gateway to the future of application development and innovation."

New Performance Optimizations Supercharge NVIDIA RTX AI PCs for Gamers, Creators and Developers

NVIDIA today announced at Microsoft Build new AI performance optimizations and integrations for Windows that help deliver maximum performance on NVIDIA GeForce RTX AI PCs and NVIDIA RTX workstations. Large language models (LLMs) power some of the most exciting new use cases in generative AI and now run up to 3x faster with ONNX Runtime (ORT) and DirectML using the new NVIDIA R555 Game Ready Driver. ORT and DirectML are high-performance tools used to run AI models locally on Windows PCs.

WebNN, an application programming interface for web developers to deploy AI models, is now accelerated with RTX via DirectML, enabling web apps to incorporate fast, AI-powered capabilities. And PyTorch will support DirectML execution backends, enabling Windows developers to train and infer complex AI models on Windows natively. NVIDIA and Microsoft are collaborating to scale performance on RTX GPUs. These advancements build on NVIDIA's world-leading AI platform, which accelerates more than 500 applications and games on over 100 million RTX AI PCs and workstations worldwide.

NVIDIA Launches the RTX A400 and A1000 Professional Graphics Cards

AI integration across design and productivity applications is becoming the new standard, fueling demand for advanced computing performance. This means professionals and creatives will need to tap into increased compute power, regardless of the scale, complexity or scope of their projects. To meet this growing need, NVIDIA is expanding its RTX professional graphics offerings with two new NVIDIA Ampere architecture-based GPUs for desktops: the NVIDIA RTX A400 and NVIDIA RTX A1000.

They expand access to AI and ray tracing technology, equipping professionals with the tools they need to transform their daily workflows. The RTX A400 GPU introduces accelerated ray tracing and AI to the RTX 400 series GPUs. With 24 Tensor Cores for AI processing, it surpasses traditional CPU-based solutions, enabling professionals to run cutting-edge AI applications, such as intelligent chatbots and copilots, directly on their desktops. The GPU delivers real-time ray tracing, so creators can build vivid, physically accurate 3D renders that push the boundaries of creativity and realism.

Intel Launches Gaudi 3 AI Accelerator: 70% Faster Training, 50% Faster Inference Compared to NVIDIA H100, Promises Better Efficiency Too

During the Vision 2024 event, Intel announced its latest Gaudi 3 AI accelerator, promising significant improvements over its predecessor. Intel claims the Gaudi 3 offers up to 70% better training performance, 50% better inference, and 40% better efficiency than NVIDIA's H100 processors. The new AI accelerator is presented as a PCIe Gen 5 dual-slot add-in card with a 600 W TDP or as an OAM module with 900 W. The PCIe card has the same peak 1,835 TeraFLOPS of FP8 performance as the OAM module despite a 300 W lower TDP. The PCIe version works in groups of four per system, while the OAM HL-325L modules can run in an eight-accelerator configuration per server. The lower TDP will likely result in lower sustained performance, but it confirms that the same silicon is used, just fine-tuned to a lower frequency. Built on TSMC's N5 5 nm node, the AI accelerator features 64 Tensor Cores, delivering double the FP8 and quadruple the FP16 performance of the previous-generation Gaudi 2.

The Gaudi 3 AI chip comes with 128 GB of HBM2E with 3.7 TB/s of bandwidth and 24 200 Gbps Ethernet NICs, with dual 400 Gbps NICs used for scale-out. All of that is laid out on 10 tiles that make up the Gaudi 3 accelerator, which you can see pictured below. There is 96 MB of SRAM split between two compute tiles, which acts as a low-level cache that bridges data communication between Tensor Cores and HBM memory. Intel also announced support for the new performance-boosting standardized MXFP4 data format and is developing an AI NIC ASIC for Ultra Ethernet Consortium-compliant networking. The Gaudi 3 supports clusters of up to 8192 cards, coming from 1024 nodes comprised of systems with eight accelerators. It is on track for volume production in Q3, offering a cost-effective alternative to NVIDIA accelerators with the additional promise of a more open ecosystem. More information and a deeper dive can be found in the Gaudi 3 Whitepaper.
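The identical peak figure at two different TDPs makes the on-paper efficiency gap easy to quantify. A quick sketch using the numbers stated above; note that the PCIe card's lower power limit means its sustained (as opposed to peak) throughput will land below this paper figure:

```python
# Peak FP8 throughput and TDPs as stated in the article
peak_tflops = 1835.0
tdp_oam_w = 900.0
tdp_pcie_w = 600.0

# Peak efficiency on paper: the PCIe card claims the same peak at 300 W less,
# which is only sustainable at lower clocks under real workloads
print(round(peak_tflops / tdp_oam_w, 2))   # OAM:  ~2.04 TFLOPS/W
print(round(peak_tflops / tdp_pcie_w, 2))  # PCIe: ~3.06 TFLOPS/W on paper
```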

AAEON BOXER-8653AI & BOXER-8623AI Expand Vertical Market Potential in a More Compact Form

Leading provider of embedded PC solutions, AAEON, is delighted to announce the official launch of two new additions to its rich line of embedded AI systems, the BOXER-8653AI and BOXER-8623AI, which are powered by the NVIDIA Jetson Orin NX and Jetson Orin Nano, respectively. Measuring just 180 mm x 136 mm x 75 mm, both systems are compact and easily wall-mounted for discreet deployment, which AAEON indicates makes them ideal for use in both indoor and outdoor settings such as factories and parking lots. Adding to this is the systems' environmental resilience, with the BOXER-8653AI sporting a wide -15°C to 60°C temperature tolerance and the BOXER-8623AI able to operate between -15°C and 65°C, with both supporting a 12 V ~ 24 V power input range via a 2-pin terminal block.

The BOXER-8653AI benefits from the NVIDIA Jetson Orin NX module, offering up to 70 TOPS of AI inference performance for applications that require extremely fast analysis of vast quantities of data. Meanwhile, the BOXER-8623AI utilizes the more efficient, yet still powerful NVIDIA Jetson Orin Nano module, capable of up to 40 TOPS. Both systems consequently make use of the 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores.

NVIDIA AI GPU Customers Reportedly Selling Off Excess Hardware

The NVIDIA H100 Tensor Core GPU was last year's hot item for HPC and AI industry segments—the largest purchasers were reported to have acquired up to 150,000 units each. Demand grew so much that lead times of 36 to 52 weeks became the norm for H100-based server equipment. The latest rumblings indicate that things have stabilized—so much so that some organizations are "offloading chips" as the supply crunch cools off. Apparently it is more cost-effective to rent AI processing sessions through cloud service providers (CSPs)—the big three being Amazon Web Services, Google Cloud, and Microsoft Azure.

According to a mid-February Seeking Alpha report, wait times for the NVIDIA H100 80 GB GPU model have been reduced to around three to four months. The Information believes that some companies have already reduced their order counts, while others have hardware sitting around, completely unused. Maintenance complexity and costs are reportedly cited as main factors in "offloading" unneeded equipment and turning to renting server time from CSPs. Despite improved supply conditions, AI GPU demand is still growing—driven mainly by organizations working with LLMs. A prime example is OpenAI: as pointed out by The Information, insider murmurings have Sam Altman & Co. seeking out alternative solutions and production avenues.

NVIDIA Introduces NVIDIA RTX 2000 Ada Generation GPU

Generative AI is driving change across industries—and to take advantage of its benefits, businesses must select the right hardware to power their workflows. The new NVIDIA RTX 2000 Ada Generation GPU delivers the latest AI, graphics and compute technology to compact workstations, offering up to 1.5x the performance of the previous-generation RTX A2000 12 GB in professional workflows. From crafting stunning 3D environments to streamlining complex design reviews to refining industrial designs, the card's capabilities pave the way for an AI-accelerated future, empowering professionals to achieve more without compromising on performance or capabilities. Modern multi-application workflows, such as AI-powered tools, multi-display setups and high-resolution content, put significant demands on GPU memory. With 16 GB of memory in the RTX 2000 Ada, professionals can tap the latest technologies and tools to work faster and better with their data.

Powered by NVIDIA RTX technology, the new GPU delivers impressive realism in graphics with NVIDIA DLSS, delivering ultra-high-quality, photorealistic ray-traced images more than 3x faster than before. In addition, the RTX 2000 Ada enables an immersive experience for enterprise virtual-reality workflows, such as for product design and engineering design reviews. With its blend of performance, versatility and AI capabilities, the RTX 2000 Ada helps professionals across industries achieve efficiencies. Architects and urban planners can use it to accelerate visualization workflows and structural analysis, enhancing design precision. Product designers and engineers using industrial PCs can iterate rapidly on product designs with fast, photorealistic rendering and AI-powered generative design. Content creators can edit high-resolution videos and images seamlessly, and use AI for realistic visual effects and content creation assistance. And in vital embedded applications and edge computing, the RTX 2000 Ada can power real-time data processing for medical devices, optimize manufacturing processes with predictive maintenance and enable AI-driven intelligence in retail environments.

Mod Unlocks FSR 3 Fluid Motion Frames on Older NVIDIA GeForce RTX 20/30 Series Cards

NVIDIA's latest RTX 40 series graphics cards feature impressive new technologies like DLSS 3 that can significantly enhance performance and image quality in games. However, owners of older GeForce RTX 20 and 30 series cards cannot officially benefit from these cutting-edge advances. DLSS 3's Frame Generation feature, in particular, requires dedicated hardware found only in NVIDIA's brand-new Ada Lovelace architecture. But the ingenious modding community has stepped in with a creative workaround where NVIDIA has declined to enable frame generation on older hardware. A new third-party modification can unofficially activate both upscaling (FSR, DLAA, DLSS or XeSS) and AMD Fluid Motion Frames on older NVIDIA cards equipped with Tensor Cores. Replacing two key DLL files and making a small edit to the Windows registry enables the "DLSS 3" option in games running on older hardware.

In testing conducted by Digital Foundry, this modification delivered up to a 75% FPS boost - on par with the performance uplift official DLSS 3 provides on RTX 40 series cards. Games like Cyberpunk 2077, Spider-Man: Miles Morales, and A Plague Tale: Requiem were used to benchmark performance. However, there can be minor visual flaws, including incorrect UI interpolation or random frame time fluctuations. Ironically, while the FSR 3 tech itself originates from AMD, the mod currently only works on NVIDIA cards. So, while not officially supported, the resourcefulness of the modding community has remarkably managed to bring cutting-edge frame generation to more NVIDIA owners - until AMD RDNA 3 cards can utilize it as well. This shows the incredible potential of community-driven software modification and innovation.

ASUS Announces Dual GeForce RTX 4060 Ti SSD Graphics Card

ASUS today announced the Dual GeForce RTX 4060 Ti SSD, the world's first graphics card equipped with an M.2 slot, allowing for a seamless cooling upgrade for high-performance NVMe drives.

Reimagined M.2 storage
At its core, this card has all of the same amazing features as the ASUS Dual GeForce RTX 4060 Ti 8GB. Third-generation RT Cores and fourth-generation Tensor Cores, now featuring DLSS 3.5 and frame generation, drive incredibly immersive real-time ray tracing experiences, enabling this graphics card to push the limits of how good modern games can look. Housed in a sleek 2.5-slot design that only requires a single 8-pin PCIe power connector, the Dual GeForce RTX 4060 Ti SSD can easily fit into almost any existing build.

NVIDIA Announces up to 5x Faster TensorRT-LLM for Windows, and ChatGPT API-like Interface

Even as CPU vendors work to mainstream accelerated AI for client PCs, and Microsoft sets the pace for more AI in everyday applications with the Windows 11 23H2 Update, NVIDIA is out there reminding you that every GeForce RTX GPU is an AI accelerator, thanks to its Tensor Cores and the SIMD muscle of the ubiquitous CUDA cores. NVIDIA has been making these for over five years now and has an install base of over 100 million GPUs. The company is hence focusing on bringing generative AI acceleration to more client- and enthusiast-relevant use cases, such as large language models.

NVIDIA at the Microsoft Ignite event announced new optimizations, models, and resources to bring accelerated AI to everyone with an NVIDIA GPU that meets the hardware requirements. To begin with, the company introduced an update to TensorRT-LLM for Windows, a library that leverages the NVIDIA RTX architecture for accelerating large language models (LLMs). The new TensorRT-LLM version 0.6.0 will release later this month and improve LLM inference performance by up to 5 times in terms of tokens per second, compared to the initial release of TensorRT-LLM from October 2023. In addition, TensorRT-LLM 0.6.0 will introduce support for popular LLMs, including Mistral 7B and Nemotron-3 8B. Accelerating these two will require a GeForce RTX 30-series "Ampere" or 40-series "Ada" GPU with at least 8 GB of VRAM.

Striking Performance: LLMs up to 4x Faster on GeForce RTX With TensorRT-LLM

Generative AI is one of the most important trends in the history of personal computing, bringing advancements to gaming, creativity, video, productivity, development and more. And GeForce RTX and NVIDIA RTX GPUs, which are packed with dedicated AI processors called Tensor Cores, are bringing the power of generative AI natively to more than 100 million Windows PCs and workstations.

Today, generative AI on PC is getting up to 4x faster via TensorRT-LLM for Windows, an open-source library that accelerates inference performance for the latest AI large language models, like Llama 2 and Code Llama. This follows the announcement of TensorRT-LLM for data centers last month. NVIDIA has also released tools to help developers accelerate their LLMs, including scripts that optimize custom models with TensorRT-LLM, TensorRT-optimized open-source models and a developer reference project that showcases both the speed and quality of LLM responses.
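Tokens per second is the headline metric for these inference speedups. A minimal, framework-agnostic sketch of how such throughput is typically measured; the `generate` callable here is a hypothetical stand-in for a TensorRT-LLM (or any other) pipeline, not its actual API:

```python
import time

def tokens_per_second(generate, prompt: str, runs: int = 3) -> float:
    """Measure decode throughput of any `generate(prompt) -> list[str]` callable.
    Takes the best of several runs to reduce warm-up and scheduling noise."""
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero division
        best = max(best, len(tokens) / elapsed)
    return best

# Stub generator so the harness is self-contained
stub = lambda prompt: ["tok"] * 128
print(tokens_per_second(stub, "Hello") > 0)  # -> True
```

Comparing this number for the same model and prompt before and after enabling an optimized backend is how "up to 4x faster" style claims are produced.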

Dell Technologies Expands Generative AI Portfolio

Dell Technologies expands its Dell Generative AI Solutions portfolio, helping businesses transform how they work along every step of their generative AI (GenAI) journeys. "To maximize AI efforts and support workloads across public clouds, on-premises environments and at the edge, companies need a robust data foundation with the right infrastructure, software and services," said Jeff Boudreau, chief AI officer, Dell Technologies. "That's what we are building with our expanded validated designs, professional services, modern data lakehouse and the world's broadest GenAI solutions portfolio."

Customizing GenAI models to maximize proprietary data
The Dell Validated Design for Generative AI with NVIDIA for Model Customization offers pre-trained models that extract intelligence from data without building models from scratch. This solution provides best practices for customizing and fine-tuning GenAI models based on desired outcomes while helping keep information secure and on-premises. With a scalable blueprint for customization, organizations now have multiple ways to tailor GenAI models to accomplish specific tasks with their proprietary data. Its modular and flexible design supports a wide range of computational requirements and use cases, spanning training diffusion, transfer learning and prompt tuning.

NVIDIA H100 GPUs Now Available on AWS Cloud

AWS users can now access the leading performance demonstrated in industry benchmarks of AI training and inference. The cloud giant officially switched on a new Amazon EC2 P5 instance powered by NVIDIA H100 Tensor Core GPUs. The service lets users scale generative AI, high performance computing (HPC) and other applications with a click from a browser.

The news comes in the wake of AI's iPhone moment. Developers and researchers are using large language models (LLMs) to uncover new applications for AI almost daily. Bringing these new use cases to market requires the efficiency of accelerated computing. The NVIDIA H100 GPU delivers supercomputing-class performance through architectural innovations including fourth-generation Tensor Cores, a new Transformer Engine for accelerating LLMs and the latest NVLink technology that lets GPUs talk to each other at 900 GB/sec.

Chinese Tech Firms Buying Plenty of NVIDIA Enterprise GPUs

TikTok developer ByteDance, and other major Chinese tech firms including Tencent, Alibaba and Baidu are reported (by local media) to be snapping up lots of NVIDIA HPC GPUs, with even more orders placed this year. ByteDance is alleged to have spent enough on new products in 2023 to match the expenditure of the entire Chinese tech market on similar NVIDIA purchases for FY2022. According to news publication Jitwei, ByteDance has placed orders totaling $1 billion so far this year with Team Green—the report suggests that a mix of A100 and H800 GPU shipments have been sent to the company's mainland data centers.

The older Ampere-based A100 units were likely ordered prior to trade sanctions enforced on China post-August 2022, with further wiggle room allowed—meaning that shipments continued until September. The H800 GPU is a cut-down variant of 2022's flagship "Hopper" H100 model, designed specifically for the Chinese enterprise market—with reduced performance in order to meet export restriction standards. The H800 costs around $10,000 (average sale price per accelerator) according to Tom's Hardware, so it must offer some level of potency at that price. ByteDance has ordered roughly 100,000 units—with an unspecified split between H800 and A100 stock. Despite the development of competing HPC products within China, it seems that the nation's top-flight technology companies are heading directly to NVIDIA to acquire the best-of-the-best and highly mature AI processing hardware.
Mar 25th, 2025 08:30 EDT
