News Posts matching #Tensor Cores

NVIDIA TensorRT Boosts Stable Diffusion 3.5 Performance on NVIDIA GeForce RTX and RTX PRO GPUs

Press Release by

Jun 12th, 2025 08:08 Discuss (10 Comments)

Generative AI has reshaped how people create, imagine and interact with digital content. As AI models continue to grow in capability and complexity, they require more VRAM, or video random access memory. The base Stable Diffusion 3.5 Large model, for example, uses over 18 GB of VRAM - limiting the number of systems that can run it well. By applying quantization to the model, noncritical layers can be removed or run with lower precision. NVIDIA GeForce RTX 40 Series and the Ada Lovelace generation of NVIDIA RTX PRO GPUs support FP8 quantization to help run these quantized models, and the latest-generation NVIDIA Blackwell GPUs also add support for FP4.

NVIDIA collaborated with Stability AI to quantize its latest model, Stable Diffusion (SD) 3.5 Large, to FP8 - reducing VRAM consumption by 40%. Further optimizations to SD3.5 Large and Medium with the NVIDIA TensorRT software development kit (SDK) double performance. In addition, TensorRT has been reimagined for RTX AI PCs, combining its industry-leading performance with just-in-time (JIT), on-device engine building and an 8x smaller package size for seamless AI deployment to more than 100 million RTX AI PCs. TensorRT for RTX is now available as a standalone SDK for developers.
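As a rough illustration of why lower precision cuts VRAM use, the sketch below estimates weight storage at FP16, FP8 and FP4 and applies the quoted 40% reduction to the quoted 18 GB baseline. The parameter count is a hypothetical placeholder rather than a figure from the release, and the estimate ignores activations, text encoders and any layers left at higher precision.

```python
# Rough weight-memory estimate per numeric format (illustrative only).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Return weight storage in GB (10^9 bytes) for the given format."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


def reduced_vram_gb(baseline_gb: float, reduction_pct: float) -> float:
    """Apply a percentage reduction to a baseline VRAM figure."""
    return baseline_gb * (1.0 - reduction_pct / 100.0)


if __name__ == "__main__":
    hypothetical_params = 8e9  # assumption for illustration, not from the release
    for fmt in ("fp16", "fp8", "fp4"):
        print(fmt, weight_memory_gb(hypothetical_params, fmt), "GB of weights")
    # 18 GB and 40% are the figures quoted in the release.
    print("after 40% reduction:", round(reduced_vram_gb(18.0, 40.0), 1), "GB")
```

Halving the bytes per weight would halve weight memory on its own, so a 40% overall reduction is consistent with only part of the footprint being quantized.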

AnythingLLM App Best Experienced on NVIDIA RTX AI PCs

Press Release by

T0@st

May 29th, 2025 12:02 Discuss (0 Comments)

Large language models (LLMs), trained on datasets with billions of tokens, can generate high-quality content. They're the backbone for many of the most popular AI applications, including chatbots, assistants, code generators and much more. One of today's most accessible ways to work with LLMs is with AnythingLLM, a desktop app built for enthusiasts who want an all-in-one, privacy-focused AI assistant directly on their PC. With new support for NVIDIA NIM microservices on NVIDIA GeForce RTX and NVIDIA RTX PRO GPUs, AnythingLLM users can now get even faster performance for more responsive local AI workflows.

What Is AnythingLLM?
AnythingLLM is an all-in-one AI application that lets users run local LLMs, retrieval-augmented generation (RAG) systems and agentic tools. It acts as a bridge between a user's preferred LLMs and their data, and enables access to tools (called skills), making it easier and more efficient to use LLMs for specific tasks.

Palit Showcases Pandora NXNano with 157 AI TOPS at Computex 2025

Computex by

AleksandarK

May 22nd, 2025 08:39 Discuss (0 Comments)

At Computex 2025, Palit showed off its latest Pandora NXNano mini PC, bringing the power of NVIDIA Jetson Orin NX Super to a truly pocket-sized AI platform. At its heart sits an eight-core Arm Cortex-A78AE CPU paired with a 1,024-core Ampere-architecture GPU (32 Tensor Cores), delivering up to 157 TOPS of sparse AI throughput and 78 TOPS dense performance. Up to 16 GB of LPDDR5 memory (102.4 GB/s bandwidth) and a pre-installed 128 GB PCIe Gen 4 SSD ensure both data-intensive models and large datasets run smoothly. Connectivity is generous: dual 10/100/1000 Mb Ethernet, two USB 3.2 Gen 2 Type-A ports, a USB 3.2 Gen 2 Type-C OTG port, plus USB 2.0 and HDMI 2.0 outputs. Four M.2 slots (for storage, Wi-Fi, 5G/LTE or video-capture cards), an 8-lane MIPI CSI-2 camera interface, and headers for I²C, SPI, UART, GPIO, and CAN Bus round out a very flexible I/O package.
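The sparse figure is roughly twice the dense one, which presumably reflects the 2:4 structured sparsity that Ampere-generation Tensor Cores can accelerate: in every group of four weights, two are zero. The sketch below shows one simple way to prune weights into that pattern by magnitude. It is an illustration only, with hypothetical function names, and not the tooling NVIDIA or Palit ships.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    print(prune_2_of_4([0.1, -0.9, 0.5, 0.05, 1.0, 2.0, -3.0, 0.0]))
    # Ratio of the quoted sparse and dense throughput figures.
    print("sparse/dense ratio:", round(157 / 78, 2))
```

Real deployments typically fine-tune the model after pruning to recover accuracy; that step is omitted here.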

Housed in a sleek 145 × 123 × 66 mm chassis, the NXNano balances performance and thermals with a "superior thermal design" that includes two 50 mm high-efficiency fans, keeping sustained workloads cool even under heavy AI inference. Removable base and side panels allow easy customization, whether adding 3D-printed shells or extra modules. The rugged aluminium frame accommodates DC input from 12 to 36 V. At just 470 g, this DIY-friendly unit is ideal for edge deployments in retail, digital signage, robotics, and education, putting enterprise-grade AI in a truly compact form factor.

GIGAIPC Unveils Jetson Orin Series at Computex 2025

Computex Press Release by

Nomad76

May 16th, 2025 13:36 Discuss (3 Comments)

As COMPUTEX 2025, one of the most anticipated global tech events, prepares to open its doors in Taipei, AI applications are set to reach new heights. GIGAIPC, the industrial computing and edge AI subsidiary of GIGABYTE Technology, will unveil three innovative AI edge computing solutions at COMPUTEX 2025, showcasing its expertise in industrial-grade system design. Powered by NVIDIA Jetson Orin modules, the flagship QN-ORAX32-A1, QN-ORNX16GH-A1, and QN-ORNX16 deliver exceptional AI performance for smart manufacturing, intelligent surveillance, smart healthcare, smart retail, and AIoT applications, setting new benchmarks for edge computing.

The QN-ORAX32-A1, built on the NVIDIA Jetson AGX Orin 32 GB module, features an 8-core Arm v8.2 64-bit CPU and an NVIDIA Ampere GPU with 1792 CUDA cores and 56 Tensor cores, delivering up to 200 TOPS of AI performance—six times faster than its predecessor. This enhanced computing power makes it well-suited for high-load data processing and complex AI models. Beyond its computing performance, the QN-ORAX32-A1 is also built for real-world deployment. To ensure long-term durability in harsh industrial environments, the system incorporates a fanless thermal design and wide-range DC power input.

Nintendo Switch 2's Chipset Reportedly Confirmed as Tegra "T239" Unit

T0@st

Apr 23rd, 2025 13:22 Discuss (4 Comments)

An alleged partial close-up capture of the Nintendo Switch 2's chipset has leaked out, courtesy of Kurnal (@Kurnalsalts). This fresh leak is being hyped up as putting an end to all online debate regarding the upcoming hybrid console's technological underpinnings. Despite late 2024/early 2025 reports pointing to a custom NVIDIA "T239" SoC design, certain voices continued to produce conjecture about a more "cutting edge" solution. Surprisingly, Team Green's PR department did issue a statement about the Switch 2 being powered by "a custom processor featuring an NVIDIA GPU with dedicated RT Cores and Tensor Cores for stunning visuals and AI-driven enhancements."

As expected, Nintendo staffers remained guarded during recent press junkets—in-depth tech talk was deferred in NVIDIA's general direction. Kurnal's sharing of a speculative "T239" partial die shot does not provide any major new revelations or insights—as discussed on the Nintendo Switch 2 Subreddit, tech enthusiasts continue to rely on specification details from the big hack of NVIDIA repositories (three years ago). Newer speculation has focused on Nintendo's choice of foundry—Digital Foundry's Richard Leadbetter continues to express his personal belief that Nintendo has selected a Samsung 8 nm DUV foundry node. In opposition, certain critics have persisted with a 5 nm EUV node process theory.

Official: Nintendo Switch 2 Leveled Up With NVIDIA "Custom Processor" & AI-Powered Tech

Press Release by

T0@st

Apr 3rd, 2025 12:01 Discuss (43 Comments)

The Nintendo Switch 2, unveiled April 2, takes performance to the next level, powered by a custom NVIDIA processor featuring an NVIDIA GPU with dedicated RT Cores and Tensor Cores for stunning visuals and AI-driven enhancements. With 1,000 engineer-years of effort across every element—from system and chip design to a custom GPU, APIs and world-class development tools—the Nintendo Switch 2 brings major upgrades. The new console enables up to 4K gaming in TV mode and up to 120 FPS at 1080p in handheld mode. Nintendo Switch 2 also supports HDR, and AI upscaling to sharpen visuals and smooth gameplay.

AI and Ray Tracing for Next-Level Visuals
The new RT Cores bring real-time ray tracing, delivering lifelike lighting, reflections and shadows for more immersive worlds. Tensor Cores power AI-driven features like Deep Learning Super Sampling (DLSS), boosting resolution for sharper details without sacrificing image quality. Tensor Cores also enable AI-powered face tracking and background removal in video chat use cases, enhancing social gaming and streaming. With millions of players worldwide, the Nintendo Switch has become a gaming powerhouse and home to Nintendo's storied franchises. Its hybrid design redefined console gaming, bridging TV and handheld play.

PNY Announces Support for the New NVIDIA RTX PRO Blackwell Graphics Card Family

Press Release by

GFreeman

Mar 19th, 2025 03:26 Discuss (1 Comment)

PNY announced today it is adding the new NVIDIA RTX PRO 6000 Blackwell Workstation Edition, RTX PRO 6000 Blackwell Max-Q Workstation Edition, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, and RTX PRO 4000 Blackwell graphics cards to its lineup of NVIDIA RTX PRO GPU offerings for professionals.

Built for professionals and the future of work, NVIDIA RTX PRO Desktop GPUs based on the revolutionary NVIDIA Blackwell architecture deliver exceptional performance for AI-enhanced creative, design, and engineering workflows. Featuring the latest generation Tensor Cores, RT Cores, and up to 96 GB of ultra-high-speed GDDR7 memory, they enable groundbreaking advancements in AI, ray tracing, and neural graphics technology. Supercharge workstations for the next era of AI-driven workflows with the ultimate tools for professionals.

ASUS Introduces Ascent GX10 AI Supercomputer Powered by NVIDIA GB10 Grace Blackwell Superchip

Press Release by

btarunr

Mar 18th, 2025 23:45 Discuss (1 Comment)

ASUS today announces its groundbreaking AI supercomputer, ASUS Ascent GX10, powered by the state-of-the-art NVIDIA GB10 Grace Blackwell Superchip. This revolutionary device places the formidable capabilities of a petaFLOP-scale AI supercomputer directly onto the desks of developers, AI researchers and data scientists around the globe.

As the size and complexity of generative AI models grow, local development efforts face increasing challenges. Prototyping, tuning and inferencing large models require substantial memory and compute performance. To address these needs, Ascent GX10 is designed to provide developers with a powerful, economical desktop solution for AI development.

NVIDIA & Partners Will Discuss Supercharging of AI Development at GTC 2025

Press Release by

T0@st

Feb 26th, 2025 10:27 Discuss (1 Comment)

Generative AI is redefining computing, unlocking new ways to build, train and optimize AI models on PCs and workstations. From content creation and large and small language models to software development, AI-powered PCs and workstations are transforming workflows and enhancing productivity. At GTC 2025, running March 17-21 in the San Jose Convention Center, experts from across the AI ecosystem will share insights on deploying AI locally, optimizing models and harnessing cutting-edge hardware and software to enhance AI workloads—highlighting key advancements in RTX AI PCs and workstations.

Develop and Deploy on RTX
RTX GPUs are built with specialized AI hardware called Tensor Cores that provide the compute performance needed to run the latest and most demanding AI models. These high-performance GPUs can help build digital humans, chatbots, AI-generated podcasts and more. With more than 100 million GeForce RTX and NVIDIA RTX GPU users, developers have a large audience to target when new AI apps and features are deployed. In the session "Build Digital Humans, Chatbots, and AI-Generated Podcasts for RTX PCs and Workstations," Annamalai Chockalingam, senior product manager at NVIDIA, will showcase the end-to-end suite of tools developers can use to streamline development and deploy incredibly fast AI-enabled applications.

NVIDIA Recommends GeForce RTX 5070 Ti GPU to AI Content Creators

Press Release by

T0@st

Feb 21st, 2025 10:51 Discuss (40 Comments)

The NVIDIA GeForce RTX 5070 Ti graphics cards—built on the NVIDIA Blackwell architecture—are out now, ready to power generative AI content creation and accelerate creative performance. GeForce RTX 5070 Ti GPUs feature fifth-generation Tensor Cores with support for FP4, doubling performance and reducing VRAM requirements to run generative AI models.

In addition, the GPU comes equipped with two ninth-generation encoders and a sixth-generation decoder that add support for the 4:2:2 pro-grade color format and increase encoding quality for HEVC and AV1. This combo accelerates video editing workflows, reducing export times by 8x compared with single encoder GPUs without 4:2:2 support like the GeForce RTX 3090. The GeForce RTX 5070 Ti GPU also includes 16 GB of fast GDDR7 memory and 896 GB/sec of total memory bandwidth, a 78% increase over the GeForce RTX 4070 Ti GPU.

DeepSeek-R1 Goes Live on NVIDIA NIM

Press Release by

T0@st

Jan 31st, 2025 09:59 Discuss (9 Comments)

DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, conducting chain-of-thought, consensus and search methods to generate the best answer. Performing this sequence of inference passes—using reason to arrive at the best answer—is known as test-time scaling. DeepSeek-R1 is a perfect example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference.

As models are allowed to iteratively "think" through the problem, they create more output tokens and longer generation cycles, so model quality continues to scale. Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments. R1 delivers leading accuracy for tasks demanding logical inference, reasoning, math, coding and language understanding while also delivering high inference efficiency.

New NVIDIA Broadcast AI Features Now Streaming With GeForce RTX 50 Series GPUs

Press Release by

GFreeman

Jan 30th, 2025 10:59 Discuss (3 Comments)

New GeForce RTX 5090 and RTX 5080 GPUs - built on the NVIDIA Blackwell architecture - are now available to power generative AI content creation and accelerate creative performance. GeForce RTX 5090 and RTX 5080 GPUs feature fifth-generation Tensor Cores with support for FP4, reducing the VRAM requirements to run generative AI models while doubling performance. For example, Black Forest Labs' FLUX models - available on Hugging Face this week - at FP4 precision require less than 10 GB of VRAM, compared with over 23 GB at FP16. With a GeForce RTX 5090 GPU, the FLUX.1 [dev] model can generate images in just over five seconds, compared with 15 seconds on FP16 or 10 seconds on FP8 on a GeForce RTX 4090 GPU.

GeForce RTX 50 Series GPUs also come equipped with ninth-generation encoders and sixth-generation decoders that add support for 4:2:2 and increase encoding quality for HEVC and AV1. Fourth-generation RT Cores paired with DLSS 4 provide creators with super-smooth 3D rendering viewports. The GeForce RTX 5090 GPU includes 32 GB of ultra-fast GDDR7 memory and 1,792 GB/sec of total memory bandwidth - a 77% bandwidth increase over the GeForce RTX 4090 GPU. It also includes three encoders and two decoders, reducing export times by a third compared with the prior generation.
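The bandwidth figures can be sanity-checked with the usual peak-bandwidth formula: bus width in bits times per-pin data rate in Gbps, divided by 8. The bus widths and data rates below are not stated in the release; they are assumptions based on commonly published specifications (512-bit at 28 Gbps for the RTX 5090, 384-bit at 21 Gbps for the RTX 4090), so treat this as a sketch rather than an official calculation.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bits per transfer * Gbps per pin / 8."""
    return bus_width_bits * data_rate_gbps / 8


def percent_increase(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    # Assumed specs, not taken from the release.
    rtx_5090 = memory_bandwidth_gb_s(512, 28)
    rtx_4090 = memory_bandwidth_gb_s(384, 21)
    print("RTX 5090:", rtx_5090, "GB/s")
    print("RTX 4090:", rtx_4090, "GB/s")
    print("increase:", round(percent_increase(rtx_5090, rtx_4090), 1), "%")
```

Under these assumptions the result lines up with the quoted 1,792 GB/sec and an increase of a little under 78%.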

NVIDIA's Frame Generation Technology Could Come to GeForce RTX 30 Series

AleksandarK

Jan 20th, 2025 05:25 Discuss (64 Comments)

NVIDIA's Deep Learning Super Sampling (DLSS) has undergone many iterations up to the current version 4 with its transformer model, delivering new technologies such as DLSS Multi Frame Generation, which predicts multiple frames ahead to raise the number of frames output per second. However, not every NVIDIA GPU generation supports these more modern DLSS technologies. In an interview with Digital Foundry, Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, commented on trickling down some DLSS technologies to older GPU generations. For example, DLSS Ray Reconstruction, Super Resolution, and Deep Learning Anti-Aliasing (DLAA) work on NVIDIA GeForce RTX 20/30/40/50 series GPUs. However, DLSS Frame Generation is exclusive to the RTX 40 series and newer, and DLSS Multi Frame Generation is exclusive to the newest RTX 50 series.

However, there is hope for older hardware. "I think this is primarily a question of optimization and also engineering and then the ultimate user experience. We're launching this Frame Generation, the best Multi Frame Generation technology, with the 50 Series, and we'll see what we're able to squeeze out of older hardware in the future," Catanzaro said. So, frame generation will most likely arrive on the older RTX 30 series, with even a slight possibility of the RTX 20 series getting DLSS Frame Generation. Due to compute budget constraints, Multi Frame Generation will most likely stay an RTX 50 series exclusive, as that series has more raw computing power to handle this technology.

NVIDIA Puts Grace Blackwell on Every Desk and at Every AI Developer's Fingertips

CES Press Release by

btarunr

Jan 6th, 2025 22:27 Discuss (14 Comments)

NVIDIA today unveiled NVIDIA Project DIGITS, a personal AI supercomputer that provides AI researchers, data scientists and students worldwide with access to the power of the NVIDIA Grace Blackwell platform. Project DIGITS features the new NVIDIA GB10 Grace Blackwell Superchip, offering a petaflop of AI computing performance for prototyping, fine-tuning and running large AI models.

With Project DIGITS, users can develop and run inference on models using their own desktop system, then seamlessly deploy the models on accelerated cloud or data center infrastructure. "AI will be mainstream in every application for every industry. With Project DIGITS, the Grace Blackwell Superchip comes to millions of developers," said Jensen Huang, founder and CEO of NVIDIA. "Placing an AI supercomputer on the desks of every data scientist, AI researcher and student empowers them to engage and shape the age of AI."

NVIDIA Blackwell GeForce RTX 50 Series Opens New World of AI Computer Graphics

CES Press Release by

btarunr

Jan 6th, 2025 22:18 Discuss (39 Comments)

NVIDIA today unveiled the most advanced consumer GPUs for gamers, creators and developers—the GeForce RTX 50 Series Desktop and Laptop GPUs. Powered by the NVIDIA Blackwell architecture, fifth-generation Tensor Cores and fourth-generation RT Cores, the GeForce RTX 50 Series delivers breakthroughs in AI-driven rendering, including neural shaders, digital human technologies, geometry and lighting.

"Blackwell, the engine of AI, has arrived for PC gamers, developers and creatives," said Jensen Huang, founder and CEO of NVIDIA. "Fusing AI-driven neural rendering and ray tracing, Blackwell is the most significant computer graphics innovation since we introduced programmable shading 25 years ago." The GeForce RTX 5090 GPU—the fastest GeForce RTX GPU to date—features 92 billion transistors, providing over 3,352 trillion AI operations per second (TOPS) of computing power. Blackwell architecture innovations and DLSS 4 mean the GeForce RTX 5090 GPU outperforms the GeForce RTX 4090 GPU by up to 2x.

VeriSilicon Unveils Next-Gen Vitality Architecture GPU IP Series

Press Release by

Nomad76

Dec 19th, 2024 10:13 Discuss (7 Comments)

VeriSilicon today announced the launch of its latest Vitality architecture Graphics Processing Unit (GPU) IP series, designed to deliver high-performance computing across a wide range of applications, including cloud gaming, AI PC, and both discrete and integrated graphics cards.

VeriSilicon's new generation Vitality GPU architecture delivers exceptional advancements in computational performance with scalability. It incorporates advanced features such as a configurable Tensor Core AI accelerator and a 32 MB to 64 MB Level 3 (L3) cache, offering both high processing performance and superior energy efficiency. Additionally, the Vitality architecture supports up to 128 channels of cloud gaming per core, addressing the needs of high concurrency and high image quality cloud-based entertainment, while enabling large-scale desktop gaming and applications on Windows systems. With robust support for Microsoft DirectX 12 APIs and AI acceleration libraries, this architecture is ideally suited for a wide range of performance-intensive applications and complex computing workloads.

NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Benchmark

Press Release by

GFreeman

Aug 28th, 2024 14:28 Discuss (3 Comments)

As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category - including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token. MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they're capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They're also more efficient since they only activate a few experts per inference - meaning they deliver results much faster than dense models of a similar size.
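To make "only activate a few experts per inference" concrete, the sketch below shows top-k expert routing in its simplest form: a gate scores every expert, only the k best are run, and their scores are renormalized with a softmax. The choice of k = 2 over 8 experts follows Mistral's published description of Mixtral 8x7B, which is an assumption here since the release does not state it; the function names are hypothetical.

```python
import math


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters used per token."""
    return active_params / total_params


if __name__ == "__main__":
    # Made-up gate scores for 8 experts; only two experts are selected.
    print(top_k_route([0.1, 2.0, -1.0, 2.0, 0.5, 0.0, -0.3, 1.0]))
    # 12.9B active out of 46.7B total, as quoted above.
    print("active share:", round(active_fraction(12.9, 46.7) * 100, 1), "%")
```

The active share comes out above 2/8 because layers outside the experts, such as attention, run for every token.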

Supermicro Launches Plug-and-Play SuperCluster for NVIDIA Omniverse

Press Release by

Nomad76

Aug 1st, 2024 11:01 Discuss (0 Comments)

Supermicro, Inc., a Total IT Solution Provider for AI, Cloud, Storage, and 5G/Edge, is announcing a new addition to its SuperCluster portfolio of plug-and-play AI infrastructure solutions for the NVIDIA Omniverse platform to deliver high-performance generative AI-enhanced 3D workflows at enterprise scale. This new SuperCluster features the latest Supermicro NVIDIA OVX systems and allows enterprises to easily scale as workloads increase.

"Supermicro has led the industry in developing GPU-optimized products, traditionally for 3D graphics and application acceleration, and now for AI," said Charles Liang, president and CEO of Supermicro. "With the rise of AI, enterprises are seeking computing infrastructure that combines all these capabilities into a single package. Supermicro's SuperCluster features fully interconnected 4U PCIe GPU NVIDIA-Certified Systems for NVIDIA Omniverse, with up to 256 NVIDIA L40S PCIe GPUs per scalable unit. The system helps deliver high performance across the Omniverse platform, including generative AI integrations. By developing this SuperCluster for Omniverse, we're not just offering a product; we're providing a gateway to the future of application development and innovation."

New Performance Optimizations Supercharge NVIDIA RTX AI PCs for Gamers, Creators and Developers

Press Release by

GFreeman

May 21st, 2024 14:39 Discuss (0 Comments)

NVIDIA today announced at Microsoft Build new AI performance optimizations and integrations for Windows that help deliver maximum performance on NVIDIA GeForce RTX AI PCs and NVIDIA RTX workstations. Large language models (LLMs) power some of the most exciting new use cases in generative AI and now run up to 3x faster with ONNX Runtime (ORT) and DirectML using the new NVIDIA R555 Game Ready Driver. ORT and DirectML are high-performance tools used to run AI models locally on Windows PCs.

WebNN, an application programming interface for web developers to deploy AI models, is now accelerated with RTX via DirectML, enabling web apps to incorporate fast, AI-powered capabilities. And PyTorch will support DirectML execution backends, enabling Windows developers to train and infer complex AI models on Windows natively. NVIDIA and Microsoft are collaborating to scale performance on RTX GPUs. These advancements build on NVIDIA's world-leading AI platform, which accelerates more than 500 applications and games on over 100 million RTX AI PCs and workstations worldwide.

NVIDIA Launches the RTX A400 and A1000 Professional Graphics Cards

Press Release by

btarunr

Apr 17th, 2024 00:40 Discuss (9 Comments)

AI integration across design and productivity applications is becoming the new standard, fueling demand for advanced computing performance. This means professionals and creatives will need to tap into increased compute power, regardless of the scale, complexity or scope of their projects. To meet this growing need, NVIDIA is expanding its RTX professional graphics offerings with two new NVIDIA Ampere architecture-based GPUs for desktops: the NVIDIA RTX A400 and NVIDIA RTX A1000.

They expand access to AI and ray tracing technology, equipping professionals with the tools they need to transform their daily workflows. The RTX A400 GPU introduces accelerated ray tracing and AI to the RTX 400 series GPUs. With 24 Tensor Cores for AI processing, it surpasses traditional CPU-based solutions, enabling professionals to run cutting-edge AI applications, such as intelligent chatbots and copilots, directly on their desktops. The GPU delivers real-time ray tracing, so creators can build vivid, physically accurate 3D renders that push the boundaries of creativity and realism.

Intel Launches Gaudi 3 AI Accelerator: 70% Faster Training, 50% Faster Inference Compared to NVIDIA H100, Promises Better Efficiency Too

AleksandarK

Apr 9th, 2024 12:59 Discuss (14 Comments)

During the Vision 2024 event, Intel announced its latest Gaudi 3 AI accelerator, promising significant improvements over its predecessor. Intel claims the Gaudi 3 offers up to 70% improvement in training performance, 50% better inference, and 40% better efficiency than NVIDIA's H100 processors. The new AI accelerator is presented as a PCIe Gen 5 dual-slot add-in card with a 600 W TDP or an OAM module with 900 W. The PCIe card has the same peak 1,835 TeraFLOPS of FP8 performance as the OAM module despite a 300 W lower TDP. This will likely result in lower sustained performance, given the lower TDP, but it confirms that the same silicon is used, just tuned to a lower frequency. The PCIe version works as a group of four per system, while the OAM HL-325L modules can be run in an eight-accelerator configuration per server. Built on TSMC's N5 5 nm node, the AI accelerator features 64 Tensor Cores, delivering double the FP8 and quadruple the FP16 performance of the previous generation Gaudi 2.

The Gaudi 3 AI chip comes with 128 GB of HBM2E with 3.7 TB/s of bandwidth and twenty-four 200 Gbps Ethernet NICs, with dual 400 Gbps NICs used for scale-out. All of that is laid out on 10 tiles that make up the Gaudi 3 accelerator. There is 96 MB of SRAM split between two compute tiles, which acts as a low-level cache that bridges data communication between Tensor Cores and HBM memory. Intel also announced support for the new performance-boosting standardized MXFP4 data format and is developing an AI NIC ASIC for Ultra Ethernet Consortium-compliant networking. The Gaudi 3 supports clusters of up to 8,192 cards, made up of 1,024 nodes with eight accelerators each. It is on track for volume production in Q3, offering a cost-effective alternative to NVIDIA accelerators with the additional promise of a more open ecosystem. More information and a deeper dive can be found in the Gaudi 3 Whitepaper.

AAEON BOXER-8653AI & BOXER-8623AI Expand Vertical Market Potential in a More Compact Form

Press Release by

btarunr

Feb 28th, 2024 21:04 Discuss (0 Comments)

Leading provider of embedded PC solutions, AAEON, is delighted to announce the official launch of two new additions to its rich line of embedded AI systems, the BOXER-8653AI and BOXER-8623AI, which are powered by the NVIDIA Jetson Orin NX and Jetson Orin Nano, respectively. Measuring just 180 mm x 136 mm x 75 mm, both systems are compact and easily wall-mounted for discreet deployment, which AAEON says makes them ideal for use in both indoor and outdoor settings such as factories and parking lots. Adding to this is the systems' environmental resilience, with the BOXER-8653AI sporting a wide -15°C to 60°C temperature tolerance and the BOXER-8623AI able to operate between -15°C and 65°C, with both supporting a 12 V ~ 24 V power input range via a 2-pin terminal block.

The BOXER-8653AI benefits from the NVIDIA Jetson Orin NX module, offering up to 70 TOPS of AI inference performance for applications that require extremely fast analysis of vast quantities of data. Meanwhile, the BOXER-8623AI utilizes the more efficient, yet still powerful NVIDIA Jetson Orin Nano module, capable of up to 40 TOPS. Both systems consequently make use of the 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores.

NVIDIA AI GPU Customers Reportedly Selling Off Excess Hardware

T0@st

Feb 27th, 2024 09:59 Discuss (9 Comments)

The NVIDIA H100 Tensor Core GPU was last year's hot item for HPC and AI industry segments—the largest purchasers were reported to have acquired up to 150,000 units each. Demand grew so much that lead times of 36 to 52 weeks became the norm for H100-based server equipment. The latest rumblings indicate that things have stabilized—so much so that some organizations are "offloading chips" as the supply crunch cools off. Apparently it is more cost-effective to rent AI processing sessions through cloud service providers (CSPs)—the big three being Amazon Web Services, Google Cloud, and Microsoft Azure.

According to a mid-February Seeking Alpha report, wait times for the NVIDIA H100 80 GB GPU model have been reduced to around three to four months. The Information believes that some companies have already reduced their order counts, while others have hardware sitting around, completely unused. Maintenance complexity and costs are reportedly cited as the main factors in "offloading" unneeded equipment and turning to renting server time from CSPs. Despite improved supply conditions, AI GPU demand is still growing, driven mainly by organizations working with LLMs. A prime example is OpenAI: as pointed out by The Information, insider murmurings have Sam Altman & Co. seeking out alternative solutions and production avenues.

NVIDIA Introduces NVIDIA RTX 2000 Ada Generation GPU

Press Release by

T0@st

Feb 12th, 2024 11:31 Discuss (23 Comments)

Generative AI is driving change across industries—and to take advantage of its benefits, businesses must select the right hardware to power their workflows. The new NVIDIA RTX 2000 Ada Generation GPU delivers the latest AI, graphics and compute technology to compact workstations, offering up to 1.5x the performance of the previous-generation RTX A2000 12 GB in professional workflows. From crafting stunning 3D environments to streamlining complex design reviews to refining industrial designs, the card's capabilities pave the way for an AI-accelerated future, empowering professionals to achieve more without compromising on performance or capabilities. Modern multi-application workflows, such as AI-powered tools, multi-display setups and high-resolution content, put significant demands on GPU memory. With 16 GB of memory in the RTX 2000 Ada, professionals can tap the latest technologies and tools to work faster and better with their data.

Powered by NVIDIA RTX technology, the new GPU delivers impressive realism in graphics with NVIDIA DLSS, delivering ultra-high-quality, photorealistic ray-traced images more than 3x faster than before. In addition, the RTX 2000 Ada enables an immersive experience for enterprise virtual-reality workflows, such as for product design and engineering design reviews. With its blend of performance, versatility and AI capabilities, the RTX 2000 Ada helps professionals across industries achieve efficiencies. Architects and urban planners can use it to accelerate visualization workflows and structural analysis, enhancing design precision. Product designers and engineers using industrial PCs can iterate rapidly on product designs with fast, photorealistic rendering and AI-powered generative design. Content creators can edit high-resolution videos and images seamlessly, and use AI for realistic visual effects and content creation assistance. And in vital embedded applications and edge computing, the RTX 2000 Ada can power real-time data processing for medical devices, optimize manufacturing processes with predictive maintenance and enable AI-driven intelligence in retail environments.

Read full story

Mod Unlocks FSR 3 Fluid Motion Frames on Older NVIDIA GeForce RTX 20/30 Series Cards

AleksandarK

Feb 6th, 2024 01:08 Discuss (27 Comments)

NVIDIA's latest RTX 40 series graphics cards feature impressive new technologies like DLSS 3 that can significantly enhance performance and image quality in games. However, owners of older 20 and 30 series NVIDIA GeForce RTX cards cannot officially benefit from these advances. DLSS 3's Frame Generation feature, in particular, requires dedicated hardware only found in NVIDIA's Ada Lovelace architecture. Because NVIDIA has declined to enable frame generation on older-generation hardware, the modding community has stepped in with a creative workaround. A new third-party modification can unofficially activate both upscaling (FSR, DLAA, DLSS or XeSS) and AMD Fluid Motion Frames on older NVIDIA cards equipped with Tensor Cores. Replacing two key DLL files and making a small edit to the Windows registry enables the "DLSS 3" option in games running on older hardware.

In testing conducted by Digital Foundry, this modification delivered up to a 75% FPS boost - on par with the performance uplift official DLSS 3 provides on RTX 40 series cards. Games like Cyberpunk 2077, Spider-Man: Miles Morales, and A Plague Tale: Requiem were used to benchmark performance. However, there can be minor visual flaws, including incorrect UI interpolation or random frame time fluctuations. Ironically, while the FSR 3 tech itself originates from AMD, the mod currently only works on NVIDIA cards. So, although the mod is not officially supported, the modding community has managed to bring frame generation to more NVIDIA owners - until AMD RDNA 3 cards can utilize it as well. This shows the potential of community-driven software modification and innovation.
