News Posts matching #CUDA


NVIDIA's DLSS Transformer Exits Beta, Ready for Deployment

NVIDIA's DLSS Transformer officially graduates from beta today and is rolling out in its full, stable form across supported games. Following its debut as part of the DLSS 4 update, which features multi-frame generation and significant improvements in image quality, the technology has proven its worth. By replacing traditional convolutional neural networks with Transformer models, NVIDIA has doubled the model's parameters, boosted compute throughput fourfold, and delivered a 30-50% uplift in ray-traced effects quality, according to internal benchmarks. Even more impressive, each AI-enhanced frame now processes in just 1 ms on an RTX 5090, compared with the 3.25 ms required by DLSS 3.
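As a rough illustration of what those frame-time figures mean in practice, the back-of-the-envelope sketch below (our own arithmetic, not NVIDIA's methodology; the target frame rates are assumed) shows how much of a frame budget the reported 1 ms and 3.25 ms per-frame costs would consume:

```python
# Back-of-the-envelope math (not NVIDIA's methodology): share of the frame budget
# consumed by the reported per-frame AI cost at a few assumed target frame rates.
DLSS4_COST_MS = 1.0    # reported transformer-model cost per frame on an RTX 5090
DLSS3_COST_MS = 3.25   # reported cost under DLSS 3

for target_fps in (60, 120, 240):          # assumed targets, for illustration only
    budget_ms = 1000.0 / target_fps        # total time available per frame
    print(f"{target_fps:>3} FPS | budget {budget_ms:5.2f} ms | "
          f"DLSS 4 overhead {DLSS4_COST_MS / budget_ms:6.1%} | "
          f"DLSS 3 overhead {DLSS3_COST_MS / budget_ms:6.1%}")
```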

Under the hood, DLSS 4 can tap into Blackwell-exclusive hardware, from FP8 tensor cores to fused CUDA kernels, and leans on vertical layer fusion and memory optimizations to keep overhead in check even with models twice the size of their CNN predecessors. To fine-tune performance and eliminate glitches such as ghosting, flicker, or blurring, NVIDIA has quietly run this network through a dedicated supercomputer for the past six years, continuously iterating on its findings. The result is a real-time AI upscaling solution that pairs a higher-performance AI architecture with rigorously validated quality.

Humanoid Robots to Assemble NVIDIA's GB300 NVL72 "Blackwell Ultra"

NVIDIA's upcoming GB300 NVL72 "Blackwell Ultra" rack-scale systems will reportedly be assembled by humanoid robots, according to sources close to Reuters. As readers are aware, most traditional processes in silicon, PCB, and server manufacturing are automated, requiring little to no human intervention. However, rack-scale systems have required humans for final assembly up until now. It appears that Foxconn and NVIDIA plan to open the first AI-powered humanoid robot assembly plant in Houston, Texas. The central plan is that, in the coming months as the plant is completed, humanoid robots will take over the final assembly process, entirely removing humans from the manufacturing loop.

And this is not a bad thing. Since server assembly typically requires lifting heavy server racks throughout the day, the humanoid robot system will aid humans by doing the hard work, saving workers from excessive labor. Initially, humans will oversee these robots in their operations, with fully autonomous factories expected later on. The human element here will primarily involve inspecting the work. NVIDIA has been laying the groundwork for humanoid robots for some time, as the company has developed NVIDIA Isaac, a comprehensive CUDA-accelerated platform designed for humanoid robots. As robots from Agility Robotics, Boston Dynamics, Fourier, Foxlink, Galbot, Mentee Robotics, NEURA Robotics, General Robotics, Skild AI, and XPENG require models that are aware of their surroundings, NVIDIA created Isaac GR00T N1, the world's first open humanoid robot foundation model, available for anyone to use and fine-tune.

NVIDIA Prepares Cut-Down GeForce RTX "5090DD" for China

When we heard the news that NVIDIA would be halting shipments of its GeForce RTX 5090D GPU to Chinese customers in Q2, there was speculation that the company was preparing to make a comeback. Leaving out the Chinese market is a tough decision from a business standpoint, but NVIDIA complies with export regulations imposed by the US administration under Donald Trump. However, the company now plans to keep its presence in China with the release of the export-abiding GeForce "RTX 5090DD" GPU. Carrying a GB202-240 die, it will be a cut-down version with reportedly only 14,080 CUDA cores present, paired with 24 GB of GDDR7 memory.

Chinese gamers will notice a significant difference, but so will anyone trying to acquire these GPUs for non-gaming purposes. As the US administration has banned shipments of high-end chips to China to stop them from training military-grade AI, gaming GPUs have been a target of AI firms. Now, a modified GeForce RTX 5090DD, which cuts not only raw compute but also firmware capabilities, should still be sufficient for gaming. Carrying a PG145 SKU 40 board, the new RTX 5090DD GPU will also feature a modified 384-bit bus for its 24 GB of memory, down from the original RTX 5090D's 512-bit bus.

Update 09:45 UTC: According to kopite7kimi, one of the most accurate NVIDIA leakers, the core count for this GPU is supposed to be 21,760 CUDA cores, with a 575 W TDP. The leaker did note that there is a surprise in store, suggesting NVIDIA has changed something significant under the hood to comply with export regulations.

NVIDIA Blackwell Delivers Breakthrough Performance in Latest MLPerf Training Results

NVIDIA is working with companies worldwide to build out AI factories—speeding the training and deployment of next-generation AI applications that use the latest advancements in training and inference. The NVIDIA Blackwell architecture is built to meet the heightened performance requirements of these new applications. In the latest round of MLPerf Training—the 12th since the benchmark's introduction in 2018—the NVIDIA AI platform delivered the highest performance at scale on every benchmark and powered every result submitted on the benchmark's toughest large language model (LLM)-focused test: Llama 3.1 405B pretraining.

The NVIDIA platform was the only one that submitted results on every MLPerf Training v5.0 benchmark—underscoring its exceptional performance and versatility across a wide array of AI workloads, spanning LLMs, recommendation systems, multimodal LLMs, object detection and graph neural networks. The at-scale submissions used two AI supercomputers powered by the NVIDIA Blackwell platform: Tyche, built using NVIDIA GB200 NVL72 rack-scale systems, and Nyx, based on NVIDIA DGX B200 systems. In addition, NVIDIA collaborated with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 NVIDIA Grace CPUs.

NVIDIA GeForce RTX 5050 Reportedly Scheduled for July Release

NVIDIA is preparing some of the final SKUs for its GeForce RTX 50 series "Blackwell" graphics cards, with the last entry being the entry-level GeForce RTX 5050 GPU. The RTX 5050 is based on the GB207 die with 2,560 CUDA cores. Running on a 128-bit bus, it carries 8 GB of GDDR6 memory with as-yet-unknown memory bandwidth. It carries a 130 W TDP, indicating that some improvements have been made over the previous-generation RTX 4050 desktop GPU. For comparison, the last-generation RTX 4050 also had 2,560 CUDA cores, but with 6 GB of memory and a 100 W TDP. Given the 30% higher TDP and higher memory capacity, the Blackwell revision should deliver a decent performance bump even with a similar CUDA core configuration. As the launch is rumored for July, we are standing by for more information about the performance and price targets NVIDIA envisions.

NVIDIA and Microsoft Advance Development on RTX AI PCs

Generative AI is transforming PC software into breakthrough experiences - from digital humans to writing assistants, intelligent agents and creative tools. NVIDIA RTX AI PCs are powering this transformation with technology that makes it simpler to get started experimenting with generative AI and unlock greater performance on Windows 11. NVIDIA TensorRT has been reimagined for RTX AI PCs, combining industry-leading TensorRT performance with just-in-time, on-device engine building and an 8x smaller package size for seamless AI deployment to more than 100 million RTX AI PCs.

Announced at Microsoft Build, TensorRT for RTX is natively supported by Windows ML - a new inference stack that provides app developers with both broad hardware compatibility and state-of-the-art performance. For developers looking for AI features ready to integrate, NVIDIA software development kits (SDKs) offer a wide array of options, from NVIDIA DLSS to multimedia enhancements like NVIDIA RTX Video. This month, top software applications from Autodesk, Bilibili, Chaos, LM Studio and Topaz Labs are releasing updates to unlock RTX AI features and acceleration.
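For context, a typical on-device engine build with the standard TensorRT Python API looks like the minimal sketch below. Whether the new TensorRT for RTX package exposes exactly this interface is an assumption on our part, and the ONNX model path is a placeholder:

```python
# Minimal sketch of building a TensorRT engine on the target machine with the standard
# TensorRT Python API (TensorRT 8.x conventions). Whether "TensorRT for RTX" exposes
# this exact interface is an assumption; "model.onnx" is a placeholder path.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:         # placeholder ONNX model
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)       # allow reduced-precision kernels
engine = builder.build_serialized_network(network, config)

with open("model.plan", "wb") as f:         # cache the engine for subsequent launches
    f.write(engine)
```

Caching the serialized engine is what makes just-in-time builds pay off: the expensive optimization step runs once per machine, and later launches simply deserialize the plan file.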

GIGAIPC Unveils Jetson Orin Series at Computex 2025

As COMPUTEX 2025, one of the most anticipated global tech events, prepares to open its doors in Taipei, AI applications are set to reach new heights. GIGAIPC, the industrial computing and edge AI subsidiary of GIGABYTE Technology, will unveil three innovative AI edge computing solutions at COMPUTEX 2025, showcasing its expertise in industrial-grade system design. Powered by NVIDIA Jetson Orin modules, the flagship QN-ORAX32-A1, QN-ORNX16GH-A1, and QN-ORNX16 deliver exceptional AI performance for smart manufacturing, intelligent surveillance, smart healthcare, smart retail, and AIoT applications, setting new benchmarks for edge computing.

The QN-ORAX32-A1, built on the NVIDIA Jetson AGX Orin 32 GB module, features an 8-core Arm v8.2 64-bit CPU and an NVIDIA Ampere GPU with 1792 CUDA cores and 56 Tensor cores, delivering up to 200 TOPS of AI performance—six times faster than its predecessor. This enhanced computing power makes it well-suited for high-load data processing and complex AI models. Beyond its computing performance, the QN-ORAX32-A1 is also built for real-world deployment. To ensure long-term durability in harsh industrial environments, the system incorporates a fanless thermal design and wide-range DC power input.

Autonomous Inc. Introduces Brainy: The Petaflop AI Workstation

Autonomous Inc., a company dedicated to designing and engineering the future of work and putting it into the hands of innovators, today announced Brainy, a revolutionary workstation designed to accelerate deep learning and machine learning workflows. Brainy delivers unprecedented AI performance directly to the desktop, empowering researchers, developers, and AI startups to push the boundaries of artificial intelligence.

"Brainy is more than just a machine; it's a partner in innovation," stated Brody Slade, Autonomous' Product Manager. "We're putting petaflop-level AI power within reach, eliminating the bottlenecks and costs associated with cloud-based solutions and truly changing the way AI development is done. It empowers you to not just think—but to think with your machine."

"Full Die Shot" Analysis of Nintendo Switch 2 SoC Indicates Samsung 8 nm Production Origins

Late last month, Kurnal (@Kurnalsalts) shared a partial die shot of a supposed Nintendo Switch 2 chipset—this teaser image seemed to verify previous leaked claims about the forthcoming next-gen hybrid gaming console being powered by a custom NVIDIA "T239" SoC design. Two weeks after the fact, Kurnal has boasted about delivering an alleged "world's first Nintendo Switch 2 die shot." Their social media post included a couple of key specification data points: "Samsung 8N (8 nm), eight Cortex-A78C cores, (shared) 4 MB L2 cache, and 1536 CUDA/6TPC 'Ampere' GPU." Another leaker—Geekerwan—said that they acquired a "Switch 2 motherboard" via Xianyu. This Chinese equivalent to eBay seems to be a veritable treasure trove of tech curiosities.

Earlier in 2025, black market sellers were attempting to offload complete pre-launch Switch 2 packages for big money. As reported by VideoCardz, recent acquisitions only involved non-functional motherboard + SoC units—Kurnal disclosed a 1000 RMB (~$138 USD) price point. Digital Foundry's Richard Leadbetter is a very visible advocate of the Switch 2 chipset being based on a mature 8 nm Samsung node. His position was aimed at certain critics, who predicted 5 nm manufacturing origins. Older leaks suggested a larger-than-expected die footprint—almost twice the size of the Switch 1's chip—leading to Leadbetter's conclusion. Comparison charts produced by Kurnal and Geekerwan propose an occupied area of 207 mm².

NVIDIA RTX PRO 6000 Appears in Early Listings: $11,000 in Japan, €9,000 in Europe

Despite featuring the biggest GB202 configuration—24,064 CUDA cores distributed across 188 streaming multiprocessors running at up to 2,617 MHz, paired with 96 GB of GDDR7 ECC memory—the RTX PRO 6000 'Blackwell' GPU from NVIDIA is yet to have an official launch date or pricing disclosed. Early European retailer listings show the card starting at €8,982, including 21 percent VAT. Some vendors are already asking for more than €10,900. However, business customers evaluating net costs can anticipate a significant saving, with a rough estimate of €7,430 before tax, subject to local tax regulations and import fees. NVIDIA is expected to offer the RTX PRO 6000 in several variants, including Workstation, Server, and Max‑Q editions that tailor power envelopes and cooling designs to different professional environments.

In Japan, pre-release listings place the RTX PRO 6000 at ¥1,630,600 (around $11,326), reflecting a similar premium. The appearance of these price tags suggests that initial shipments have quietly reached distributors well before any formal announcement. One Redditor even got his hands on a card early, preparing for a trial run. Of course, until NVIDIA releases RTX PRO-optimized drivers, performance will lag behind the gaming GeForce RTX 5090 SKU. Geared toward enterprise workstations and professional workloads that demand high memory capacity and massive compute performance, the RTX PRO 6000 carries a pricing profile that distinguishes it from gaming-grade SKUs. Still, it sits below the server-grade GB200-based Blackwell GPUs aimed at AI and HPC workloads.

NVIDIA RTX PRO 6000 GDDR7 Memory Comes in 3 GB Modules, Sandwiching the PCB on Both Sides

NVIDIA has significantly advanced professional graphics by rebranding its workstation lineup as "RTX PRO" and incorporating an amazing 96 GB of GDDR7 memory capacity into a single RTX PRO 6000 card. This marks the first time 3 GB GDDR7 modules have been employed in a workstation GPU, each supporting error-correcting code for enhanced reliability. By arranging 16 such modules on each side of the PCB, NVIDIA achieves the remarkable 96 GB capacity while maintaining a TDP limit of 300 W for its Max-Q variant (pictured below) and up to 600 W for other SKUs. A recent leak on the Chiphell forum provides a clear insight into the new PCB layout. The customary 12 V-6×2 power connector has been omitted and replaced by four solder points intended for a cable extension.

This design choice suggests preparation for both Server and Max-Q editions, where power inputs are relocated to the rear of the card. Despite the simplified power interface and reduced footprint, the Max-Q model retains the full GB202 Blackwell GPU and the complete memory capacity. At the top of the series, the RTX PRO 6000 Blackwell will be offered in three distinct configurations. The Workstation and Server editions feature 24,064 CUDA cores, 96 GB of GDDR7 ECC memory, and a 600 W power budget, ensuring consistent performance in desktop towers and rack-mounted systems. The Max-Q edition employs the identical GPU and memory configuration but limits power consumption to 300 W through lower clocks and power limits, making it particularly well suited for compact chassis and noise-sensitive environments.
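The 96 GB figure works out directly from the module arrangement described above; the short sketch below does the arithmetic, with the clamshell note being our interpretation of the double-sided mounting rather than an NVIDIA statement:

```python
# The quoted 96 GB follows from the module arrangement; the clamshell comment is our
# interpretation of the double-sided mounting, not an NVIDIA statement.
modules_per_side = 16
sides = 2                      # GDDR7 packages on both faces of the PCB ("clamshell",
                               # where two modules share each 32-bit memory channel)
module_capacity_gb = 3         # 3 GB (24 Gbit) GDDR7 modules

print(modules_per_side * sides * module_capacity_gb, "GB")   # 96 GB
```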

NVIDIA Sends MSRP Numbers to Partners: GeForce RTX 5060 Ti 8 GB at $379, RTX 5060 Ti 16 GB at $429

Next week's planned launch of NVIDIA's GeForce RTX 5060 Ti series brings two SKUs differentiated by memory capacity and pricing. Both models leverage the GB206-300 GPU, made on a 5 nm node from TSMC, and feature a 128-bit memory interface paired with GDDR7 chips running at an effective 28 Gbps. According to IT Home, NVIDIA has communicated MSRP figures to its key AIC partners ahead of the mid-April rollout. The entry-level 8 GB variant is set at an MSRP of ¥3,199 (roughly $379), while the 16 GB version carries an MSRP of ¥3,599 (about $429). This is a reduction from the $399 and $499 prices anticipated earlier. NVIDIA is adjusting its pricing strategy for these mid-tier chips to better position itself against the competition and draw in new buyers.

Under the hood, the GB206‑300 core activates 36 streaming multiprocessors, delivering a total of 4,608 CUDA cores. The GPU operates at a base clock of 2,407 MHz, boosting to 2,572 MHz under load. Memory runs at 1,750 MHz (28 Gbps effective), routed through the 128‑bit bus to yield up to 448 GB/s of bandwidth. Graphics‑specific throughput is augmented by 144 texture mapping units and 48 render output units, while 36 dedicated ray‑tracing cores handle real‑time lighting calculations. Additionally, 144 tensor cores accelerate AI‑driven workloads such as DLSS upscaling and machine‑learning inference. Power delivery for both cards is managed via a single 16‑pin connector, with a total board power of 180 W. Display connectivity includes one HDMI 2.1b port alongside three DisplayPort 2.1b outputs, and the card interfaces with host systems over PCI Express 5.0 x16.
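The 448 GB/s figure follows directly from the quoted memory specs; a quick worked example:

```python
# How the 448 GB/s figure follows from the quoted memory specs.
effective_rate_gbps = 28       # GDDR7 effective data rate per pin, in Gbit/s
bus_width_bits = 128           # memory interface width

bandwidth_gb_s = effective_rate_gbps * bus_width_bits / 8    # bits -> bytes
print(bandwidth_gb_s, "GB/s")  # 448.0 GB/s
```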

NVIDIA PhysX and Flow Made Fully Open-Source

NVIDIA late last week released the NVIDIA PhysX SDK and NVIDIA Flow as open-source software under the BSD-3 license. This includes the GPU source code—the specific way PhysX leverages CUDA and GPU compute acceleration—and should make it easier for game developers to understand and implement PhysX, including its various interactive 3D effects such as rigid body dynamics, fluid simulation, and deformable objects. More importantly, a deeper understanding of PhysX makes it possible for modders to develop fallbacks so that older 32-bit game titles that use PhysX work with newer generations of GPUs, such as the RTX 50-series "Blackwell." It should come in especially handy as NVIDIA tries to push Remix—its first-party initiative to refurbish older games with modern graphics and higher-resolution visual assets.

Quantum Machines Anticipates Collaborative Breakthroughs at NVIDIA's New Research Center

Quantum Machines (QM), a leading provider of advanced quantum control solutions, today announced its intention to work with NVIDIA at its newly established NVIDIA Accelerated Quantum Research Center (NVAQC), unveiled at the GTC global AI conference. The Boston-based center aims to advance quantum computing research with accelerated computing, including integrating quantum processors with AI supercomputing to overcome significant challenges in the quantum computing space. As quantum computing rapidly evolves, the integration of quantum processors with powerful AI supercomputers becomes increasingly essential. These accelerated quantum supercomputers are pivotal for advancing quantum error correction, device control, and algorithm development.

Quantum Machines joins other quantum computing pioneers, including Quantinuum and QuEra, along with academic partners from Harvard and MIT, in working with NVIDIA at the NVAQC to develop pioneering research. Quantum Machines will work with NVIDIA to integrate its NVIDIA GB200 Grace Blackwell Superchips with QM's advanced quantum control technologies, including the OPX1000. This integration will facilitate rapid, high-bandwidth communication between quantum processors and classical supercomputers. QM and NVIDIA thereby lay the essential foundations for quantum error correction and robust quantum algorithm execution. By reducing latency and enhancing processing efficiency, QM and NVIDIA solutions will significantly accelerate practical applications of quantum computing.

NVIDIA NIM Microservices Now Available to Streamline Agentic Workflows on RTX AI PCs and Workstations

Generative AI is unlocking new capabilities for PCs and workstations, including game assistants, enhanced content-creation and productivity tools and more. NVIDIA NIM microservices, available now, and AI Blueprints, in the coming weeks, accelerate AI development and improve its accessibility. Announced at the CES trade show in January, NVIDIA NIM provides prepackaged, state-of-the-art AI models optimized for the NVIDIA RTX platform, including the NVIDIA GeForce RTX 50 Series and, now, the new NVIDIA Blackwell RTX PRO GPUs. The microservices are easy to download and run. They span the top modalities for PC development and are compatible with top ecosystem applications and tools.
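As a hypothetical illustration of how an application might call one of these microservices, the sketch below assumes a NIM container running locally and exposing an OpenAI-compatible chat endpoint on port 8000; the URL, port, and model name are placeholders rather than confirmed details:

```python
# Hypothetical call into a locally hosted NIM microservice, assuming it exposes an
# OpenAI-compatible chat endpoint on port 8000; URL and model name are placeholders.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "placeholder-model-name",
        "messages": [{"role": "user", "content": "Suggest a loadout for tonight's session."}],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```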

The experimental System Assistant feature of Project G-Assist was also released today. Project G-Assist showcases how AI assistants can enhance apps and games. The System Assistant allows users to run real-time diagnostics, get recommendations on performance optimizations, or control system software and peripherals - all via simple voice or text commands. Developers and enthusiasts can extend its capabilities with a simple plug-in architecture and new plug-in builder.

NVIDIA to Build Accelerated Quantum Computing Research Center

NVIDIA today announced it is building a Boston-based research center to provide cutting-edge technologies to advance quantum computing. The NVIDIA Accelerated Quantum Research Center, or NVAQC, will integrate leading quantum hardware with AI supercomputers, enabling what is known as accelerated quantum supercomputing. The NVAQC will help solve quantum computing's most challenging problems, ranging from qubit noise to transforming experimental quantum processors into practical devices.

Leading quantum computing innovators, including Quantinuum, Quantum Machines and QuEra Computing, will tap into the NVAQC to drive advancements through collaborations with researchers from leading universities, such as the Harvard Quantum Initiative in Science and Engineering (HQI) and the Engineering Quantum Systems (EQuS) group at the Massachusetts Institute of Technology (MIT).

NVIDIA Accelerates Science and Engineering With CUDA-X Libraries Powered by GH200 and GB200 Superchips

Scientists and engineers of all kinds are equipped to solve tough problems a lot faster with NVIDIA CUDA-X libraries powered by NVIDIA GB200 and GH200 superchips. Announced today at the NVIDIA GTC global AI conference, developers can now take advantage of tighter automatic integration and coordination between CPU and GPU resources - enabled by CUDA-X working with these latest superchip architectures - resulting in up to 11x speedups for computational engineering tools and 5x larger calculations compared with using traditional accelerated computing architectures.

This greatly accelerates and improves workflows in engineering simulation, design optimization and more, helping scientists and researchers reach groundbreaking results faster. NVIDIA released CUDA in 2006, opening up a world of applications to the power of accelerated computing. Since then, NVIDIA has built more than 900 domain-specific NVIDIA CUDA-X libraries and AI models, making it easier to adopt accelerated computing and driving incredible scientific breakthroughs. Now, CUDA-X brings accelerated computing to a broad new set of engineering disciplines, including astronomy, particle physics, quantum physics, automotive, aerospace and semiconductor design.
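As a loose illustration of the kind of drop-in GPU acceleration these libraries enable, the sketch below uses CuPy, a third-party package that calls into CUDA-X libraries such as cuBLAS and cuSOLVER under the hood; the problem size is arbitrary, and this is not one of NVIDIA's published benchmarks:

```python
# Loose illustration of drop-in GPU acceleration via CuPy, a third-party package built
# on CUDA-X libraries such as cuBLAS and cuSOLVER. Arbitrary problem size; not one of
# NVIDIA's published benchmarks.
import cupy as cp

n = 4096
a = cp.random.rand(n, n, dtype=cp.float64)   # dense system assembled on the GPU
b = cp.random.rand(n, dtype=cp.float64)

x = cp.linalg.solve(a, b)                    # solve runs on the GPU (cuSOLVER under the hood)
residual = cp.linalg.norm(a @ x - b)
print(float(residual))
```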

BOXX Workstations Upgraded With New NVIDIA RTX PRO 6000 Blackwell Workstation Edition GPUs 

BOXX Technologies, the leading innovator of high-performance computers, rendering systems, and servers, announced that as a supplier of NVIDIA-Certified Systems, BOXX workstations will feature the new NVIDIA RTX PRO 6000 Blackwell Workstation Edition and NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition GPUs. Designed for creative professionals, these NVIDIA Blackwell architecture GPUs combine breakthrough AI inference, ray tracing, and neural rendering technology with major performance and memory improvements to drive demanding creative, design, and engineering workflows. BOXX will be among the first computer hardware manufacturers offering the new GPUs inside multiple workstation form factors.

"From our desk side APEXX workstations to our FLEXX and RAXX data center platforms, BOXX is taking our record-setting performance to new heights with NVIDIA RTX PRO 6000 Blackwell Workstation Edition GPUs," said BOXX CEO Kirk Schell. "Our systems equipped with these groundbreaking GPUs are purpose-built for creative professionals who demand the best, so whether its architects, engineers, and content creators, or data scientists and large scale enterprise deployments, BOXX accelerates mission critical work while maintaining unparalleled performance, reliability, and support."

NVIDIA Details DLSS 4 Design: A Complete AI-Driven Rendering Technology

NVIDIA has published a research paper on DLSS version 4, its AI rendering technology for real-time graphics performance. The system integrates advancements in frame generation, ray reconstruction, and latency reduction. The flagship Multi-Frame Generation feature generates three additional frames for every native frame, and DLSS 4 then paces these frames out to the display quickly enough that the output appears as a natively rendered sequence. At the core of DLSS 4 is a shift from convolutional neural networks to transformer models. These new AI architectures excel at capturing spatial-temporal dependencies, improving ray-traced effect quality by 30-50% according to NVIDIA's benchmarks. The technology processes each AI-generated frame in just 1 ms on RTX 5090 GPUs—significantly faster than the 3.25 ms required by DLSS 3. For competitive gaming, the new Reflex Frame Warp feature reduces input latency by up to 75%, achieving 14 ms in THE FINALS and under 3 ms in VALORANT, according to NVIDIA's own benchmarks.

DLSS 4's implementation leverages Blackwell-specific architecture capabilities, including FP8 tensor cores and fused CUDA kernels. The optimized pipeline incorporates vertical layer fusion and memory optimizations that keep computational overhead manageable despite using transformer models that are twice as large as the previous CNN implementations. This efficiency enables real-time performance even with the substantially more complex AI processing. The unified AI pipeline reduces manual tuning requirements for ray-traced effects, allowing studios to implement advanced path tracing across diverse hardware configurations. The design also addresses gaming challenges such as interpolating fast-moving UI elements and particle effects and reducing artifacts in high-motion scenes. NVIDIA's hardware flip metering and integration with Blackwell's display engine ensure precise frame pacing of newly generated frames for smooth, high-refresh-rate gaming with accurate imagery.
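To make the frame-pacing requirement concrete, here is a simple worked example (our own arithmetic, with an assumed base render rate) of the output rate and flip interval that Multi-Frame Generation implies:

```python
# Worked example (our arithmetic, assumed base render rate): the display output rate and
# flip interval implied by generating three extra frames per natively rendered frame.
native_fps = 60                       # assumed base render rate, for illustration
generated_per_native = 3              # DLSS 4 Multi-Frame Generation

output_fps = native_fps * (1 + generated_per_native)
flip_interval_ms = 1000.0 / output_fps
print(output_fps, "frames/s,", round(flip_interval_ms, 2), "ms between flips")  # 240, 4.17
```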

NVIDIA Reportedly Prepares GeForce RTX 5060 and RTX 5060 Ti Unveil Tomorrow

NVIDIA is set to unveil its RTX 5060 series graphics cards tomorrow, according to VideoCardz information, which claims NVIDIA shared launch info with some media outlets today. The announcement will include two desktop models: the RTX 5060 and RTX 5060 Ti, confirming leaks from industry sources last week. The upcoming lineup will feature three variants: RTX 5060 Ti 16 GB, RTX 5060 Ti 8 GB, and RTX 5060. All three cards will utilize identical board designs and the same GPU, allowing manufacturers to produce visually similar Ti and non-Ti models. Power requirements are expected to range from 150-180 W. NVIDIA's RTX 5060 Ti will ship with 4608 CUDA cores, representing a modest 6% increase over the previous generation RTX 4060 Ti. The most significant improvement comes from the implementation of GDDR7 memory technology, which could deliver over 50% higher bandwidth than its predecessor if NVIDIA maintains the expected 28 Gbps memory speed across all variants.

The standard RTX 5060 will feature 3840 CUDA cores paired with 8 GB of GDDR7 memory. This configuration delivers 25% more GPU cores than its predecessor and marks an upgrade in GPU tier from AD107 (XX7) to GB206 (XX6). The smaller GB207 GPU is reportedly reserved for the upcoming RTX 5050. VideoCardz's sources indicate the RTX 5060 series will hit the market in April. Tomorrow's announcement is strategically timed as an update for the Game Developers Conference (GDC), which begins next week. All models in the series will maintain the 128-bit memory bus of their predecessors while delivering significantly improved memory bandwidth—448 GB/s compared to the previous generation's 288 GB/s for the Ti model and 272 GB/s for the standard variant. The improved bandwidth stems from the introduction of GDDR7 memory.

NVIDIA GeForce RTX 50 Series Faces Compute Performance Issues Due to Dropped 32-bit Support

PassMark Software has identified the root cause behind unexpectedly low compute performance in NVIDIA's new GeForce RTX 5090, RTX 5080, and RTX 5070 Ti GPUs. The culprit: NVIDIA has silently discontinued support for 32-bit OpenCL and CUDA in its "Blackwell" architecture, causing compatibility issues with existing benchmarking tools and applications. The issue manifested when PassMark's DirectCompute benchmark returned the error code "CL_OUT_OF_RESOURCES (-5)" on RTX 5000 series cards. After investigation, developers confirmed that while the benchmark's primary application has been 64-bit for years, several compute sub-benchmarks still utilize 32-bit code that previously functioned correctly on RTX 4000 and earlier GPUs. This architectural change wasn't clearly documented by NVIDIA, whose developer website continues to display 32-bit code samples and documentation despite the removal of actual support.
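As a quick illustration (not PassMark's code), the snippet below shows how a developer might confirm that the host process is 64-bit before expecting GPU OpenCL to work on the RTX 50 series, and then enumerate the available OpenCL devices; it relies on the third-party pyopencl package:

```python
# Illustration only (not PassMark's code): confirm the host process is 64-bit before
# expecting GPU OpenCL on RTX 50 "Blackwell" cards, then enumerate OpenCL devices.
# Requires the third-party pyopencl package.
import sys
import pyopencl as cl

print("64-bit host process:", sys.maxsize > 2**32)   # 32-bit builds lose GPU OpenCL on Blackwell

for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(platform.name, "|", device.name)
```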

The impact extends beyond benchmarking software. Applications built on legacy CUDA infrastructure, including technologies like PhysX, will experience significant performance degradation as computational tasks fall back to CPU processing rather than utilizing the GPU's parallel architecture. While this fallback mechanism allows older applications to run on the RTX 40 series and prior hardware, the RTX 5000 series handles these tasks exclusively through the CPU, resulting in substantially lower performance. PassMark is currently working to port the affected OpenCL code to 64-bit, allowing proper testing of the new GPUs' compute capabilities. However, they warn that many existing applications containing 32-bit OpenCL components may never function properly on RTX 5000 series cards without source code modifications. The benchmark developer also notes this change doesn't fully explain poor DirectX 9 performance, suggesting additional architectural changes may affect legacy rendering pathways. PassMark updated its software today, but legacy benchmarks could still suffer. Below is an older benchmark run without the latest PassMark V11.1 build 1004 patches, showing just how much the newest generation suffers without proper software support.

NVIDIA's 32-Bit PhysX Waves Goodbye with GeForce RTX 50 Series Ending 32-Bit CUDA Software Support

The days of 32-bit software support in NVIDIA's drivers are coming to an end, and with that, so does support for the once-iconic PhysX real-time physics engine. According to NVIDIA's engineers on the GeForce forums, the lack of PhysX support has been quietly acknowledged, as NVIDIA's latest GeForce RTX 50 series of GPUs phases out support for 32-bit CUDA software, slowly transitioning the gaming world entirely to 64-bit software. While older NVIDIA GPUs from the Maxwell through Ada generations will maintain 32-bit CUDA support, this change breaks backward compatibility for physics acceleration in legacy PC games on new GPUs. Users running these titles on RTX 50 series cards may need to rely on CPU-based PhysX processing, which could result in suboptimal performance compared to previous GPU generations.

A Reddit user reported frame rates dropping below 60 FPS in Borderlands 2 while using basic game mechanics with a 9800X3D CPU and RTX 5090 GPU, all because 32-bit CUDA application support on the Blackwell architecture is deprecated. When another user booted up a 64-bit PhysX application, Batman: Arkham Knight, PhysX worked perfectly, as expected. It is just that a massive list of older games, which gamers would sometimes prefer to play, now runs a lot slower on the most powerful consumer GPU due to the phase-out of 32-bit CUDA app support.

NVIDIA GeForce RTX 5070 Ti Allegedly Scores 16.6% Improvement Over RTX 4070 Ti SUPER in Synthetic Benchmarks

Thanks to some early 3DMark benchmarks obtained by VideoCardz, NVIDIA's upcoming GeForce RTX 5070 Ti GPU paints an interesting picture of performance gains over its predecessor. Testing conducted with AMD's Ryzen 7 9800X3D processor and 48 GB of DDR5-6000 memory has provided the first glimpse into the card's capabilities. The new GPU demonstrates a 16.6% performance improvement over its predecessor, the RTX 4070 Ti SUPER. However, benchmark data shows it falling short of the more expensive RTX 5080 by 13.2%, raising questions about the price-to-performance ratio given the $250 price difference between the two cards. Priced at $749 MSRP, the RTX 5070 Ti could be even pricier in retail channels at launch, especially with limited availability. The card's positioning becomes particularly interesting compared to the RTX 5080's $999 price point, which commands a 33% premium for its additional performance capabilities.

As a reminder, the RTX 5070 Ti boasts 8,960 CUDA cores, 280 texture units, 70 RT cores for ray tracing, and 280 tensor cores for AI computations, all supported by 16 GB of GDDR7 memory running at 28 Gbps effective speed across a 256-bit bus interface, resulting in 896 GB/s of bandwidth. We have to wait for proper reviews for the final performance verdict, as synthetic benchmarks tell only part of the story. Modern gaming demands consideration of advanced features such as ray tracing and upscaling technologies, which can significantly impact real-world performance. The true test will come from comprehensive gaming benchmarks across a variety of titles and scenarios. The gaming community won't have to wait long for detailed analysis, as official reviews are reportedly set to be released in just a few days. Additional evaluations of non-MSRP versions should follow on February 20, the card's launch date.

NVIDIA GeForce RTX 5070 Ti Edges Out RTX 4080 in OpenCL Benchmark

A recently surfaced Geekbench OpenCL listing has revealed the performance improvements that the GeForce RTX 5070 Ti is likely to bring to the table, and the numbers sure look promising - that is, coming off the disappointment of the GeForce RTX 5080, which manages roughly 260,000 points in the benchmark, a paltry 8% improvement over its predecessor. The GeForce RTX 5070 Ti, however, managed an impressive 248,000 points, putting it a substantial 20% ahead of the GeForce RTX 4070 Ti. Hilariously enough, the RTX 5080 is merely 4% ahead of its smaller sibling, making the situation even worse for the somewhat contentious GPU. NVIDIA has claimed similar performance improvements in its marketing material, which does seem quite plausible.

Of course, an OpenCL benchmark is hardly representative of real-world gaming performance. That being said, there is no denying that raw benchmarks will certainly help buyers temper expectations and make decisions. Previous leaks and speculation have hinted at a roughly 10% improvement over its predecessor in raster performance and up to 15% improvements in ray tracing performance, although the OpenCL listing does indicate the RTX 5070 Ti might be capable of a larger generational jump, neck-and-neck with NVIDIA's claims. For those in need of a refresher, the RTX 5070 Ti boasts 8,960 CUDA cores paired with 16 GB of GDDR7 memory on a 256-bit bus. Like its siblings, the RTX 5070 Ti is also rumored to face "extremely limited" supply at launch. With its official launch less than a week away, we won't have much waiting to do to find out for ourselves.

NVIDIA RTX 5080 Laptop Defeats Predecessor By 19% in Time Spy Benchmark

The NVIDIA RTX 50-series witnessed quite a contentious launch, to say the least. Hindered by abysmal availability, controversial generational improvements, and wacky marketing tactics by Team Green, it would be safe to say a lot of passionate gamers were left utterly disappointed. That said, while the desktop cards have been the talk of the town as of late, the RTX 50 Laptop counterparts are yet to make headlines. Occasional leaks do appear on the interwebs, the latest of which seems to indicate the 3DMark Time Spy performance of the RTX 5080 Laptop GPU. And the results are - well, debatable.

We do know that the RTX 5080 Laptop GPU will feature 7,680 CUDA cores, a shockingly modest increase over its predecessor. Considering that we did not get a node shrink this time around, the architectural improvements appear to be rather minimal, going by the tests conducted so far. Of course, the biggest boost in performance will likely be afforded by GDDR7 memory on a 256-bit bus, compared to its predecessor's GDDR6 memory on a 192-bit bus. In 3DMark's Time Spy DX12 test, which is somewhat of an outdated benchmark, the RTX 5080 Laptop managed roughly 21,900 points. The RTX 4080 Laptop, on average, rakes in around 18,200 points, putting the RTX 5080 Laptop ahead by almost 19%. The RTX 4090 Laptop is also left behind, by around 5%.