
NVIDIA GeForce RTX 50 Technical Deep Dive


DLSS Multi-Frame Generation and DLSS Transformers


DLSS 4's evolution and its impact on neural rendering center on three core pillars of real-time graphics: image quality, smoothness, and responsiveness. Balancing these factors often requires trade-offs: increasing resolution or enhancing rendering quality typically reduces frame rates and interactivity.


One approach to improving interactivity is to lower the resolution, such as reducing output from 4K to 1080p, which increases responsiveness at the cost of visual quality. Another solution is adding computational power, such as employing multiple GPUs. While this approach can achieve high-quality results, it comes with significant costs and technical challenges.

DLSS 4 is designed to push the boundaries of these three pillars by using artificial intelligence to optimize rendering efficiency. Rendering workloads often exhibit redundancy, as objects in scenes typically change only incrementally between frames. AI can detect and predict these patterns, reducing the computational load and enhancing real-time graphics performance.


Since its introduction alongside the Turing-based GeForce RTX 20 series in 2018, DLSS has undergone continuous improvement. While early versions faced challenges, iterative advances in algorithms and game integrations have expanded its adoption. DLSS is now implemented in over 540 games and applications, including 15 of the top 20 games of 2024. Usage statistics indicate that over 80% of RTX players enable DLSS during gameplay, for a cumulative total of 3 billion hours of DLSS-enabled gaming.

These advancements result from a dedicated supercomputing infrastructure that analyzes model failures, expands training datasets, and refines algorithms. This iterative process has led to DLSS 4, the most ambitious version to date, featuring a completely new neural network architecture for the first time since DLSS 2, which was introduced in 2020.


DLSS 4 introduces a transformer-based model, a significant shift from the convolutional neural networks (CNNs) used in previous versions. Transformers use attention mechanisms to focus computational resources on the most relevant parts of the data, enabling better handling of complex rendering scenarios and prioritization of challenging image regions.
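As a rough illustration of the attention mechanism described above, here is a minimal single-head self-attention sketch in plain Python. The data is a toy example; it illustrates the general technique only and has nothing to do with DLSS's actual network:

```python
import math

def attention(queries, keys, values):
    """Minimal scaled dot-product attention: each query scores every key,
    the scores are softmaxed into weights, and the result is a weighted
    average of the values. Purely illustrative, not DLSS internals."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        peak = max(scores)                        # subtract max for stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]       # softmax: sums to 1
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Three feature vectors attend to each other (self-attention).
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(feats, feats, feats)
```

The softmax weighting is what lets the model spend most of its "budget" on the inputs most relevant to each output position, rather than applying a fixed local filter the way a convolution does.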

Transformers will be used for DLSS Ray Reconstruction, DLSS Super Resolution, and DLAA.

The scalability and efficiency of transformer models allow DLSS 4 to utilize larger datasets and retain more examples from training. These advancements in neural network architecture deliver enhanced graphics performance and visual fidelity, setting a new benchmark for real-time rendering technology.


DLSS 4 introduces a significant leap in computational power, utilizing four times the compute of previous DLSS models. This increase enables new trade-offs between smoothness, responsiveness, and image quality.

NVIDIA showed comparison videos to demonstrate the enhanced performance of DLSS 4's transformer model:
  • A chain-link fence occluding a house: The new model resolves the fine details more effectively than the prior generation.
  • Distant wires: DLSS 4 reduces flickering and instability, maintaining consistent detail. (no slide available for that).
  • Spinning fan blades: Improved motion quality and reduced ghosting result in sharper, more stable visuals. (no slide available for that).
Overall, these improvements contribute to better image stability, motion accuracy, and fine detail clarity.


The new transformer-based super resolution in DLSS 4 significantly enhances detail retention. Examples include higher fidelity in textures, such as a bag's intricate patterns. Smarter models improve image quality for heavily used features like ray reconstruction and super resolution.


Building on DLSS 3, DLSS 4 introduces Multi Frame Generation: multiple AI models analyze two consecutive rendered frames for correlations and generate up to three additional frames for every rendered one. Combined with DLSS Super Resolution in Performance mode, which renders only a quarter of the output pixels, 15 of every 16 displayed pixels are AI-generated, for up to an 8x increase in rendering efficiency.
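The 15-of-16 figure follows from simple arithmetic, assuming DLSS Super Resolution in Performance mode (one quarter of output pixels rendered per frame) combined with 4x frame generation (one rendered frame out of every four displayed):

```python
# Fraction of displayed pixels that are traditionally rendered, assuming
# DLSS Performance upscaling combined with 4x Multi Frame Generation.
upscale_pixel_fraction = 1 / 4    # e.g. 1080p render upscaled to 4K output
rendered_frame_fraction = 1 / 4   # 1 rendered + 3 generated frames

rendered = upscale_pixel_fraction * rendered_frame_fraction
generated = 1 - rendered

print(f"rendered:  1 in {int(1 / rendered)} displayed pixels")
print(f"generated: {generated:.4f} of displayed pixels")  # 15 of 16
```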

The new frame generation AI model is 40% faster, uses 30% less VRAM, and only needs a single run per frame to produce multiple frames. For example, in Warhammer 40,000: Darktide, it delivers a 10% improvement in frame rate while saving 400 MB of memory at 4K with max settings and DLSS Frame Generation. Additionally, the process of generating the optical flow field has been accelerated by replacing hardware optical flow with a highly efficient AI model, significantly reducing the computational cost of producing extra frames.


When generating so many frames, frame pacing becomes a problem: irregular frame intervals hurt perceived smoothness. DLSS 4 addresses this with a dedicated hardware unit inside Blackwell that handles flip metering, reducing frame-display variability by 5-10x. This ensures smoother gameplay and a better Multi Frame Generation experience.
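A software sketch of what flip metering accomplishes: spacing the presents of generated frames evenly across each rendered-frame interval. The hardware unit does this with far lower variability than a software timer could; the function below only illustrates the scheduling idea:

```python
def metered_flip_times(render_timestamps, frames_per_render):
    """Evenly space frame presents between consecutive rendered frames,
    so 4x generation shows 4 uniformly paced frames per render interval.
    A conceptual sketch of flip metering, not Blackwell's implementation."""
    flips = []
    for t0, t1 in zip(render_timestamps, render_timestamps[1:]):
        step = (t1 - t0) / frames_per_render
        # One rendered frame plus (frames_per_render - 1) generated ones.
        flips.extend(t0 + i * step for i in range(frames_per_render))
    flips.append(render_timestamps[-1])
    return flips

# Rendered frames arrive every ~16.7 ms (60 FPS); 4x MFG displays
# four frames per interval at a uniform ~4.2 ms spacing.
renders = [0.0, 16.7, 33.4]
flips = metered_flip_times(renders, 4)
```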


Using Cyberpunk as an example:
  • DLSS off: 27 FPS, ~70 ms latency.
  • DLSS Super Resolution: 70 FPS, ~35 ms latency.
  • DLSS 3.5 with ray reconstruction: 140 FPS, ~35 ms latency.
  • DLSS 4: 250 FPS, ~34 ms latency with improved image quality (e.g., clearer reflections and sharper textures).
DLSS 4 delivers up to an 8x performance boost, enabling 4K 240 Hz gaming on RTX 5090 hardware. Games like Alan Wake 2, Black Myth: Wukong, and Cyberpunk 2077 achieve exceptionally high frame rates.
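The frame-time and speedup arithmetic implied by the quoted Cyberpunk 2077 numbers can be checked directly:

```python
# FPS figures as quoted in NVIDIA's Cyberpunk 2077 comparison.
configs = {
    "DLSS off": 27,
    "DLSS Super Resolution": 70,
    "DLSS 3.5 + Ray Reconstruction": 140,
    "DLSS 4": 250,
}
baseline = configs["DLSS off"]
for name, fps in configs.items():
    frame_time_ms = 1000 / fps        # time budget per displayed frame
    print(f"{name}: {fps} FPS, {frame_time_ms:.1f} ms/frame, "
          f"{fps / baseline:.1f}x vs. native")
```

Note that latency stays roughly flat from Super Resolution onward even as FPS climbs, because the generated frames improve smoothness without shortening the underlying render-to-input loop.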


At launch, DLSS 4 will be compatible with 75 games and applications, with more planned. The design ensures compatibility with DLSS 3, simplifying adoption and integration.


DLSS 4 is designed to allow seamless compatibility with games that had previously adopted earlier versions like DLSS 3 and DLSS 3.5. Notably, the NVIDIA app will provide gamers with the ability to override DLSS settings on a per-game basis. This functionality enables users to activate DLSS 4 features, even in titles that did not initially support them.

The process involves selecting the desired game within the NVIDIA app and adjusting DLSS override settings. Options include choosing earlier CNN models, which offer faster performance at the cost of slightly reduced image quality, or utilizing frame generation technologies such as 2X, 3X, or 4X configurations. The app also introduces resolution override settings, enabling gamers to toggle between ultra-performance modes or the highest image quality settings like DLAA, regardless of native UI support within the game.
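As a sketch of what such a per-game override record might contain, here is a hypothetical data model. The class, field, and option names are illustrative only and are not the NVIDIA app's actual API:

```python
from dataclasses import dataclass
from enum import Enum

class ModelPreset(Enum):
    CNN = "cnn"                   # legacy model: faster, slightly softer
    TRANSFORMER = "transformer"   # DLSS 4 model: higher image quality

class FrameGen(Enum):
    """Displayed frames per rendered frame."""
    OFF = 1
    X2 = 2
    X3 = 3
    X4 = 4

@dataclass
class DlssOverride:
    """Hypothetical per-game override record mirroring the options the
    article describes: model choice, frame generation multiplier, and a
    resolution mode from ultra-performance up to DLAA."""
    game: str
    model: ModelPreset = ModelPreset.TRANSFORMER
    frame_gen: FrameGen = FrameGen.OFF
    resolution_mode: str = "quality"   # e.g. "ultra_performance" .. "dlaa"

profile = DlssOverride("Cyberpunk 2077", frame_gen=FrameGen.X4)
```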

NVIDIA is aware of potential anti-cheat issues and is working with publishers and carefully validating multiplayer games individually to ensure this doesn't become a problem.


The rollout of DLSS 4 includes support for various RTX GPUs. Multi-frame generation is exclusive to the RTX 50 series due to its reliance on Blackwell's flip-metering hardware unit and its improved tensor cores, which add FP4 support. Meanwhile, enhanced DLSS frame generation benefits both 40 and 50 series users with better image quality, faster performance, and optimized memory use. Additionally, Ray Reconstruction, the transformer super-resolution model, and deep learning anti-aliasing (DLAA) are accessible to all RTX gamers, ensuring DLSS 4 has wide applicability.

NVIDIA Reflex 2


NVIDIA also emphasized its focus on latency reduction, another critical aspect of gaming performance. With over 120 games integrating NVIDIA Reflex, the company reports that nine of the top ten shooters support the technology. Reflex integration is also a key component of frame generation, as it minimizes latency in the rendering pipeline.


As part of this launch, NVIDIA is introducing Reflex 2, which builds on the first iteration by reducing latency by up to 75%. The new system synchronizes the CPU and GPU more effectively and uses a technique called "Frame Warp" to update the camera position based on the latest user input. To achieve this, the system samples the mouse input right before a finished frame is sent to the display and shifts the whole image by how much the mouse has moved since the frame was originally rendered.
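A toy sketch of the warp step, assuming a simple 2D translation by the accumulated mouse delta. The real implementation is a camera-space reprojection whose details NVIDIA has not published; this only illustrates the idea and where the holes come from:

```python
def frame_warp(image, dx, dy, fill=None):
    """Shift a rendered frame by the camera movement (dx, dy, in pixels)
    that occurred after rendering. Newly exposed pixels have no source
    data and are marked with `fill`: these are the disocclusions that
    need inpainting. `image` is a row-major list of rows."""
    h, w = len(image), len(image[0])
    warped = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x + dx, y + dy   # sample position in the original frame
            if 0 <= sx < w and 0 <= sy < h:
                warped[y][x] = image[sy][sx]
    return warped

# 4x4 frame; the mouse moved 1 pixel right after the render, so the
# content shifts left and a 1-pixel column of holes opens on the right.
frame = [[(y, x) for x in range(4)] for y in range(4)]
warped = frame_warp(frame, dx=1, dy=0)
```

Because this runs just before scanout, the displayed view tracks the mouse more tightly than the render loop alone could.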


One technical challenge in implementing Frame Warp involves handling disocclusions: areas of the scene that become visible as the camera angle shifts. To address this, NVIDIA employs inpainting technology, leveraging historical data and 3D environmental information to fill in visual gaps. While this technology shows promise, NVIDIA acknowledged it may not be suitable for every game and noted ongoing improvements.

From what I understand, they keep one or several previous frames and use that information to fill in the gaps. Obviously this doesn't work when you're coming around a corner you've never seen before in the current game session, so it'll be interesting to see what happens with the "missing" pixels in that case.
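Under that reading (which is my assumption, not a confirmed description of NVIDIA's method), hole filling from a history buffer could look like this trivial sketch; anything the history also lacks would need true inpainting:

```python
def fill_from_history(warped, history):
    """Replace disoccluded pixels (None) with the pixel from an earlier
    frame at the same location. Pixels missing from the history too --
    e.g. around a never-seen corner -- remain None and would need the
    AI inpainting step. A speculative sketch, not NVIDIA's algorithm."""
    return [
        [h if p is None else p for p, h in zip(row_w, row_h)]
        for row_w, row_h in zip(warped, history)
    ]

warped = [["a", None], [None, "d"]]
history = [["A", "B"], [None, "D"]]   # the history can have gaps too
filled = fill_from_history(warped, history)
# One hole is filled from history; one remains unfillable.
```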


Reflex 2 will debut in several games, including The Finals and Valorant. These titles demonstrate the system's ability to reduce latency in both GPU-heavy and CPU-bound environments.

Jan 15th, 2025 18:43 EST
