Wednesday, February 13th 2019
NVIDIA DLSS and its Surprising Resolution Limitations
TechPowerUp readers today were greeted with our PC port analysis of Metro Exodus, which also contained a dedicated section on NVIDIA RTX and DLSS technologies. The former brings real-time ray tracing support to an already graphically intensive game, while the latter attempts to offset the performance hit via NVIDIA's new proprietary alternative to more traditional anti-aliasing. DLSS did deliver a bump in performance when enabled, but we also noted some head-scratching limitations on when and how it can even be enabled, depending on the in-game resolution and the RTX GPU employed. We then set about testing DLSS in Battlefield V, where DLSS support also went live today, and it was then that we noticed a trend.
Take Metro Exodus first, with the relevant notes in the first image below. DLSS can only be turned on for specific combinations of resolution and RTX GPU, ranging from the RTX 2060 to the RTX 2080 Ti, with NVIDIA appearing to limit users to a class-based system. Users with the RTX 2060, for example, cannot use DLSS at 4K at all and, more egregiously, owners of the RTX 2080 and RTX 2080 Ti cannot enjoy RTX and DLSS simultaneously at the most popular in-game resolution of 1920x1080, which would be useful for reaching high frame rates on 144 Hz monitors. Battlefield V has a similar, yet even more divided, system wherein the gaming flagship RTX 2080 Ti cannot be used with RTX and DLSS at even 1440p, as seen in the second image below. This brought us back to Final Fantasy XV's own DLSS implementation last year, which was all or nothing at 4K resolution only. What could have prompted NVIDIA to do this? We speculate further past the break.

We contacted NVIDIA about this to get word straight from the green horse's mouth, hoping to provide a satisfactory answer to you. Representatives for the company told us that DLSS is most effective when the GPU is at maximum workload, such that if a GPU is not being challenged enough, DLSS is not made available. Accordingly, this implementation encourages users to turn on RTX first, increasing the GPU load, in order to then enable DLSS. From this, one can extrapolate why the RTX 2080 Ti does not get to enjoy DLSS at lower resolutions, where it is presumably not being taxed as hard.
We do not buy this explanation, however. Turning off VSync alone results in uncapped frame rates, which allow for a GPU load nearing 100%. NVIDIA has been championing high refresh rate displays for years now, and our own results show that we need the RTX 2080 or RTX 2080 Ti to get close to 144 FPS at 1080p, for that sweet 120+ Hz refresh rate action. Why not let the end user decide what takes priority here, especially if DLSS aims to improve graphical fidelity as well? It was at this point that we went back to NVIDIA's whitepaper on the Turing microarchitecture, briefly discussed here for those interested.
DLSS, as it turns out, operates on a frame-by-frame basis. A Turing microarchitecture-based GPU has shader cores for traditional game rendering, tensor cores for large-scale compute/AI workloads, and RT cores for real-time ray tracing. The load relevant to DLSS falls predominantly on the tensor cores, so a higher FPS in a game effectively means a higher load on the tensor cores. The different GPUs in the NVIDIA GeForce RTX family have different numbers of tensor cores, which limits how many frames/pixels they can process in a unit of time (say, one second). This variability in tensor core count is likely the major reason for this implementation of DLSS: with this approach, it appears NVIDIA wants to make sure the tensor cores never become the bottleneck during gaming.
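To illustrate that argument, here is a back-of-the-envelope sketch in Python. The throughput and per-frame cost figures are purely hypothetical assumptions, not NVIDIA specifications; the point is only how a fixed pool of tensor throughput caps the number of frames DLSS could post-process per second.

```python
# Back-of-the-envelope sketch of the tensor-core bottleneck argument.
# All figures are illustrative assumptions, not NVIDIA specifications.

def max_dlss_fps(tensor_tflops: float, tflop_per_dlss_frame: float) -> float:
    """Upper bound on frames per second the tensor cores could post-process."""
    return tensor_tflops / tflop_per_dlss_frame

# Hypothetical tensor throughput for a smaller vs. a larger RTX card, and an
# assumed fixed amount of tensor work per DLSS frame at a given output resolution.
for name, tflops in [("smaller RTX card", 40.0), ("larger RTX card", 110.0)]:
    cap = max_dlss_fps(tflops, tflop_per_dlss_frame=0.5)
    print(f"{name}: tensor cores could sustain at most ~{cap:.0f} DLSS frames/s")

# If the game would otherwise run faster than this cap, the tensor cores, rather
# than the shader or RT cores, become the limiting factor -- the scenario NVIDIA
# appears to be avoiding by locking DLSS out of low-resolution, high-FPS combinations.
```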
Another possible reason comes via Futuremark's 3DMark Port Royal ray tracing benchmark. It recently added support for DLSS, and is a standard-bearer for how RTX and DLSS can work in conjunction to produce excellent results. Port Royal, however, is an extremely scripted benchmark using predetermined scenes that make good use of the machine learning capabilities behind DLSS. Perhaps this initial round of DLSS in games follows a similar mechanism, wherein DLSS is trained for specific scenes at specific resolutions, and not in a resolution-independent way.
Regardless of the underlying cause, every in-game DLSS implementation so far has come with some fine print attached, which sours what is ultimately a free bonus. Where it can be enabled, DLSS appears to work well, providing at least one additional dial for users to play with as they fine-tune their desired balance between visual fidelity and frame rate.
102 Comments on NVIDIA DLSS and its Surprising Resolution Limitations
With this kind of logic one could think that everyone has big desktops and overclocks (and let's be honest: that is a popular belief among TPU forum members).
First thing you have to accept is that most people don't even know how many fps they're looking at. Maybe they checked it once to learn if their hardware is good enough.
On the contrary, a lot of people on forums like this one constantly game with PC monitoring in the corner (not just FPS, but also temperature, load and so on). Are you doing it by chance? :)
It really isn't. Ray tracing is the best way today's computers can replicate how light behaves in reality (on the macro scale, at least).
You have to understand what we're seeing now in BV5 is not what RTRT is meant to do.
You're seeing all these unrealistic reflections because it's the most basic result - it uses the least hardware and is easiest to implement.
It's just the first step.
Light interacts with matter. Apart from the simple reflection, ray tracing takes care of scattering, diffusion, absorption and refraction. But to make it all work, all textures and materials have to be parametrized. It's a lot of work. But that's the work we'll have to do if we want RTRT to create photorealistic results sometime in the future.
Today RTX does just a fraction of what it can do, and RTX cards deliver just a fraction of the computing power this tech will eventually need. But it will constantly improve.
Because not everything you can see in a review at 200% magnification will be visible in motion.
You don't like it? Don't use it. But it is a tool to boost your FPS.
DLSS probably takes some milliseconds per frame, which get added to the rendering time. If we take, for example, the RTX 2080 Ti's supported resolutions and look at the performance numbers from TPU's tests: native 1440p with RTX on Ultra renders at 1000/66.9 = 14.95 ms per frame. Now, if DLSS takes a fixed amount of time per frame, that should be added to the frame time at the actual rendering resolution. Say DLSS takes an arbitrary fixed 2 ms: with RTX on Ultra and DLSS, rendering happens at 1080p, so 1000/90.5 = 11.05 ms, plus 2 ms = 13.05 ms, or 1000/13.05 ms = 76.6 FPS. That's quite a hefty uplift in performance. But take the unsupported 1440p RTX-off numbers from the same test: native 1440p is 1000/117.9 = 8.48 ms, while an imaginary DLSS mode rendering at 1080p would be 1000/147 = 6.80 ms + 2 ms = 8.8 ms, or 1000/8.8 ms = 113.6 FPS, which is lower than native.
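That arithmetic is easy to sanity-check. Here is a minimal Python sketch of the commenter's model, with the 2 ms tensor-core cost kept as the same arbitrary assumption rather than a measured value:

```python
# Frame time in ms is 1000 / FPS; the DLSS path is modeled as "render at 1080p,
# then add a fixed tensor-core cost". The 2 ms cost is an arbitrary assumption.

DLSS_COST_MS = 2.0

def dlss_fps(fps_at_render_res: float, dlss_cost_ms: float = DLSS_COST_MS) -> float:
    """Estimated FPS after adding a fixed DLSS cost to the lower-resolution frame time."""
    return 1000.0 / (1000.0 / fps_at_render_res + dlss_cost_ms)

# RTX on, Ultra: native 1440p runs at 66.9 FPS; 1080p runs at 90.5 FPS.
print(f"RTX on : native 66.9 FPS -> modeled DLSS {dlss_fps(90.5):.1f} FPS")   # ~76.6, a gain

# RTX off: native 1440p runs at 117.9 FPS; 1080p runs at 147 FPS.
print(f"RTX off: native 117.9 FPS -> modeled DLSS {dlss_fps(147.0):.1f} FPS") # ~113.6, a loss
```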
The tensor cores take over the frame rendered by the raster and RT cores before the final frame is sent to your display, but that doesn't mean the raster and RT cores sit waiting for the tensor cores to do their job before rendering more frames. That's the reason DLSS can actually reduce performance at high frame rates and waste raster efficiency.
What the explanation means, and what TechPowerUp got right in this article, is that you don't want your raster performance to be held back waiting for the tensor cores to apply DLSS to a frame when you're already pushing more than 60 FPS. So, in a nutshell: at high frame rates the tensor cores can become a bottleneck with no return in IQ, hence DLSS is disabled at lower resolutions on high-end GPUs (a rough break-even sketch follows below).
Good job in posting this article before we got official confirmation from Nvidia :lovetpu:
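Following the same fixed-cost model as the calculation a few posts up (the ~0.8 frame-time ratio and the 2 ms cost are assumptions carried over from that post, not measured values), one can sketch the break-even point past which a DLSS pass stops paying off:

```python
# DLSS only pays off while the frame time saved by rendering at the lower
# resolution exceeds the fixed tensor-core cost. Both inputs are assumptions.

def break_even_native_fps(lower_res_time_ratio: float, dlss_cost_ms: float) -> float:
    """Native FPS above which a fixed-cost DLSS pass no longer helps.

    lower_res_time_ratio: lower-resolution frame time divided by native frame time.
    """
    return 1000.0 * (1.0 - lower_res_time_ratio) / dlss_cost_ms

# With the ~0.8 ratio implied by 117.9 vs. 147 FPS and an assumed 2 ms DLSS cost,
# DLSS turns counterproductive somewhere around 100 FPS native.
print(f"break-even native frame rate: ~{break_even_native_fps(0.8, 2.0):.0f} FPS")
```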
BTW: look at the comparison of TAA and DLSS here on TPU and you will see DLSS's "image quality" boost.
www.techpowerup.com/reviews/Performance_Analysis/Battlefield_V_DLSS/
DLSS "4K" is actually 1440p + DLSS.
1440p + TAA is likely to look worse, at slightly better performance.
4K + TAA will naturally look better but the performance difference is huge.
Comparisons were done on the early demos of FFXV and Infiltrator (Unreal Engine). They found exactly what I described above. With regards to TAA, the general impression was that despite some parts leaning here or there, 1800p + TAA is practically equal to 1440p + DLSS in both image quality and performance.
LIke I said "reducing image quality" is a viewpoint thing. It reduces image quality when compared to 4K image that the marketing bullshit implies or explicitly says. It increases the image quality when compared to 1440p image that DLSS is technically based on.
You really give me nothing, bro, just telling me that I'm judging and that I don't understand. That's what I see and that's how I take it. Try explaining why I'm wrong or what I don't understand, or point to something for discussion.
It could be used as an anti-aliasing method (referred to as DLSS 2X), but we have not seen this type of application yet.