Monday, December 3rd 2018

DICE Prepares "Battlefield V" RTX/DXR Performance Patch: Up to 50% FPS Gains

EA-DICE and NVIDIA earned a lot of bad press last month when performance numbers for "Battlefield V" with DirectX Raytracing (DXR) were finally out. Gamers were disappointed to see that DXR inflicts heavy performance penalties, with 4K UHD gameplay out of reach even for the $1,200 GeForce RTX 2080 Ti, and acceptable frame-rates available only at 1080p. DICE has since been working tirelessly to rework its real-time raytracing implementation and improve performance. Tomorrow (December 4th), the studio will release a patch for "Battlefield V," a day ahead of its new Tides of War: Overture and new War Story slated for December 5th. This patch could be a game-changer for GeForce RTX users.

NVIDIA has been working closely with EA-DICE on this new patch, which NVIDIA claims improves the game's frame-rates with DXR enabled by "up to 50 percent." The patch enables RTX 2080 Ti users to play "Battlefield V" smoothly with DXR at 1440p resolution, with frame-rates over 60 fps and DXR Reflections set to "Ultra." RTX 2080 (non-Ti) users should be able to play the game at 1440p with over 60 fps if the DXR Reflections toggle is set to "Medium." RTX 2070 users can play the game at 1080p with over 60 fps and the toggle set to "Medium." NVIDIA states that it is continuing to work with DICE to improve DXR performance even further, which will take the shape of future game patches and driver updates.
A video presentation by NVIDIA follows.


66 Comments on DICE Prepares "Battlefield V" RTX/DXR Performance Patch: Up to 50% FPS Gains

#51
londiste
EarthDog: It looks like the 2080 Ti in the first scene below looks almost identical, but when you go to 2080 medium (Vya's pic), there is a difference. If you look at the same scene with the 2070 on low, they look nearly identical as well. Not sure what is up with medium in this scene... it's def more blurry.
Applying Medium settings is buggy in the old patch. It could simply be part of that.
DICE guy said they made no changes to the quality levels.

Judging by the video, the bugs seem to have been the most awful bits, destruction and foliage especially: the few fps counters they showed in the video were indicating over 50% and around 300% (!) improvements in those situations.
Variable rate ray-tracing sounds like an obvious thing to have and I am surprised they did not do that before. Or maybe they did but now just optimized it a lot.
Would be interesting to know if the filtering improvements mean they scrapped their own custom filter and are running Nvidia's one on Tensor cores now.
Steevo: That would be great, but let's compare it to the 1080 Ti and what it could have been if shrunk, with higher clock speeds and the same architecture improvement.
Offtopic, but 12nm is not a shrink compared to 14/16nm. There really don't seem to be any higher clock speeds either, maybe 100 MHz on average, if that.
Volta's SM is 40%-ish larger than Pascal's, and Tensor cores account for maybe half of that or less. RT cores add 15% to Pascal's SM. Which part of that would you consider the same architecture improvement there?
#52
Robcostyle
A 100% gain in such a short period of time - it can't be only optimization. A quality downgrade is present as well.
birdie: NVIDIA haters:

NVIDIA sucks.
NVIDIA is ripping everyone off.
NVIDIA is a monopoly.
DXR sucks.
DXR performance sucks.
RTX buyers are alpha testers for DXR/Vulkan RT.


Except, DXR is brand new; currently used in just one game; it's the first implementation ever; we're already seeing impressive gains with just one patch and devs have promised that performance is gonna get even better.

And ... programmable shaders, which are used in close to 100% of modern games, were also first introduced by NVIDIA in 2000 and they have a similar history.
Yeah, yeah, just keep your shirt on - I know you and a couple of your crazy fanboy friends bought one, and now you get triggered when someone starts criticizing the RTX cards (mostly about the price, though).
#53
ShockG
londiste: Offtopic, but 12nm is not a shrink compared to 14/16nm. There really don't seem to be any higher clock speeds either, maybe 100 MHz on average, if that.
Volta's SM is 40%-ish larger than Pascal's, and Tensor cores account for maybe half of that or less. RT cores add 15% to Pascal's SM. Which part of that would you consider the same architecture improvement there?
Manufacturing process or node, while related to clock frequencies, is not entirely responsible for them.
That Pascal and Turing GPUs operate at around the same clock frequency has everything to do with design and GPU layout, in the same way that Pentium 4 CPUs could hit high clock frequencies (4 GHz) more than 12 years ago: the NetBurst architecture, with its deep pipelines, allowed that even on a 90nm process. You have to think power first, rather than frequency.
You have probably experienced this yourself: cool your GPU/CPU sufficiently and the clock frequency can run higher. Heat is energy; it's power. By cooling the processor you are directly affecting its power properties/draw (that's why power draw can be lowered at the same frequency merely by keeping the processor at a lower temperature).
The node is a means to an end, not an end unto itself.
Various manufacturing processes are optimized for specific targets. For the Turing GPUs, the transistor density has gone up tremendously, but the clock frequencies have remained the same, while operating voltage for the silicon has decreased, as you'd expect with a smaller node. From that, one could infer that TSMC's 12nm process as tuned for NVIDIA's Turing GPUs was targeting power and density rather than clock frequency, or perhaps the design and layout of the GPU isn't conducive to much higher clock frequencies anyway. Both could be true, and both are common occurrences in the semiconductor industry.
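For what it's worth, a minimal sketch of the standard first-order dynamic switching power relation this argument leans on (P ≈ C·V²·f) shows why a small voltage drop buys power headroom even when clocks stay flat. The capacitance, voltage, and clock values below are invented placeholders, not Turing measurements:

```python
# Illustrative only (not from any vendor data): first-order dynamic power model
# P ~ C * V^2 * f. A modest voltage reduction lowers power even at the same clock.
def dynamic_power(capacitance, voltage_v, frequency_hz):
    return capacitance * voltage_v ** 2 * frequency_hz

baseline = dynamic_power(1.0, 1.05, 1.8e9)  # hypothetical 16nm operating point
tuned = dynamic_power(1.0, 1.00, 1.8e9)     # ~5% lower voltage, same clock
print(f"Dynamic power reduction: {(1 - tuned / baseline) * 100:.1f}%")  # ~9.3%
```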
#54
Vya Domus
ShockG: Manufacturing process or node, while related to clock frequencies, is not entirely responsible for them.
It has everything to do with that. No one gets on with the design of a chip before they know what node it's going to be on.

It goes like this: node -> architecture -> clock speeds. As you can see, the manufacturing process is a defining factor for just about everything.

It's the number of transistors active at a time that decides how far you can push the frequencies. The consequence of a longer pipeline is just that: less die space active at a time, due to branch mispredictions, dependencies, etc.

Transistor count also indirectly affects clock speeds through leakage. All of these things are tied to the manufacturing process.
#55
bug
birdie: NVIDIA haters:

NVIDIA sucks.
NVIDIA is ripping everyone off.
NVIDIA is a monopoly.
DXR sucks.
DXR performance sucks.
RTX buyers are alpha testers for DXR/Vulkan RT.


Except, DXR is brand new; currently used in just one game; it's the first implementation ever; we're already seeing impressive gains with just one patch and devs have promised that performance is gonna get even better.

And ... programmable shaders, which are used in close to 100% of modern games, were also first introduced by NVIDIA in 2000 and they have a similar history.
First and foremost, they're trying to convince themselves, because otherwise they'd have to accept that AMD has fallen even further behind.
And yes, it was expected that RTX would not shine from the beginning because, guess what, none of the technologies that came before it ever did.
At the same time, there's no denying that on top of developers getting the hang of it, Nvidia also has to shrink the silicon so that we can afford it by selling only one kidney before this can become mainstream.
#56
ShockG
Vya Domus: It has everything to do with that. No one gets on with the design of a chip before they know what node it's going to be on.

It goes like this: node -> architecture -> clock speeds. As you can see, the manufacturing process is a defining factor for clock speeds.
I'm not sure how true that is.
Intel, AMD, NVIDIA, Samsung, etc. all measure power per mm², i.e. power density is a key metric, not the clock frequency. The clock frequency comes way after they figure out the best performance per watt. Performance per watt must intersect with power per mm², and when that target is reached, clock frequencies may be adjusted accordingly. Power draw/consumption is affected by the manufacturing process, but that comes from the expected properties of the node on which the processor will be made.
Clock frequency becomes a target only after these two (power per mm², performance per watt) have been worked out, as clock frequency alone doesn't tell you anything about a processor's computational abilities in isolation.
The design process, at a high level, must make sense long before the node on which it will be manufactured is a driving factor (how else could you work on a GPU for 10 years etc.). When engineers are drawing traces and laying out logic blocks, these do not always translate directly into how they will end up physically; for example, traces can be wider or narrower in the silicon than when they were drawn on paper. This design and layout process can't wait for the node to exist first; it must work outside of that. Hence the exhaustive back and forth between these semiconductor firms and their fabs. Design validation always comes first, and you get that through simulation data both at the labs and at the fabs where the chip will be made. Clock frequencies can only be as high as the power constraints allow, which is why power gets worked on first, not clock frequency.

Kepler and Maxwell were both on TSMC's 28nm process. What allowed the higher clock frequencies and lower power consumption (i.e., improved performance per watt and power per mm²) were the logic changes in the design and layout, not manufacturing improvements alone. That extra 200~300 MHz Maxwell could do over Kepler came primarily from the changes made there. While TSMC had improved their 28nm process, it wouldn't have been enough to take Kepler as it was, add a billion transistors, increase the clock frequency, and still hit the same power target. With all of those changes, areal density between the two increased by roughly 5.4%, but clock speeds went up by nearly 20% on the same 28nm node. The heavy lifting in achieving this came from outside what TSMC's node improvements alone could yield.
#57
londiste
ShockG: Various manufacturing processes are optimized for specific targets. For the Turing GPUs, the transistor density has gone up tremendously, but the clock frequencies have remained the same, while operating voltage for the silicon has decreased, as you'd expect with a smaller node. From that, one could infer that TSMC's 12nm process as tuned for NVIDIA's Turing GPUs was targeting power and density rather than clock frequency, or perhaps the design and layout of the GPU isn't conducive to much higher clock frequencies anyway. Both could be true, and both are common occurrences in the semiconductor industry.
Looking at the public die size and transistor count specs, Turings (TU102/104/106) sit at 24.3-25 MTransistors/mm² while Pascals (GP102/104/106) sit at 22.0-25.5. This is not a precise method, but we can say fairly certainly that the density difference is minimal, if there is one at all.

You are right about TSMC's 12nm being power-optimized, though. Slightly lower voltages on the GPU do bring power consumption reductions, but they are not very major: a 5% voltage reduction at best, with a slightly larger power consumption reduction.

Frequency is probably limited both by architecture and manufacturing process at this point.
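For anyone who wants to reproduce those density figures, the arithmetic is just transistor count divided by die area. A quick sketch using the commonly cited spec-sheet numbers for the two biggest dies (treat the figures as approximate):

```python
# Rough check of transistor density = transistor count / die area.
# Figures are the commonly cited spec-sheet numbers and should be treated as
# approximate; the other Turing/Pascal dies land in the same neighbourhood.
dies = {
    "TU102 (Turing)": (18.6e9, 754),  # transistors, die area in mm^2
    "GP102 (Pascal)": (11.8e9, 471),
}
for name, (transistors, area_mm2) in dies.items():
    print(f"{name}: {transistors / 1e6 / area_mm2:.1f} MTransistors/mm^2")
# -> TU102 ~24.7, GP102 ~25.1: essentially no density gain from 16nm to "12nm".
```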
#58
Vya Domus
ShockG: (how else could you work on a GPU for 10 years etc.)
It's simple: you don't. That claim could mean anything. Working on block diagrams and speculating for 10 years? Sure. Designing the chip itself down to the transistor level? Absolutely not, and the thing is, you have to do that in order to come up with a complete working design that meets your targets. And you need to know the characteristics of the node down to the smallest detail. You just can't design a chip like that without knowing where you are going to put those 18 billion transistors and how they'll behave.

Unless, of course, you believe Nvidia had a magic globe that told them a decade in advance that they were going to be on TSMC's 12 nm. May I also point out that this node was co-developed by Nvidia; that's not incidental. Turing likely entered critical development no more than 1-2 years prior to that.
#59
Easy Rhino
Linux Advocate
I thought we all stopped buying games from EA?
#60
ShredBird
To anyone thinking they simply cut corners to get more performance, just read the quote below from one of the developers talking about optimizing the engine for RTX. These devs are incredibly intelligent, understand computer graphics to a mind-boggling level of detail, and are passionate about what they do. I agree, RTX is going to have some growing pains, and it's a questionable value proposition for sure, but these developers are doing anything but cutting corners for the sake of performance. (A rough sketch of the hierarchical depth trick he describes follows the quote.)
Yasin Uludağ: One of the optimisations that is built into the BVHs are our use of “overlapped” compute - multiple compute shaders running in parallel. This is not the same thing as async compute or simultaneous compute. It just means you can run multiple compute shaders in parallel. However, there is an implicit barrier injected by the driver that prevents these shaders running in parallel when we record our command lists in parallel for BVH building. This will be fixed in the future and we can expect quite a bit of performance here since it removes sync points and wait-for-idles on the GPU.
We also plan on running BVH building using simultaneous compute during the G-Buffer generation phase, allowing ray tracing to start much earlier in the frame, and the G-Buffer pass. Nsight traces shows that this can be a big benefit. This will be done in the future.
Another optimisation we have in the pipe and that almost made launch was a hybrid ray trace/ray march system. This hybrid ray marcher creates a mip map on the entire depth buffer using a MIN filter. This means that every level takes the closest depth in 2x2 regions and keeps going all the way to the lowest mip map. Because this uses a so-called min filter, you know you can skip an entire region on the screen while traversing.
With this, ray binning then accelerates the hybrid ray traverser tremendously because rays are fetched from the same pixels down the same mip map thereby having super efficient cache utilisation. If your ray gets stuck behind objects as you find in classic screen-space reflections, this system then promotes the ray to become a ray trace/world space ray and continue from the failure point. We also get quality wins here as decals and grass strands will now be in reflections.
We have optimised the denoiser as well so it runs faster and we are also working on optimisations for our compute passes and filters that run throughout the ray tracing implementation.
We have applied for presenting our work/tech at GDC, so look out for that!
Source: www.eurogamer.net/articles/digitalfoundry-2018-battlefield-5-rtx-ray-tracing-analysis
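The "MIN filter" depth mip map he describes is essentially a hierarchical-Z pyramid. As a rough illustration of the idea only (not DICE's code, and simplified to a CPU-side sketch), each level stores the closest depth of a 2x2 block from the level below, so a screen-space ray marcher can skip whole regions at once instead of stepping pixel by pixel:

```python
import numpy as np

def build_min_depth_pyramid(depth):
    """Build a MIN-filtered mip chain from a full-resolution depth buffer.

    depth: 2D float32 array with power-of-two dimensions. Level N+1 stores the
    minimum (closest) depth of each 2x2 block of level N, so a ray that is
    still in front of a coarse cell's closest depth can skip that whole region.
    """
    levels = [depth]
    while min(levels[-1].shape) > 1:
        h, w = levels[-1].shape
        blocks = levels[-1].reshape(h // 2, 2, w // 2, 2)
        levels.append(blocks.min(axis=(1, 3)))  # MIN over each 2x2 block
    return levels

# Example: an 8x8 depth buffer produces 8x8 -> 4x4 -> 2x2 -> 1x1 levels.
pyramid = build_min_depth_pyramid(np.random.rand(8, 8).astype(np.float32))
print([level.shape for level in pyramid])
```

The ray binning, the promotion of failed screen-space rays to world-space rays, and the denoiser he mentions all sit on top of this and are not represented in the sketch.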
#61
Xzibit
ShredBird: To anyone thinking they simply cut corners to get more performance, just read the quote below from one of the developers talking about optimizing the engine for RTX. These devs are incredibly intelligent, understand computer graphics to a mind-boggling level of detail, and are passionate about what they do. I agree, RTX is going to have some growing pains, and it's a questionable value proposition for sure, but these developers are doing anything but cutting corners for the sake of performance.



Source: www.eurogamer.net/articles/digitalfoundry-2018-battlefield-5-rtx-ray-tracing-analysis
Interesting. These, or at least some of them, were already talked about by Nvidia at GDC 2018.
In fact, at the end of our analysis piece, you'll find our in-depth interview with DICE rendering engineer Yasin Uludağ, who has been working with colleague Johannes Deligiannis for the last year on implementing ray tracing within Battlefield 5.
Doesn't just work, then. Didn't they tout only a few days of work at the demo unveiling?

It's being reflected at a lower res:
  • Low: 0.9 smoothness cut-off and 15.0 per cent of screen resolution as maximum ray count.
  • Med: 0.9 smoothness cut-off and 23.3 per cent of screen resolution as maximum ray count.
  • High: 0.5 smoothness cut-off and 31.6 per cent of screen resolution as maximum ray count.
  • Ultra: 0.5 smoothness cut-off and 40.0 per cent of screen resolution as maximum ray count.
Expect to see more granularity added to the DXR settings, perhaps with a focus on culling distance and LODs.
I say maximum ray count here because we will try to distribute rays from this fixed pool onto those screen pixels that are prescribed to be reflective (based on their reflective properties) but we can never go beyond one ray per pixel in our implementation. So, if only a small percentage of the screen is reflective, we give all of those pixels one ray.
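Read literally, those presets describe a ray budget: a fixed pool sized as a fraction of the screen's pixels, spent at no more than one ray per reflective pixel. A small sketch of that interpretation (my reading of the quote, not DICE's actual code):

```python
# DXR Reflections presets as quoted above:
# (smoothness cut-off, maximum rays as a fraction of screen pixels).
PRESETS = {
    "low":    (0.9, 0.150),
    "medium": (0.9, 0.233),
    "high":   (0.5, 0.316),
    "ultra":  (0.5, 0.400),
}

def reflection_rays(width, height, preset, reflective_pixels):
    """reflective_pixels: pixels whose material smoothness passes the cut-off."""
    _cutoff, max_fraction = PRESETS[preset]
    budget = int(width * height * max_fraction)
    # Never more than one ray per pixel: if few pixels are reflective,
    # each gets one ray and the rest of the budget goes unused.
    return min(budget, reflective_pixels)

# Example: at 2560x1440 on Ultra the cap is ~1.47M rays per frame.
print(reflection_rays(2560, 1440, "ultra", reflective_pixels=500_000))
```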
#62
londiste
The old rendering implementation for reflections was half-res - whether that means half or a quarter of the pixels is up for debate, I suppose.

The Digital Foundry article came after the initial DXR patch, so the changes he was talking about are definitely responsible for the better performance. Something was clearly wrong with med/high/ultra performance in the initial DXR patch, though, and I have not seen any updated benchmarks yet.
#63
EarthDog
Have they rolled it out again?
#65
MuhammedAbdo
Vayra86: They certainly did do things with quality to get here. I haven't seen a single sharp reflection across the whole video. It's a blurry mess. Still a small improvement over non-RT, but only when you look for it. One noticeable aspect is increased DoF on the weapon. You can see this @ 0:43.
Nope, that's just a different amount of motion blur and mud on the gun; the scenes are not aligned perfectly.
Vayra86: In brief: DXR may well have the knock-on effect that each preset also alters the other game graphics quality settings.
In brief, you are talking nonsense; the developers have been open about their implementation from day one. In fact, they improved the quality of reflections through enhanced denoising techniques.
stimpy88: So now we have the oxymoron that is next-gen ray-traced graphical fidelity, the holy grail if you will, paired with medium-detail textures and effects resembling games from 2012. Welcome to the future indeed.
That's medium RTX, genius, not medium game quality.