Monday, December 3rd 2018
DICE Prepares "Battlefield V" RTX/DXR Performance Patch: Up to 50% FPS Gains
EA-DICE and NVIDIA earned a lot of bad press last month, when performance numbers for "Battlefield V" with DirectX Raytracing (DXR) finally came out. Gamers were disappointed to see that DXR inflicts heavy performance penalties, with 4K UHD gameplay out of reach even for the $1,200 GeForce RTX 2080 Ti, and acceptable frame-rates available only at 1080p. DICE has since been working tirelessly to rework its real-time raytracing implementation and improve performance. Tomorrow (4th December), the studio will release a patch for "Battlefield V," a day ahead of its new Tides of War: Overture and new War Story slated for December 5th. This patch could be a game-changer for GeForce RTX users.
NVIDIA has been working closely with EA-DICE on this new patch, which NVIDIA claims improves the game's frame-rates with DXR enabled by "up to 50 percent." The patch enables RTX 2080 Ti users to smoothly play "Battlefield V" with DXR at 1440p resolution, with frame-rates over 60 fps and DXR Reflections set to "Ultra." RTX 2080 (non-Ti) users should be able to play the game at 1440p with over 60 fps, if the DXR Reflections toggle is set to "Medium." RTX 2070 users can play the game at 1080p, with over 60 fps, with the toggle set to "Medium." NVIDIA states that it is continuing to work with DICE to improve DXR performance even further, which will take the shape of future game patches and driver updates. A video presentation by NVIDIA follows.
66 Comments on DICE Prepares "Battlefield V" RTX/DXR Performance Patch: Up to 50% FPS Gains
DICE guy said they made no changes to the quality levels.
From the video, the bugs seem to have been the most awful bits, the destruction and foliage especially; the few fps counters they showed in the video were showing over 50% and around 300% (!) improvements in those situations.
Variable rate ray-tracing sounds like an obvious thing to have and I am surprised they did not do that before. Or maybe they did but now just optimized it a lot.
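DICE hasn't published the exact heuristic, but the general idea behind variable-rate ray tracing is simple enough to sketch: spend fewer (or no) reflection rays on rough surfaces where a traced reflection adds little, and keep the full budget for mirror-like ones. A minimal illustration in Python, with thresholds and ray budgets that are purely made up:

```python
# Toy sketch of variable-rate ray tracing: pick a per-pixel reflection-ray
# budget from surface roughness. Thresholds and budgets are invented for
# illustration; they are not DICE's actual values.

def rays_per_pixel(roughness, base_budget=4):
    """Reflection rays for a surface with GGX-style roughness in [0, 1]."""
    if roughness > 0.9:   # very rough: a traced reflection is barely visible
        return 0
    if roughness > 0.5:   # moderately rough: one ray, cleaned up by the denoiser
        return 1
    return base_budget    # smooth / mirror-like: full budget

def frame_ray_count(roughness_buffer, base_budget=4):
    """Total reflection rays for one frame."""
    return sum(rays_per_pixel(r, base_budget) for r in roughness_buffer)

# Example frame: mostly rough ground, a little foliage, a few water pixels.
scene = [0.95] * 900 + [0.6] * 80 + [0.05] * 20
fixed_rate = 4 * len(scene)             # every pixel gets the full budget
variable_rate = frame_ray_count(scene)  # budget follows the material
print(fixed_rate, "rays vs", variable_rate, "rays")  # 4000 vs 160
```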
Would be interesting to know if the filtering improvements mean they scrapped their own custom filter and are running Nvidia's one on Tensor cores now. Off-topic, but 12nm is not a shrink compared to 14/16nm. There really do not seem to be any higher clock speeds, maybe 100 MHz on average, if that.
Volta's SM is 40%-ish larger than Pascal's, and Tensor cores are maybe half of that or less. RT cores add 15% to Pascal's SM. Which part of that would you consider the same architecture improvement?
That Pascal and Turing GPUs operate at around the same clock frequency has everything to do with design and GPU layout, in the same way that Pentium 4 CPUs could hit high clock frequencies (4 GHz) more than 12 years ago: the NetBurst architecture with its deep pipelines allowed that, even on a 90nm process. You have to think power first, rather than frequency.
You have probably experienced this yourself: cool your GPU/CPU sufficiently and the clock frequency can run higher. Heat is energy; it's power. By cooling the processor you are directly affecting its power properties/draw (that's why power draw can be lowered for the same frequency, merely by keeping the processor at a lower temperature).
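A back-of-the-envelope sketch of that relationship (the model and every constant below are simplified assumptions for illustration, not vendor data): dynamic power scales roughly with C·V²·f, while leakage grows roughly exponentially with junction temperature, so a cooler chip draws less total power at the same clock.

```python
import math

# Simplified GPU power model: total = dynamic + static (leakage).
# Dynamic power ~ activity * C_eff * V^2 * f; leakage is modeled here with a
# crude exponential temperature dependence. All constants are illustrative.

def dynamic_power(c_eff, voltage, freq_hz, activity=0.5):
    return activity * c_eff * voltage ** 2 * freq_hz

def leakage_power(base_leak_w, temp_c, ref_temp_c=25.0, temp_factor=0.02):
    return base_leak_w * math.exp(temp_factor * (temp_c - ref_temp_c))

def total_power(c_eff, voltage, freq_hz, temp_c, base_leak_w):
    return dynamic_power(c_eff, voltage, freq_hz) + leakage_power(base_leak_w, temp_c)

# Same hypothetical chip, same 1.8 GHz and 1.0 V, only the temperature differs:
hot  = total_power(2e-7, 1.0, 1.8e9, temp_c=84, base_leak_w=20)
cool = total_power(2e-7, 1.0, 1.8e9, temp_c=55, base_leak_w=20)
print(round(hot), "W hot vs", round(cool), "W cool")  # cooler chip draws less
```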
The node is a means to an end, not an end unto itself.
Various manufacturing processes are optimized for specific targets. For the Turing GPUs, the transistor density has gone up tremendously, but the clock frequencies have remained the same, while operating voltage for the silicon has decreased, as you'd expect with a smaller node. From that, one could infer that TSMC's 12nm process as tuned for NVIDIA's Turing GPUs was targeting power and density rather than clock frequency; or perhaps the design and layout of the GPU isn't conducive to much higher clock frequencies anyway. Both could be true, and both are common occurrences in the semiconductor industry.
It goes like this: node -> architecture -> clock speeds. As you can see, the manufacturing process is a defining factor for just about everything.
It's the number of transistors active at a time that decides how far you can push the frequencies. The consequence of a longer pipeline is just that: less of the die active at a time due to branch mispredictions, dependencies, etc.
Transistor count also indirectly affects clock speeds through leakage. All of these things are tied to the manufacturing process.
And yes, it was expected that RTX would not shine from the beginning because, guess what, none of the technologies that came before ever did.
At the same time, there's no denying that, on top of developers getting the hang of it, Nvidia also has to shrink the silicon so that we can afford it by selling only one kidney before this can become mainstream.
Intel, AMD, NVIDIA, Samsung, etc. all measure power per mm^2, i.e., power density is a key metric, not the clock frequency. The clock frequency comes way after they figure out the best performance per watt. So performance per watt must intersect with power per mm^2; when that target is reached, clock frequencies may be adjusted accordingly. Power draw/consumption is affected by the manufacturing process, but that comes from the expected properties of the manufacturing node from which the processor will be made.
Clock frequency is a target only after these two (power per mm^2, performance per watt) have been worked out, as clock frequency alone doesn't tell you anything about a processor's computational abilities in isolation.
The design process, at a high level, must make sense long before the node on which it will be manufactured is a driving factor (how else could you work on a GPU for 10 years, etc.). When engineers are drawing traces and laying out logic blocks, these do not always translate directly into how they will end up physically; for example, traces can be wider or narrower in the silicon than when they were drawn on paper. This design and layout process can't wait for the node to exist first; it must work outside of that. Hence the exhaustive back and forth between these semiconductor firms and their fabs. Design validation is always first, and you get that through simulation data both at the labs and at the fabs where it will be made. Clock frequencies can only be as high as what the power constraints allow, which is why power gets worked on first, not clock frequency.
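A toy version of that ordering, just to make the arithmetic concrete (every number below is invented for illustration, not an actual GPU's figures): fix a power-density budget in W/mm^2 first, then see what clock the dynamic-power relation P ≈ activity · C · V² · f leaves room for.

```python
# Toy illustration of working from a power-density budget to a clock target.
# For a fixed dynamic-power budget, f_max = budget / (activity * C_eff * V^2).
# Every value here is invented for illustration, not an actual GPU's figures.

def max_frequency_hz(power_budget_w, c_eff, voltage, activity=0.5):
    return power_budget_w / (activity * c_eff * voltage ** 2)

die_area_mm2 = 545.0                         # hypothetical die size
budget_w_mm2 = 0.45                          # hypothetical power-density budget
power_budget = die_area_mm2 * budget_w_mm2   # ~245 W total budget

# Same budget, two operating voltages: the lower voltage buys clock headroom.
f_high_v = max_frequency_hz(power_budget, c_eff=2.6e-7, voltage=1.05)
f_low_v  = max_frequency_hz(power_budget, c_eff=2.6e-7, voltage=0.95)
print(round(f_high_v / 1e9, 2), "GHz at 1.05 V vs", round(f_low_v / 1e9, 2), "GHz at 0.95 V")
```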
Kepler and Maxwell were both on TSMC's 28nm process. What allowed the higher clock frequencies and lower power consumption (i.e., performance per watt and power per mm^2 were improved) was the logic changes in the design and layout, not the manufacturing improvements alone. That extra 200~300 MHz Maxwell could do over Kepler came primarily from the changes made there. While TSMC had improved their 28nm process, it wouldn't be enough to take Kepler as it is, add a billion transistors, increase the clock frequency and still hit the same power target. With all of those changes, areal density between the two increased by roughly 5.4%, but clock speeds went up by nearly 20% on the same 28nm node. The heavy lifting in achieving this came from outside of what TSMC's node manufacturing improvements alone could yield.
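For reference, those percentages roughly check out if you plug in the commonly quoted die figures for the big chips (GK110 vs GM200). The clock comparison below uses the rated boost clocks of the GTX 780 Ti and GTX 980 Ti; real-world Maxwell boost clocks typically ran higher still, which is where the near-20% figure comes from.

```python
# Back-of-the-envelope check of the Kepler -> Maxwell comparison, using the
# commonly quoted figures for the big dies (GK110 vs GM200). Clocks are the
# rated boost clocks of GTX 780 Ti and GTX 980 Ti; real-world Maxwell boost
# typically ran higher still.

gk110 = {"transistors_b": 7.08, "area_mm2": 561, "boost_mhz": 928}
gm200 = {"transistors_b": 8.00, "area_mm2": 601, "boost_mhz": 1075}

def density_mtr_mm2(chip):
    # million transistors per mm^2
    return chip["transistors_b"] * 1000 / chip["area_mm2"]

density_gain = density_mtr_mm2(gm200) / density_mtr_mm2(gk110) - 1
clock_gain = gm200["boost_mhz"] / gk110["boost_mhz"] - 1

print(f"density: +{density_gain:.1%}, rated boost clock: +{clock_gain:.1%}")
# -> density: +5.5%, rated boost clock: +15.8%
```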
You are right about TSMC's 12nm being power-optimized though. Slightly lower voltages on the GPU do bring power consumption reductions, but they are not very major: 5% voltage reduction at best, with a slightly larger power consumption reduction.
Frequency is probably limited both by architecture and manufacturing process at this point.
Unless of course you believe Nvidia had a magic globe that told them they were going to work on TSMC's 12 nm a decade prior. May I also point out that this node was co-developed by Nvidia; that's not incidental. Turing likely entered critical development no more than 1-2 years prior to that.
It's being reflected at a lower resolution.
The DigitalFoundry article was after the initial DXR patch, so the changes he was talking about are definitely responsible for the better performance. Something was clearly wrong with medium/high/ultra performance in the initial DXR patch though, and I have not seen any updated benchmarks yet.