DICE Prepares "Battlefield V" RTX/DXR Performance Patch: Up to 50% FPS Gains

lexluthermiester · Dec 4, 2018

Vayra86 said:
Impressive, now lets see them do that optimization on a less big budget, high exposure title.

That's just being negative for the sake of being contrary.

This situation happens with every new technology advance. A company introduces something new, devs need time to get to know that something new and then it is utilized. Improvements then happen as knowledge and experience factor into things.

BTW; Who called it? Come on now..

londiste · Dec 4, 2018

EarthDog said:
It looks like to 2080 Ti in the first scene below looks almost identical, but when you go to 2080 medium (Vya's pic), there is a difference. If you look at the same scene with the 2070 on low, they look nearly identical as well. Not sure what is up with medium in this scene... its def more blurry.

Applying Medium settings is buggy in the old patch. It could simply be part of that.
DICE guy said they made no changes to the quality levels.

From the video, the bugs seem to have been the most awful bits. The destruction and foliage especially, the few fps counters they showed in the video were showing over 50% and around 300% (!) improvements in these situations.
Variable rate ray-tracing sounds like an obvious thing to have and I am surprised they did not do that before. Or maybe they did but now just optimized it a lot.
Would be interesting to know if filtering improvements mean they scrapped their own custom filter and are running's Nvidia's one on Tensor cores now.

Steevo said:
That would be great, but let's compare it to the 1080Ti, and what it could have been shrunk, and with higher clockspeed and the same architecture improvement.

Offtopic, but 12nm is not a shrink compared to 14/16nm. There really does not seem to be any higher clock speeds, maybe 100Mhz on average, if that.
Volta SM is 40%-ish larger than Pascal's and Tensor cores are maybe half of that or less. RT cores add 15% to Pascal's SM. Which part of it would you consider the same architecture improvement there?

Robcostyle · Dec 4, 2018

100% gain for such a small period of time - can’t be only optimization. Quality downgrade is present aswell.

birdie said:
NVIDIA haters:

NVIDIA sucks.
NVIDIA is ripping everyone off.
NVIDIA is a monopoly.
DXR sucks.
DXR performance sucks.
RTX buyers are alpha testers for DXR/Vulkan RT.

Except, DXR is brand new; currently used in just one game; it's the first implementation ever; we're already seeing impressive gains with just one patch and devs have promised that performance is gonna get even better.

And ... programmable shaders, which are used in close to 100% of modern games, were also first introduced by NVIDIA in 2000 and they have a similar history.

Yeah, yeah, just keep your shirt on - I know you and couple of ur crazy fanboys-friends bought one, and now you get triggered wehn someone starts critisism against rtx card (mostly about the price though).

ShockG · Dec 4, 2018

londiste said:
Offtopic, but 12nm is not a shrink compared to 14/16nm. There really does not seem to be any higher clock speeds, maybe 100Mhz on average, if that.
Volta SM is 40%-ish larger than Pascal's and Tensor cores are maybe half of that or less. RT cores add 15% to Pascal's SM. Which part of it would you consider the same architecture improvement there?

Manufacturing process or node, while related to clock frequencies, is not entirely responsible for them.
That Pascal and Turing GPUs operate at around the same clock frequency has everything to do with design and GPU layout. In the same way that Pentium 4 CPUs could hit high clock frequencies (4GHz) more than 12 years ago. The netburst architecture with its deep pipelines allowed that, even on a 90nm process. You have to think power first, rather than frequency.
You have probably experienced this yourself. Cool your GPU/CPU sufficiently and the clock frequency can run higher. Heat is energy, it's power. By cooling the processor you are directly affecting the power properties/draw (that's why power draw can be lowered for the same frequency, by merely keeping the processor at a lower temperature).
The node is a means to an end, not an end unto itself.
Various manufacturing process are optimized for specific targets. For the Turing GPUs, the transistor density has gone up tremendously, but the clock frequencies have remained the same, while operating voltage for the silicon has decreased *as you'd expect with a smaller node. So from that, one could infer that TSMC's 12nm process as tuned for NVIDIA's Turing GPUs was targeting power and density, rather than clock frequency or perhaps the design and layout of the GPU isn't conducive to much higher clock frequencies anyway. Both could be true and are common occurrences in the semiconductor industry.

Vya Domus · Dec 4, 2018

ShockG said:
Manufacturing process or node, while related to clock frequencies, is not entirely responsible for them.

It has everything to do with that. No one gets on with the design of chip before they know what node it's going to be on.

It goes like this node -> architecture -> clock speeds. As you can see the manufacturing process is a defining factor for just about everything.

It's the number of transistors active at a time that decides how far you push the frequencies. The consequence of a longer pipeline is just that, less die space active due to branch misprediction, dependencies, etc.

Transistor count also inadvertently affects clocks speeds through leakage. All of these these things are tied with the manufacturing process.

bug · Dec 4, 2018

birdie said:
NVIDIA haters:

NVIDIA sucks.
NVIDIA is ripping everyone off.
NVIDIA is a monopoly.
DXR sucks.
DXR performance sucks.
RTX buyers are alpha testers for DXR/Vulkan RT.

Except, DXR is brand new; currently used in just one game; it's the first implementation ever; we're already seeing impressive gains with just one patch and devs have promised that performance is gonna get even better.

And ... programmable shaders, which are used in close to 100% of modern games, were also first introduced by NVIDIA in 2000 and they have a similar history.

First and foremost, they're trying to convince themselves. Because otherwise they'd have to accept AMD has fallen even further behind.
And yes, it was expected RTX will not shine from the beginning because, guess what?, none of the technologies that came before ever did.
At the same time, there's no denying that on top of developers getting the hang of it, Nvidia also has to shrink the silicone so that we can afford it by only selling one kidney before this can become mainstream.

ShockG · Dec 4, 2018

Vya Domus said:
It has everything to do with that. No one gets on with the design of chip before they know what node it's going to be on.

It goes like this node -> architecture -> clock speeds. As you can see the manufacturing process is a defining factor for clock speeds.

I'm not sure how true that is.
Intel, AMD, NVIDIA, Samsung etc all measure power per mm^2, i.e power density is a key metric, not the clock frequency. The clock frequency comes way after they figure out what is the best performance per watt. So power per watt, must intersect with power per mm^2, when that target is reached, then clock frequencies may be adjusted accordingly. Power draw/consumption is affected by the manufacturing process, but that comes from the expected properties of the manufacturing node, from which the processor will be made.
Clock frequency is a target only after these two (power per mm^2, power per watt) have been worked out as clock frequency alone doesn't tell you anything about a processors computational abilities in isolation.
The design process at a high level, must make sense long before the node on which it will be manufactured on is a driving factor (how else could you work on a GPU for 10 years etc.). When engineers are drawing traces and laying out logic blocks etc. these do not always translate into how they will end up physically. for example traces can be wider or narrower in the silicon as opposed to when they were drawn on paper. This design and layout process can't wait for the node to exist first, it must work outside of that. Hence the back and fourth between these semiconductor firms and their fabs which is exhaustive. Design validation is always first and you get that through simulation data both at the labs and at the fabs where it will be made. Clock frequencies can only be as high as what the power constraints allow, which is why that get's worked on first (power), not clock frequency.

Kepler and Maxwell were both on TSMC's 28nm process. What allowed the higher clock frequencies and lower power consumption (i.e power per watt and power per mm^2 was improved) was the logic changes in the design and layout, not the manufacturing improvements entirely. That extra 200~300MHz Maxwell could do over Kepler came from the changes made there primarily. While TSMC had improved their 28nm process, it wouldn't be enough to take Kepler as it is; add a billion transistors, increase the clock frequency and still hit the same power target. With all of those changes, areal density between these two increased by roughly ~ 5.4%, but clock speeds went up by nearly 20% on the same 28nm node. The heavy lifting in achieving this is outside of what TSMC's node manufacturing improvement alone could yield.

londiste · Dec 4, 2018

ShockG said:
Various manufacturing process are optimized for specific targets. For the Turing GPUs, the transistor density has gone up tremendously, but the clock frequencies have remained the same, while operating voltage for the silicon has decreased *as you'd expect with a smaller node. So from that, one could infer that TSMC's 12nm process as tuned for NVIDIA's Turing GPUs was targeting power and density, rather than clock frequency or perhaps the design and layout of the GPU isn't conducive to much higher clock frequencies anyway. Both could be true and are common occurrences in the semiconductor industry.

Looking at the public die size and transistor count specs, Turings (TU102/104/106) are 24.3-25 MTransistors/mm² while Pascals (GP102/104/106) are 22.0-25.5. This is not a precise method but we can fairly certainly say the density difference is minimal if there is one at all.

You are right about TSMC's 12nm being power optimized though. Slightly lower voltages on GPU do bring power consumption reductions. They are not very major though, 5% voltage reduction at best with a slightly larger power consumtion reduction.

Frequency is probably limited both by architecture and manufacturing process at this point.

Vya Domus · Dec 4, 2018

ShockG said:
(how else could you work on a GPU for 10 years etc.)

It's simple, you don't. That claim could mean anything. Working on block diagrams and speculating for 10 years ? Sure. Designing the chip itself down to the transistor level, absolutely not, and the thing is you have to do that in order to come up with a complete working design that meets your targets. And you need to know the characteristics of the node down to the smallest detail. You just can't design a chip like that without knowing where are you going to put those 18 billion transistors and how they'll behave.

Unless of course you believe Nvidia had magic globe that told them they are going to work on TSMC's 12 nm a decade prior. May I also point out that this node was co-developed by Nvidia, that's not incidental. Turing likely entered critical development no more than 1-2 years prior to that.

Easy Rhino · Dec 4, 2018

I thought we all stopped buying games from EA?

ShredBird · Dec 5, 2018

To anyone thinking they simply cut corners to get more performance, just read the quote below of one of the developers talking about optimizing the engine for RTX. These devs are incredibly intelligent, understand computer graphics to a mind boggling level of detail and are passionate about what they do. I agree, RTX is going to have some growing pains, and it's a questionable value proposition for sure, but these developers are doing anything but cutting corners for the sake of performance.

Yasin Uludağ: One of the optimisations that is built into the BVHs are our use of “overlapped” compute - multiple compute shaders running in parallel. This is not the same thing as async compute or simultaneous compute. It just means you can run multiple compute shaders in parallel. However, there is an implicit barrier injected by the driver that prevents these shaders running in parallel when we record our command lists in parallel for BVH building. This will be fixed in the future and we can expect quite a bit of performance here since it removes sync points and wait-for-idles on the GPU.
We also plan on running BVH building using simultaneous compute during the G-Buffer generation phase, allowing ray tracing to start much earlier in the frame, and the G-Buffer pass. Nsight traces shows that this can be a big benefit. This will be done in the future.
Another optimisation we have in the pipe and that almost made launch was a hybrid ray trace/ray march system. This hybrid ray marcher creates a mip map on the entire depth buffer using a MIN filter. This means that every level takes the closest depth in 2x2 regions and keeps going all the way to the lowest mip map. Because this uses a so-called min filter, you know you can skip an entire region on the screen while traversing.
With this, ray binning then accelerates the hybrid ray traverser tremendously because rays are fetched from the same pixels down the same mip map thereby having super efficient cache utilisation. If your ray gets stuck behind objects as you find in classic screen-space reflections, this system then promotes the ray to become a ray trace/world space ray and continue from the failure point. We also get quality wins here as decals and grass strands will now be in reflections.
We have optimised the denoiser as well so it runs faster and we are also working on optimisations for our compute passes and filters that run throughout the ray tracing implementation.
We have applied for presenting our work/tech at GDC, so look out for that!

Source: https://www.eurogamer.net/articles/digitalfoundry-2018-battlefield-5-rtx-ray-tracing-analysis

Xzibit · Dec 5, 2018

ShredBird said:
To anyone thinking they simply cut corners to get more performance, just read the quote below of one of the developers talking about optimizing the engine for RTX. These devs are incredibly intelligent, understand computer graphics to a mind boggling level of detail and are passionate about what they do. I agree, RTX is going to have some growing pains, and it's a questionable value proposition for sure, but these developers are doing anything but cutting corners for the sake of performance.

Source: https://www.eurogamer.net/articles/digitalfoundry-2018-battlefield-5-rtx-ray-tracing-analysis

Interesting. These or at least some were already talked about at GDC 2018 by Nvidia.

In fact, at the end of our analysis piece, you'll find our in-depth interview with DICE rendering engineer Yasin Uludağ, who has been working with colleague Johannes Deligiannis for the last year on implementing ray tracing within Battlefield 5.

Doesn't just work. Didn't they tout only a few days at the Demo unveiling ?

Its being reflected at a lower rez

Low: 0.9 smoothness cut-off and 15.0 per cent of screen resolution as maximum ray count.

Med: 0.9 smoothness cut-off and 23.3 per cent of screen resolution as maximum ray count.

High: 0.5 smoothness cut-off and 31.6 per cent of screen resolution as maximum ray count.

Ultra: 0.5 smoothness cut-off and 40.0 per cent of screen resolution as maximum ray count.

Expect to see more granularity added to the DXR settings, perhaps with a focus on culling distance and LODs.

I say maximum ray count here because we will try to distribute rays from this fixed pool onto those screen pixels that are prescribed to be reflective (based on their reflective properties) but we can never go beyond one ray per pixel in our implementation. So, if only a small percentage of the screen is reflective, we give all of those pixels one ray.

londiste · Dec 5, 2018

The old rendering implementation for reflections was half-res - whether that is half or 1/4 is up for debate, I suppose.

DigitalFoundry article was after the initial DXR patch so the changes he was talking about are definitely responsible for better performance. Something was clearly wrong with med/high/ultra performance in the initial DXR patch though and I have not seen any updated benchmarks yet.

EarthDog · Dec 5, 2018

Have they rolled it out again?

londiste · Dec 5, 2018

https://www.guru3d.com/news-story/d...-performance-patch-released-(benchmarks).html

MuhammedAbdo · Dec 5, 2018

Vayra86 said:
They certainly did do things with quality to get here. I haven't seen a single sharp reflection across the whole video. Its a blurry mess. Still a small improvement over non-RT, but only when you look for it. One noticeable aspect is increased DoF on the weapon. You can see this @ 0:43

Nope, that's just different motion blur amount and mud amount on the gun, the scenes are not aligned perfectly.

Vayra86 said:
In brief: DXR may well have the knock-on effect that each preset also alters the other game graphics quality settings.

In brief you are talking nonesense, the developer is open about their implementation from day one. In fact they improved the qulity of reflections through enhanced denoising techniques.

stimpy88 said:
So now we have the oxymoron that is, we now have next gen Ray-Traced graphical fidelity, the holy grail if you will, paired with medium detail textures and effects resembling games from 2012. Welcome to the future indeed.

Medium RTX genius, not medium game quality.

EarthDog · Dec 5, 2018

londiste said:
https://www.guru3d.com/news-story/d...-performance-patch-released-(benchmarks).html

Looks like significant improvement across the line and NVIDIA doesnt seem to be off from what they said.

Processor	Ryzen 7800X3D
Motherboard	ROG STRIX B650E-F GAMING WIFI
Memory	2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s)	INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage	2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s)	42" LG C2 OLED, 27" ASUS PG279Q
Case	Thermaltake Core P5
Power Supply	Fractal Design Ion+ Platinum 760W
Mouse	Corsair Dark Core RGB Pro SE
Keyboard	Corsair K100 RGB
VR HMD	HTC Vive Cosmos

System Name	Good enough
Processor	AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard	ASRock B650 Pro RS
Cooling	2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory	32GB - FURY Beast RGB 5600 Mhz
Video Card(s)	Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage	1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s)	LG UltraGear 32GN650-B + 4K Samsung TV
Case	Phanteks NV7
Power Supply	GPS-750C

Processor	Intel i5-12600k
Motherboard	Asus H670 TUF
Cooling	Arctic Freezer 34
Memory	2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s)	EVGA GTX 1060 SC
Storage	500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s)	Dell U3219Q + HP ZR24w
Case	Raijintek Thetis
Audio Device(s)	Audioquest Dragonfly Red :D
Power Supply	Seasonic 620W M12
Mouse	Logitech G502 Proteus Core
Keyboard	G.Skill KM780R
Software	Arch Linux + Win10

Processor	Ryzen 7800X3D
Motherboard	ROG STRIX B650E-F GAMING WIFI
Memory	2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s)	INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage	2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s)	42" LG C2 OLED, 27" ASUS PG279Q
Case	Thermaltake Core P5
Power Supply	Fractal Design Ion+ Platinum 760W
Mouse	Corsair Dark Core RGB Pro SE
Keyboard	Corsair K100 RGB
VR HMD	HTC Vive Cosmos

System Name	Good enough
Processor	AMD Ryzen R9 7900 - Alphacool Eisblock XPX Aurora Edge
Motherboard	ASRock B650 Pro RS
Cooling	2x 360mm NexXxoS ST30 X-Flow, 1x 360mm NexXxoS ST30, 1x 240mm NexXxoS ST30
Memory	32GB - FURY Beast RGB 5600 Mhz
Video Card(s)	Sapphire RX 7900 XT - Alphacool Eisblock Aurora
Storage	1x Kingston KC3000 1TB 1x Kingston A2000 1TB, 1x Samsung 850 EVO 250GB , 1x Samsung 860 EVO 500GB
Display(s)	LG UltraGear 32GN650-B + 4K Samsung TV
Case	Phanteks NV7
Power Supply	GPS-750C

System Name	Desktop
Processor	i5 13600KF
Motherboard	AsRock B760M Steel Legend Wifi
Cooling	Noctua NH-U9S
Memory	4x 16 Gb Gskill S5 DDR5 @6000
Video Card(s)	Gigabyte Gaming OC 6750 XT 12GB
Storage	WD_BLACK 4TB SN850x
Display(s)	Gigabye M32U
Case	Corsair Carbide 400C
Audio Device(s)	On Board
Power Supply	EVGA Supernova 650 P2
Mouse	MX Master 3s
Keyboard	Logitech G915 Wireless Clicky
Software	The Matrix

System Name	Bird's Monolith
Processor	Intel i7-4770k 4.6 GHz (liquid metal)
Motherboard	Asrock Z87 Extreme3
Cooling	Noctua NH-D14, Noctua 140mm Case Fans
Memory	16 GB G-Skill Trident-X DDR3 2400 CAS 9
Video Card(s)	EVGA 1080ti SC2 Hybrid
Storage	2 TB Mushkin Triactor 3D (RAID 0)
Display(s)	Dell S2716DG / Samsung Q80R QLED TV
Case	Fractal Design Define R4
Audio Device(s)	Audio Engine D1 DAC, A5+ Speakers, SteelSeries Arctis 7
Power Supply	Seasonic Platinum 660 W
Mouse	SteelSeries Rival
Keyboard	SteelSeries Apex
Software	Windows 10 Pro x64

DICE Prepares "Battlefield V" RTX/DXR Performance Patch: Up to 50% FPS Gains

Linux Advocate