Thursday, April 11th 2019

NVIDIA Extends DirectX Raytracing (DXR) Support to Many GeForce GTX GPUs

NVIDIA today announced that it is extending DXR (DirectX Raytracing) support to several GeForce GTX graphics models beyond its GeForce RTX series. These include the GTX 1660 Ti, GTX 1660, GTX 1080 Ti, GTX 1080, GTX 1070 Ti, GTX 1070, and GTX 1060 6 GB. The GTX 1060 3 GB and lower "Pascal" models don't support DXR, nor do older generations of NVIDIA GPUs. NVIDIA has implemented real-time ray tracing on GPUs that lack specialized components such as RT cores or tensor cores by running the entire workload on programmable shaders, in this case the CUDA cores. DXR support will be added through a new GeForce graphics driver releasing later today.

The GPU's CUDA cores now have to handle BVH traversal, ray intersection, reflection, and refraction. The GTX 16-series chips have an edge over "Pascal" despite lacking RT cores, as the "Turing" CUDA cores support concurrent INT and FP execution, allowing more work to be done per clock. In a detailed presentation, NVIDIA listed the kinds of real-time ray-tracing effects exposed by the DXR API, namely reflections, shadows, advanced reflections and shadows, ambient occlusion, global illumination (unbaked), and combinations of these. The company put out detailed performance numbers for a selection of GTX 10-series and GTX 16-series GPUs, and compared them to RTX 20-series SKUs that have specialized hardware for DXR.
Update: Article updated with additional test data from NVIDIA.
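To give a sense of the work that lands on the shader cores when there is no fixed-function ray-tracing hardware, below is a minimal CUDA sketch of the ray vs. axis-aligned bounding box "slab" test that sits at the heart of BVH traversal. It is purely illustrative; the struct and function names are our own, and this is not NVIDIA's driver code.

```cuda
// Illustrative only: the slab test a BVH traversal performs per node, written
// as a CUDA device function. Real DXR fallback paths are far more involved.
#include <cuda_runtime.h>

struct Ray  { float3 origin; float3 invDir; };  // invDir = 1/direction, precomputed
struct AABB { float3 lo; float3 hi; };          // a BVH node's bounding box

__device__ bool rayIntersectsAABB(const Ray& r, const AABB& b, float tMax)
{
    float t0 = 0.0f, t1 = tMax;

    float tx0 = (b.lo.x - r.origin.x) * r.invDir.x;
    float tx1 = (b.hi.x - r.origin.x) * r.invDir.x;
    t0 = fmaxf(t0, fminf(tx0, tx1));
    t1 = fminf(t1, fmaxf(tx0, tx1));

    float ty0 = (b.lo.y - r.origin.y) * r.invDir.y;
    float ty1 = (b.hi.y - r.origin.y) * r.invDir.y;
    t0 = fmaxf(t0, fminf(ty0, ty1));
    t1 = fminf(t1, fmaxf(ty0, ty1));

    float tz0 = (b.lo.z - r.origin.z) * r.invDir.z;
    float tz1 = (b.hi.z - r.origin.z) * r.invDir.z;
    t0 = fmaxf(t0, fminf(tz0, tz1));
    t1 = fminf(t1, fmaxf(tz0, tz1));

    return t0 <= t1;  // the ray's [0, tMax] interval overlaps the box
}
```

In a real traversal this test runs for every node each ray visits, millions of times per frame, which is exactly the arithmetic that RT cores offload on RTX cards.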

According to NVIDIA's numbers, GPUs without RTX hardware are significantly slower than the RTX 20-series. No surprises here. But at 1440p, the resolution NVIDIA chose for these tests, you would need at least a GTX 1080 or GTX 1080 Ti for playable frame rates (above 30 fps). This is especially true in the case of Battlefield V, in which only the GTX 1080 Ti manages 30 fps. The gap between the GTX 1080 Ti and GTX 1080 is vast, with the latter serving up only 25 fps. The GTX 1070 and GTX 1060 6 GB spit out really fast PowerPoint presentations, at under 20 fps.
It's important to note here that NVIDIA tested at the highest DXR settings for Battlefield V, and lowering the DXR Reflections quality could improve frame rates, although we remain skeptical about the slower SKUs such as the GTX 1070 and GTX 1060 6 GB. The story repeats with Shadow of the Tomb Raider, which uses DXR shadows, albeit with frame rates marginally higher than in Battlefield V. You still need a GTX 1080 Ti for 34 fps.
Atomic Heart uses Advanced Reflections (reflections of reflections, and non-planar reflective surfaces). Unfortunately, no GeForce GTX card manages more than 15.4 fps. The story repeats with 3DMark Port Royal, which uses both Advanced Reflections and DXR Shadows: single-digit frame rates for all GTX cards. Performance is better in the Justice tech-demo, although far from playable, as only the GTX 1080 and GTX 1080 Ti manage over 20 fps. Advanced Reflections and AO, in the case of the Star Wars RTX tech-demo, are another torture for these GPUs - single-digit frame rates all over. Global Illumination with Metro Exodus is another slog for these chips.
Overall, NVIDIA has managed to script the perfect advertisement for the RTX 20-series. Real-time ray tracing on compute shaders is horrendously slow, and it pays to have specialized hardware such as RT cores for these workloads, while tensor cores accelerate DLSS to improve performance even further.
It remains to be seen whether AMD takes a swing at DXR on GCN stream processors any time soon. The company has had a technical effort underway for years with Radeon Rays, and is reportedly working on DXR support.

Update:
NVIDIA posted its test data for 4K and 1080p in addition to 1440p, as well as for medium through low DXR settings. The complete test data is posted below.


111 Comments on NVIDIA Extends DirectX Raytracing (DXR) Support to Many GeForce GTX GPUs

#76
FordGT90Concept
"I go fast!1!11!1!"
medi01: Could you elaborate?
There's one RT core attached to each SM. An ideal raytracing ASIC would have one pool of shared memory with all raytracing compute units attached to it. They all have access to the same mesh information and light source information and they illuminate the meshes based on collisions. As I said before, the way Turing is set up, it's meant to complement existing rendering techniques and not replace them. Correct me if I'm wrong, but I don't think there's even a 100% raytraced demo running on RTX. Yeah, NVIDIA compiled a few videos of exclusively raytraced demos, but AMD has done the same with Radeon Rays too; nothing exceptional, and something any GPU can do given enough time.

It makes more sense to me that AMD would develop a raytracing ASIC which operates as a co-processor on the GPU, not unlike the Video Coding Engine and Unified Video Decoder. It would pull meshes and illumination information from the main memory, bounce its rays, and then copy back the updated meshes to be anti-aliased.

All that information is generally in the VRAM. A raytracing ASIC would be more cache than anything else.
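A rough host-side sketch of the round trip described above, in CUDA terms: copy scene and light data to the accelerator, run a tracing pass, and copy the lighting results back. The kernel and buffer names are hypothetical stand-ins, not a real design.

```cuda
// Hypothetical sketch of the copy-in / trace / copy-back loop described above.
#include <cuda_runtime.h>
#include <math.h>

__global__ void tracePass(const float3* verts, int nVerts,
                          const float3* lights, int nLights,
                          float* lightDist)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nVerts) return;
    // Stand-in for "bouncing rays": record the distance to the nearest light.
    float best = 1e30f;
    for (int l = 0; l < nLights; ++l) {
        float dx = lights[l].x - verts[i].x;
        float dy = lights[l].y - verts[i].y;
        float dz = lights[l].z - verts[i].z;
        best = fminf(best, sqrtf(dx * dx + dy * dy + dz * dz));
    }
    lightDist[i] = best;
}

void traceFrame(const float3* hostVerts, int nVerts,
                const float3* hostLights, int nLights, float* hostOut)
{
    float3 *dVerts, *dLights;
    float  *dOut;
    cudaMalloc(&dVerts,  nVerts  * sizeof(float3));
    cudaMalloc(&dLights, nLights * sizeof(float3));
    cudaMalloc(&dOut,    nVerts  * sizeof(float));

    // Pull mesh and illumination data in from main memory...
    cudaMemcpy(dVerts,  hostVerts,  nVerts  * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(dLights, hostLights, nLights * sizeof(float3), cudaMemcpyHostToDevice);

    // ...run the tracing pass...
    int block = 256, grid = (nVerts + block - 1) / block;
    tracePass<<<grid, block>>>(dVerts, nVerts, dLights, nLights, dOut);

    // ...and copy the updated lighting results back out.
    cudaMemcpy(hostOut, dOut, nVerts * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dVerts);
    cudaFree(dLights);
    cudaFree(dOut);
}
```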
#77
Crackong
R-T-B: Yeah, if you have any technical knowledge at all you don't have to guess about this. It simply can't happen with present hardware. Unless we are talking about things like Quake 3...
The same debate happened when AMD first announced FreeSync.
And we knew the outcome.
#78
R-T-B
Crackong: The same debate happened when AMD first announced FreeSync.
And we knew the outcome.
This is different. Adaptive sync was already in the queue for the VESA standard. It utilized technology that tech people already knew existed (even blurbusters noted this in their gsync analysis).

You are asking AMD to do something their chips literally do not have the facilities to do. At best, they can achieve something akin to emulation.

It simply won't happen, because short of new hardware, it can't happen.
FordGT90Concept: There's one RT core attached to each SM. An ideal raytracing ASIC would have one pool of shared memory with all raytracing compute units attached to it. They all have access to the same mesh information and light source information and they illuminate the meshes based on collisions.
That's because all the RT cores do is the magic denoising.
#79
FordGT90Concept
"I go fast!1!11!1!"
Then why do they call them RT cores? Denoising really has nothing to do with raytracing. It is just averaging pixels to cover up the lousy job their limited number of rays do.
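To make the "averaging pixels" idea concrete, here is a deliberately naive CUDA sketch of a box filter that averages each pixel with its 3x3 neighborhood. Real-time denoisers (spatiotemporal filters, NVIDIA's AI denoiser) are vastly more sophisticated; this is only the simplest possible illustration, not what any shipping denoiser does.

```cuda
// A deliberately naive "denoiser": average each pixel with its 3x3 neighborhood.
#include <cuda_runtime.h>

__global__ void boxDenoise(const float3* in, float3* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float3 sum = make_float3(0.f, 0.f, 0.f);
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            float3 p = in[ny * width + nx];   // neighboring noisy sample
            sum.x += p.x; sum.y += p.y; sum.z += p.z;
            ++count;
        }
    }
    // Averaging hides variance from a low ray count at the cost of detail.
    out[y * width + x] = make_float3(sum.x / count, sum.y / count, sum.z / count);
}
```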
#80
Crackong
R-T-B: This is different.
It is the same.
Back when AMD first announced FreeSync,
ppl speculated if it is possible without the expensive FPGA module embedded inside the monitor.
Turns out it is.

Nvidia themselves have shown that RTRT is possible without ASIC acceleration.
The Pascal architecture itself wasn't built with RTRT in mind, yet it can run RTRT with decent results.
It is quite impressive to see that, in a ray-traced game, a 1080 Ti can do 70% of the fps of a 2060 with the first driver that enables Pascal to do this. (Poor optimization)

RT cores are dead weight in conventional Rasterization, but Cuda can do both.
Same goes for AMD.
ASIC for RTRT will soon be obsolete, just like the ASIC for PhysX.
#81
Xzibit
R-T-B: That's because all the RT cores do is the magic denoising.
Just saying
Nvidia: The RT Core includes two specialized units. The first unit does bounding box tests, and the second unit does ray-triangle intersection tests.
"Specialized" = Fixed function

To my knowledge Nvidia has never said outside of Optix AI that the denoiser is accelerated. It always references them as "Fast".

Denoisers are Filters and will vary as such.
CG: OptiX may not be ideal for animation. For animation we suggest using V-Ray’s denoiser with cross-frame denoising.
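To put the NVIDIA quote above in concrete terms, here is what the second of those fixed-function units computes, written as plain CUDA: a single ray-triangle intersection test (Möller-Trumbore). On GTX cards this kind of math has to run on the CUDA cores; the code is an illustration, not NVIDIA's implementation.

```cuda
// One ray-triangle test (Moller-Trumbore), the operation an RT core's second
// fixed-function unit performs in hardware. Illustrative sketch only.
#include <cuda_runtime.h>
#include <math.h>

__device__ float3 cross3(float3 a, float3 b)
{
    return make_float3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
}

__device__ float dot3(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

__device__ float3 sub3(float3 a, float3 b) { return make_float3(a.x - b.x, a.y - b.y, a.z - b.z); }

// Returns true and writes the hit distance t if the ray (orig, dir) hits
// the triangle (v0, v1, v2).
__device__ bool rayTriangle(float3 orig, float3 dir,
                            float3 v0, float3 v1, float3 v2, float* t)
{
    const float EPS = 1e-7f;
    float3 e1 = sub3(v1, v0);
    float3 e2 = sub3(v2, v0);
    float3 p  = cross3(dir, e2);
    float det = dot3(e1, p);
    if (fabsf(det) < EPS) return false;        // ray parallel to the triangle
    float invDet = 1.0f / det;
    float3 s = sub3(orig, v0);
    float u = dot3(s, p) * invDet;
    if (u < 0.0f || u > 1.0f) return false;    // outside barycentric range
    float3 q = cross3(s, e1);
    float v = dot3(dir, q) * invDet;
    if (v < 0.0f || u + v > 1.0f) return false;
    *t = dot3(e2, q) * invDet;                 // hit distance along the ray
    return *t > EPS;
}
```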
#82
Slizzo
FordGT90Concept: If this is the case, GCN using async compute should be able to do very well at DXR without modification.

I guess we'll find out when AMD debuts DXR support.
FordGT90Concept: the GPU could properly async (like GCN can)
You do know that Turing is fully async compute capable, right? It can execute INT and FP at the same time, concurrently.
#83
FordGT90Concept
"I go fast!1!11!1!"
That's in a WARP. Async is filling the gaps between WARPs. Scheduler finds idle resources then applies async compute workloads to them. What NVIDIA does is switch WARP from graphics, to compute, then back again. It isn't capable of filling in idle resources like GCN does.

Turing in particular needs instructions for FP32 and FP16 simultaneously in each WARP or large swaths of transistors end up idling:
www.anandtech.com/show/13973/nvidia-gtx-1660-ti-review-feat-evga-xc-gaming/2
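For reference, the CUDA programming model exposes the overlap idea through streams: independent work submitted on different streams may execute concurrently, which is the software-visible analogue of "filling idle resources". The sketch below uses made-up kernels and says nothing about how well any particular GPU actually interleaves them.

```cuda
// Minimal illustration of overlapping independent work with CUDA streams.
// The kernels are hypothetical stand-ins for a shading pass and a compute pass.
#include <cuda_runtime.h>

__global__ void shadePass(float* pixels, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pixels[i] *= 0.5f;              // stand-in for shading work
}

__global__ void computePass(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];    // stand-in for async compute work
}

int main()
{
    const int n = 1 << 20;
    float *pixels, *data;
    cudaMalloc(&pixels, n * sizeof(float));
    cudaMalloc(&data, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Independent launches on different streams may execute concurrently,
    // letting otherwise idle execution units pick up the second workload.
    shadePass<<<n / 256, 256, 0, s1>>>(pixels, n);
    computePass<<<n / 256, 256, 0, s2>>>(data, n);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(pixels);
    cudaFree(data);
    return 0;
}
```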
#84
londiste
R-T-B: That's because all the RT cores do is the magic denoising.
No, they do not. Tensor cores (!= RT cores) can do denoising.
Crackong: Back when AMD first announced FreeSync,
ppl speculated if it is possible without the expensive FPGA module embedded inside the monitor.
Turns out it is.
Without FPGA, sure. Without specialized hardware - nope. Adaptive sync support is in the scaler. Monitor manufacturers use an ASIC for it and it is probably just a question of scale where they are able to order bigger batches of chips. That is also the reason why Adaptive sync support took as long as it did to become a common feature.
Crackong: It is quite impressive to see that, in a ray-traced game, a 1080 Ti can do 70% of the fps of a 2060 with the first driver that enables Pascal to do this. (Poor optimization)
I seriously doubt this. Nvidia has had OptiX in the professional space for years. The DXR part specifically is probably as well optimized as it can be at this point.
#85
Fiendish
FordGT90Concept: That's in a WARP. Async is filling the gaps between WARPs. Scheduler finds idle resources then applies async compute workloads to them. What NVIDIA does is switch WARP from graphics, to compute, then back again. It isn't capable of filling in idle resources like GCN does.

Turing in particular needs instructions for FP32 and FP16 simultaneously in each WARP or large swaths of transistors end up idling:
www.anandtech.com/show/13973/nvidia-gtx-1660-ti-review-feat-evga-xc-gaming/2
Turing seems to be far better than Pascal when it comes to async compute. Are you sure the deficiencies of Pascal in this area apply to Turing overall?
#86
medi01
Crackong: ppl speculated if it is possible without the expensive FPGA module embedded inside the monitor.
I recall a guy in the anandtech comments who claimed to work at AMD stating that GPU-driven screen refresh had been around in the notebook world for a while. I forgot the standard's name, but there was nothing to speculate about, it was literally there.
#87
Vayra86
W1zzard: i'm not sure yet, i got lots of other things to test and nvidia has provided numbers anyway. it'll also be a ton of work to test all this.

i'm thinking about adding a rtx section with just one game (metro?) to future gpu reviews
I honestly wouldn't go down this rabbit hole. Metro is in no way representative of RTX capability. It just shows us what a global illumination pass does and costs. Any other game may implement more or less features and work out entirely different. It won't be that informative really. I think focused feature articles like you've done up until now are a much better way to do it.
#88
R-T-B
londiste: No, they do not. Tensor cores (!= RT cores) can do denoising.
Ah I got my terminology mixed up, thanks for the catch...
#89
W1zzard
Vayra86: I honestly wouldn't go down this rabbit hole. Metro is in no way representative of RTX capability. It just shows us what a global illumination pass does and costs. Any other game may implement more or less features and work out entirely different. It won't be that informative really. I think focused feature articles like you've done up until now are a much better way to do it.
Not gonna stop with the per-game launch articles

Which game would you choose instead of Metro? and why?
#90
R-T-B
Crackong: Nvidia themselves have shown that RTRT is possible without ASIC acceleration.
Yes, very slowly.
#91
Vayra86
W1zzard: Not gonna stop with the per-game launch articles

Which game would you choose instead of Metro? and why?
I think it's too early to choose a game as a 'benchmark' for RT performance yet. We've seen two meagre attempts that entered the dev cycle late in the development of both games; I don't think it's going to be representative of anything that comes next.

The main problem is that it's so very abstract. Nvidia uses performance levels without a clear distinction for RTX On. Everything is fluid here; they will always balance the amount of RT around the end-user performance requirements; the BF V patch was a good example of that. What really matters is the actual quality improvement and then the performance hit required to get there. This is not like saying 'MSAA x4 versus MSAA x8' or something like that. There is no linearity, no predictability and the featureset can be used to any extent.
#92
W1zzard
Vayra86: I think it's too early to choose a game as a 'benchmark' for RT performance yet
But isn't "Metro RTX" in reviews better than nothing? Once more titles come out I'm definitely open to changing the title when rebenching
#93
bug
Crackong: It is relevant.
This "RTX being sort of supported on Pascal cards" move has one purpose only: try to convince ppl to "upgrade / side-grade" to RTX cards.
Then, what is the point of this "upgrade / side-grade" when there are not even a handful of ray-tracing games out there?

RTX is based on DXR, which is just one part of the DX12 API; RTX itself is not comparable to a fully featured DX10 / DX11 / DX12.
Compare it to Nvidia's last hardware-accelerated marketing gimmick, a.k.a. PhysX, though.
It was the same thing again: hardware PhysX got 4 games back in 2008 and 7 in 2009, then hardware PhysX simply faded out and is now open sourced.
Now we've got 3 ray-traced games in 7 months, sounds familiar?
Again, apples and oranges. PhysX was built as a closed solution, whereas RTX is DXR (part of a pretty "standard" API) with sugar on top.
Please try to word your replies better. Or in case your only point was "Nvidia is trying to scam people out of their money" or "RTX will die because I don't like it as it is now", we've already had hundreds of posts about that.
#94
Vlada011
NVIDIA is literally making fools of people. Are you aware how many quality things for music, guns, etc. could be bought for the price of a single RTX 2080 Ti, a card that will be outdated in 3 years?
In 5 years it's almost garbage. It's very rare in any industry to find something comparable that costs so much.
Extremely high-quality audio equipment such as amplifiers, turntables and speakers can be bought for 1K and last for decades.
Compare, for example, a studio/club turntable from Technics that is reaching its 50th anniversary, an indestructible 15 kg monster with every single part replaceable, which can be bought for the price of an RTX 2080, or extremely high-quality speakers. That's the pure insanity the gaming industry has led us into.
Turing was the turning point for me, the moment I figured out that from now on I will play games 2-3 years after launch and pay for only one platform per generation, not two with the same memory types and similar core architectures, because that is literally throwing money away.
You can buy 85 vinyl records or 150 CDs on average for the price of a single custom RTX 2080 Ti. That's abnormal.
You can buy a Benelli M4 12 ga tactical shotgun for the price of an RTX 2080 Ti... When you look at a little piece of plastic with a chip that will be outdated in 5 years, it's funny.

I had set aside 700 euro and waited for the RTX 2080 Ti, with the option to add 100-150 euro more, even 200 for some premium models; I hoped for maybe even a K|NGP|N version.
I was not even close to being able to buy a card 30% stronger than the GTX 1080 Ti. In the end I didn't have enough even for an RTX 2080 and would have needed to add significant money.
The option was to buy a second-hand GTX 1080 Ti and spend the rest on a 1 TB M.2 drive, but I will remember this GeForce architecture the way many of you will remember the next generations, and I gave up on brand-new cards. You can do just fine with the previous high-end model, second hand with 1-2 years of warranty, for below $500.

I knew that the moment people became aware of how much they pay for Intel processors and NVIDIA graphics cards, they would change their approach, and then even a significant price drop would not help much.
If NVIDIA dropped prices by $150 now, it would not change much; they would sell more GPUs, but not as many as they expect.

PC gaming makes sense if you can afford a GTX 1080, GTX 1080 Ti, RTX 2080/2080 Ti, or eventually an RTX 2070.
But investing a few hundred in a lower class of GPU... better to get a PS4.
#95
Vayra86
W1zzard: But isn't "Metro RTX" in reviews better than nothing? Once more titles come out I'm definitely open to changing the title when rebenching
Better than nothing, yes, but it's also riding the Nvidia marketing wave, which I think is out of place for neutral reviewers.

We didn't test 'Hairworks games' or 'PhysX games' back when Nvidia pushed games with those technologies either. As it stands today, there is no practical difference apart from it being called 'DXR'. There was never a feature article for other proprietary tech either such as 'Turf Effects' or HBAO+, so in that sense the articles we get are already doing the new technology (and its potential) justice. In reviews you want to highlight a setup that applies for all hardware.

It's a pretty fundamental question really. If this turns out to be a dud, you will be seen as biased if you consistently push the RTX button. And if it becomes widely supported technology, the performance we see today has about zero bearing on it. The story changes when AMD or another competitor also brings a hardware solution for it.

Something to think about, but this is why I feel focused articles that really lay out the visual and performance differences in detail are actually worth something, while another RTX bar in the bar chart most certainly is not.
Crackong: ASIC for RTRT will soon be obsolete, just like the ASIC for PhysX.
The only caveat there in my opinion is perf/watt. The RT perf/watt of dedicated hardware is much stronger, and if you consider the TDP of the 2080ti, there isn't really enough headroom to do everything on CUDA.
#96
bug
Vlada011: NVIDIA is literally making fools of people. Are you aware how many quality things for music, guns, etc. could be bought for the price of a single RTX 2080 Ti, a card that will be outdated in 3 years?
In 5 years it's almost garbage. It's very rare in any industry to find something comparable that costs so much.
Extremely high-quality audio equipment such as amplifiers, turntables and speakers can be bought for 1K and last for decades.
Compare, for example, a studio/club turntable from Technics that is reaching its 50th anniversary, an indestructible 15 kg monster with every single part replaceable, which can be bought for the price of an RTX 2080, or extremely high-quality speakers. That's the pure insanity the gaming industry has led us into.
Turing was the turning point for me, the moment I figured out that from now on I will play games 2-3 years after launch and pay for only one platform per generation, not two with the same memory types and similar core architectures, because that is literally throwing money away.
You can buy 85 vinyl records or 150 CDs on average for the price of a single custom RTX 2080 Ti. That's abnormal.
You can buy a Benelli M4 12 ga tactical shotgun for the price of an RTX 2080 Ti... When you look at a little piece of plastic with a chip that will be outdated in 5 years, it's funny.
Using the same logic, I assume you'd never buy a BMW because of the 7-series or i8 pricing. And you won't buy a Mercedes either, because Maybach.
#97
Crackong
bug: we've already had hundreds of posts about that.
It is because people agreed that way.
Listen to the people.

Yes, "Nvidia is trying to scam people out of their money"
And yes, Nvidia RTX surely will die because of how badly they ruined it, but DXR lives on.
#98
medi01
londiste: Monitor manufacturers use an ASIC for it and it is probably just a question of scale where they are able to order bigger batches of chips. That is also the reason why Adaptive sync support took as long as it did to become a common feature.
BULLSHIT.

It was a cheap add-on feature that manufacturers producing upscaler chips included free of charge.
It took customer reluctance to pay 200+ premium per monitor, for nVidia branded version of it.
GSync branded notebooks didn't use it either, because, see above.

But once Gsync failed and FreeSync won, let's call the latter simply "Adaptive Sync" shall we, it downplays what AMD has accomplished and all.
bug: I assume you'd never buy a BMW because of the 7-series or i8 pricing. And you won't buy a Mercedes either, because Maybach.
Bringing ultra-competitive low margin automotive into this is beyond ridiculous.
#99
londiste
medi01: I recall a guy in the anandtech comments who claimed to work at AMD stating that GPU-driven screen refresh had been around in the notebook world for a while. I forgot the standard's name, but there was nothing to speculate about, it was literally there.
eDP, Embedded Displayport.
medi01: It was a cheap add-on feature that manufacturers producing upscaler chips included free of charge.
It took customer reluctance to pay 200+ premium per monitor, for nVidia branded version of it.
GSync branded notebooks didn't use it either, because, see above.
Yes, manufacturers producing upscaler chips did include the feature. The correct question is - when?
"It took customer reluctance to pay 200+ premium per monitor" sounds like free really wasn't why.
GSync module would not physically fit in a notebook. Besides, isn't using established standards exactly what is encouraged? :)

Wrong thread for this though.
#100
bug
medi01: Bringing ultra-competitive low margin automotive into this is beyond ridiculous.
And reducing the whole Turing line to the ridiculously expensive 2080Ti isn't?