NVIDIA to Enable DXR Ray Tracing on GTX (10- and 16-series) GPUs in April Drivers Update

londiste · Mar 28, 2019

Midland Dog said:
16 bit precision

That shape is definitely not from 16-bit precision.
Still waiting for Crytek to provide all the details, but it is highly likely that Vega's 16-bit precision is used for RT.

FordGT90Concept · Mar 28, 2019

Makes sense, double the FLOPS for a slight loss in accuracy. Pretty sure Radeon Rays has 16-bit precision options.

Edit: https://github.com/GPUOpen-Librarie...adeonRays/src/intersector/intersector_lds.cpp

Code:

if (spec.has_fp16)
  m_gpudata->qbvh_prog.executable = m_device->CompileExecutable("../RadeonRays/src/kernels/CL/intersect_bvh2_lds_fp16.cl", headers, numheaders, buildopts.c_str());

Yup, RadeonRays SDK has FP16 checks.
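
To put a number on the "slight loss in accuracy" trade-off, here is a minimal C++ sketch that rounds a 32-bit float to the 11 significant bits an FP16 value carries. The helper names are hypothetical and not part of RadeonRays, and the sketch only models mantissa precision: FP16's smaller exponent range and subnormals are ignored.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not a RadeonRays function): rounds a float to the
// 11 significant bits (1 implicit + 10 stored) of an FP16 mantissa.
// Simplification: FP16's exponent range (max 65504) and subnormals are
// ignored, so this models precision loss only, not overflow/underflow.
inline float quantize_to_fp16_mantissa(float x) {
    if (x == 0.0f || !std::isfinite(x)) return x;
    int exponent = 0;
    const float mantissa = std::frexp(x, &exponent);  // |mantissa| in [0.5, 1)
    const float scaled = std::ldexp(mantissa, 11);     // |scaled| in [1024, 2048)
    // Round to the nearest integer, then undo the scaling.
    return std::ldexp(std::nearbyint(scaled), exponent - 11);
}

// Relative error introduced by the rounding above (x must be nonzero).
// With round-to-nearest this stays below 2^-11.
inline float relative_error(float x) {
    return std::fabs(quantize_to_fp16_mantissa(x) - x) / std::fabs(x);
}
```

For instance, a hit distance of 1000.3 comes back as 1000.5, because values between 512 and 1024 are spaced 0.5 apart at this precision. Whether that is acceptable for RT depends on scene scale, which this sketch does not try to answer.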

HwGeek · Mar 28, 2019

Yes, I made a comment a few posts back: they just added BVH GPU acceleration and FP16 support in RadeonRays 3.0 last week.
https://www.techpowerup.com/forums/...ril-drivers-update.253759/page-4#post-4018035

Also, how does FP16 impact performance on Polaris? It offers a 1:1 ratio, but can FP16 help utilize memory bandwidth better than INT32/FP32?

FordGT90Concept · Mar 28, 2019

The FP16 code paths wouldn't be taken on Polaris at all. They aren't something the GPU understands.

HwGeek · Mar 28, 2019

What about FP8/INT8? Can they be used somehow in RTRT?

FordGT90Concept · Mar 28, 2019

No, DXR would likely be implemented using FP32 if AMD were to add backwards compatible support for it. I assume that it would just be noisier (fewer rays/bounces).

Vlada011 · Mar 30, 2019

Then I wasn't wrong to pay 580 euro for a GTX 1080 Ti Poseidon 5 months ago???
Performance of an RTX 2080 FE, 11 GB, an option for water cooling (not a full-cover waterblock), but at least for that money it includes a cooler comparable to AIO kits worth over $100.
And the option to install a full custom waterblock for the ASUS Strix when the price drops to around 70 euro.

We can say different things about AMD, but they help everyone with their research and by trying to offer lower prices.
Without AMD, Intel and NVIDIA alone would massacre us with their policies and prices.

The first impression of GeForce users after switching to a high-end Radeon graphics card will be... I feel like the picture quality is a little better, like a sharper, cleaner, more photographic image.
And in the future, if NVIDIA continues to ask $1000 for a high-end GPU and AMD offers one that is 10% weaker for $750, I'm ready to consider an Intel-Radeon combination.

jabbadap · Mar 30, 2019

FordGT90Concept said:
No, DXR would likely be implemented using FP32 if AMD were to add backwards compatible support for it. I assume that it would just be noisier (fewer rays/bounces).

Well, there might be some use of the INT cores in it. Not necessarily at a precision as low as INT8, but that Exodus frame pic shows quite hefty use of INT32 math during RT (possibly just filtering/denoising, but anyhow). I think the ability to do FP32 and INT32 math concurrently gives one way to do it the hard way. Volta and Turing can do it that way and Pascal obviously can't. Which raises the question: how about GCN, can it run FP32 and INT32 math concurrently?

Maximuspop · Apr 1, 2019

MAXLD said:
That's the point of this move. Making the "poor" look not so poor in comparison.

"- Boys, let's release this RTX we've been working on for a few years, we'll have it exclusively in the new cards, so we'll sell them for premium.
- Boss, it's not that great yet, our new hardware still isn't that capable of fast and proper implementation.
- Just do it, it will be the first hardware Ray Tracing bling-bling ever, it's a big deal. We'll get it working in a couple AAA games and people will jump into it.
(... few months later...)
- Boss, people aren't joining the RTX bandwagon... and they aren't really swapping Pascal for Turing.
- Well then execute plan B: unlock Ray Tracing for old Pascal.
- But boss, those have no Ray Tracing focused hardware, it will run tons worse.
- Exactly, we'll make them feel that Pascal is ancient crap, and then they'll want to finally swap them for RTXs. At the same time we'll spread the name even more.
(... weeks later...)
- Boss, still not a big interest in RTX 2000 cards. What now?
- Fine, release the RTX 3000 series with proper improved RTX performance, we'll make RTX 2000 look like ancient crap in comparison, and RTX 3000 look like the second coming of baby Jesus."

Just like they said when they presented RTX, bringing a card to the market that can do Ray Tracing this "fast" (compared to before) is quite an achievement. Problem is: it's still not fast enough.

"- People, we made the impossible: a card that can finally do the legendary tech that is "Ray Tracing"! Behold!
- Cool!
(...) Ok, nevermind, it runs slow. And I can live without it for now, can barely see the difference anyway.
- No, you don't understand. This is dope engineering. If it wasn't for this new hardware, it would be a slideshow with your current card.
- Y, but it's still slow. Not appealing.
- Look, we'll show you. Let's test with your current card.
- Dude, please..."

SoNic67 · Apr 2, 2019

jabbadap said:
Volta and Turing can do it that way and Pascal obviously can't. Which raises the question: how about GCN, can it run FP32 and INT32 math concurrently?

I doubt that Turing can do it 100% either. Even if they say "independent integer execution units", I think it's like Hyper-Threading in CPUs. I might be wrong though...

jabbadap · Apr 2, 2019

SoNic67 said:
I doubt that Turing can do it 100% either. Even if they say "independent integer execution units", I think it's like Hyper-Threading in CPUs. I might be wrong though...

Well, it can do FP32 at full throttle and concurrently run some INT operations on its separate INT cores. It's actually all about keeping the GPU busy doing floating-point math.

In Turing's whitepaper they say that even in current games they could get some performance benefit from that alone. And it is one of the main reasons why Turing gets more performance per TFLOP than Pascal (the second being the new shared cache).
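
A rough way to picture the benefit is an idealized issue-slot count: on a single shared datapath every instruction takes its own slot, while with independent FP32 and INT32 datapaths the shorter stream hides behind the longer one. The C++ sketch below is only a toy model under stated assumptions: the names are hypothetical, the 100:36 instruction mix is picked for illustration, and dependencies, latencies and scheduler limits are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical instruction mix for some shader workload.
struct InstructionMix {
    int fp32;   // floating-point instructions
    int int32;  // integer instructions (addressing, comparisons, ...)
};

// Assumed mix, for illustration only: 36 INT32 per 100 FP32 instructions.
constexpr InstructionMix kAssumedMix{100, 36};

// One shared datapath: each instruction occupies its own issue slot.
constexpr int shared_pipe_slots(InstructionMix mix) {
    return mix.fp32 + mix.int32;
}

// Independent datapaths (idealized): the two streams overlap completely,
// so the longer stream determines the slot count.
inline int concurrent_pipe_slots(InstructionMix mix) {
    return std::max(mix.fp32, mix.int32);
}

// Upper bound on the speedup from overlapping the two streams.
inline double ideal_speedup(InstructionMix mix) {
    const int concurrent = concurrent_pipe_slots(mix);
    if (concurrent == 0) return 1.0;  // empty workload: nothing to speed up
    return static_cast<double>(shared_pipe_slots(mix)) / concurrent;
}
```

With the assumed mix this gives 136 slots shared versus 100 overlapped, so an upper bound of 1.36x. Real gains depend on how much independent INT work is actually available to overlap.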

SoNic67 · Apr 2, 2019

Yes, I saw that. This is the part that reminded me of the hyper-threading concept on CPUs:

First, the Turing SM adds a new independent integer datapath that can execute instructions concurrently with the floating-point math datapath.... This translates to 2x more bandwidth and more than 2x more capacity available for L1 cache for common workloads.

It's not like they have added extra integer units. To me it looks like they can now issue those INT instructions at the same time as the FP32 ones.

Later this gets a little more... confusing.

Two SMs are included per TPC, and each SM has a total of 64 FP32 Cores and 64 INT32 Cores. In comparison, the Pascal GP10x GPUs have one SM per TPC and 128 FP32 Cores per SM. The Turing SM supports concurrent execution of FP32 and INT32 operations (more details below), independent thread scheduling similar to the Volta GV100 GPU.

So in the first sentence they say that those are separate cores. But the numbers (64 FP + 64 INT) add up to the same number as on Pascal (128 FP). And the last sentence again talks specifically about "thread scheduling"...
I don't think there would be a physical difference between an INT32 and an FP32 core. It's only the schedulers that make that difference.

Overall, the changes in SM enable Turing to achieve 50% improvement in delivered performance per CUDA core.

This is in line with the hyper-threading gains on CPUs.

jabbadap · Apr 2, 2019

SoNic67 said:
Yes, I saw that. This is the part that reminded me of the hyper-threading concept on CPUs:

It's not like they have added extra integer units. To me it looks like they can now issue those INT instructions at the same time as the FP32 ones.

Later this gets a little more... confusing.

So in the first sentence they say that those are separate cores. But the numbers (64 FP + 64 INT) add up to the same number as on Pascal (128 FP). And the last sentence again talks specifically about "thread scheduling"...

As it says, a Turing TPC has _two_ of those (64 INT + 64 FP) SMs, while a Pascal TPC has _one_ 128 FP SM. On one more confusing note: according to NVIDIA, Turings without tensor cores have separate FP16 cores.

SoNic67 · Apr 2, 2019

Hmm...
GP102-450-A1 has 3840 cores and 30 Streaming Multiprocessors. 3840/30=128
GP104-410-A1 has 2560 / 20 = 128
TU102-400-A1 has 4608 cores and 72 Streaming Multiprocessors. 4608/72=64
TU104-400-A1 has 2944 / 46 = 64

To me it definitely looks like they broke those SMs in two and can issue either INT32 or FP32 commands to them, on independent paths. Which is consistent with the approx. 50% gains.
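
The divisions above can be checked with a small C++ sketch. The struct and function names are hypothetical; the core and SM counts are the ones listed in this post, and the SMs-per-TPC values (one for Pascal GP10x, two for Turing) come from the whitepaper passage quoted earlier in the thread.

```cpp
#include <cassert>

// Hypothetical record type; the figures are the ones listed in the post.
struct GpuConfig {
    const char* chip;
    int cuda_cores;
    int sm_count;
};

constexpr GpuConfig kGpus[] = {
    {"GP102-450-A1", 3840, 30},  // Pascal
    {"GP104-410-A1", 2560, 20},  // Pascal
    {"TU102-400-A1", 4608, 72},  // Turing
    {"TU104-400-A1", 2944, 46},  // Turing
};

// CUDA (FP32) cores per Streaming Multiprocessor.
constexpr int cores_per_sm(const GpuConfig& gpu) {
    return gpu.cuda_cores / gpu.sm_count;
}

// FP32 cores per TPC, given how many SMs a TPC holds on that architecture.
constexpr int fp32_per_tpc(const GpuConfig& gpu, int sms_per_tpc) {
    return cores_per_sm(gpu) * sms_per_tpc;
}
```

Calling `cores_per_sm` on the two Pascal entries gives 128 and on the two Turing entries gives 64, while `fp32_per_tpc` comes out at 128 for both generations. That only confirms the arithmetic; it does not settle whether the INT32 units are physically separate.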
