And you are basing that on what? There's nothing fake about the current raytracing implementation. It is, and always was, about the resolution. Just like old-gen graphics started at 320x240 and went through 640x480 all the way up to the 4K we have now, raytracing is going down that same path. It's about how many rays per pixel you can cast. Essentially you get a low-res, high-noise picture, which is the basis for GI, reflections or shadows. There's nothing fake about it; you're just dealing with the lack of data and noise, just like the low resolutions in the old days of gaming. Newer generations of cards will have more power and will be able to cast more rays per pixel, improving the "resolution", i.e. the actual quality of the raytraced output. Raytracing can produce photorealistic output if you don't need it in real time. That means you can cast hundreds of rays per pixel and wait for the result to be computed. Metro Exodus was, if I remember correctly, 1 ray per pixel due to their checkerboarding approach. Denoising turns that into something useful. Even such a small sample rate is already noticeably better than traditional rasterization. Now imagine 4 rays per pixel. That's gonna be a massive improvement.
Basing that on the example I specifically singled out, because it lets you mess around with settings and turn off the fakery to see what's really going on under the hood.
Fully raytracing a scene on my 2060S at native resolution still takes 20 seconds to get a single decent-quality frame, so there are two main tricks used to produce a convincing frame in the fraction of a second actually available:
- Temporal denoiser + blur
This is based on previous frame data (see the first sketch after this list). With the textures turned off, the only image you're seeing is what's raytraced. The top image was taken within a few frames of me moving the camera; the bottom image was the desired final result, which took 3-5 seconds to 'fade' in as the temporal denoiser accumulated more previous frames to work from. Since you are usually moving when you're actually playing a game, the typical image quality of the entire experience is this 'dark smear': a laggy, splotchy mess that visibly runs at a fraction of your framerate. It's genuinely amazing how close to a useful image it's generating in under half a second, but we're still a couple of orders of magnitude too slow to replace baked shadowmaps for full GI.
- Resolution hacks and intelligent sampling zones to draw your eye to shiny things at the cost of detail accuracy (think of it as a crude VRS for DXR)
Here's an image from the same room, zoomed way in, and the part of the image I took it from for reference:
A - rendered at 1/4 resolution
B - transparency; this is a reflection on water, an old-school 1995 DirectX 3.0 dither hack (see the second sketch after this list) rather than real transparency calculations
C - the actual resolution of traced rays. Each bright dot in region C is a ray that has been traced, in just 4-bit chroma, and all the dark space is essentially guesswork: temporal patterns tiled and rotated based on the frequency of those ray hits. If you go and look at a poorly-lit corner of the room you can clearly see the repeated tiling of these 'best guess' dot patterns, and they have nothing to do with the noisier, more random bright specks that are the individual ray samples.
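To make the first trick concrete, here's a minimal sketch of the exponential history blend that temporal accumulation boils down to. The function name, alpha value and noise level are all my own illustration, not the actual Quake II RTX pipeline:

```python
import numpy as np

def temporal_accumulate(history, current, alpha=0.05):
    # Small alpha = stable image, but a freshly revealed region needs
    # roughly 1/alpha frames to converge -- the 'fade-in' described above.
    # (Names and values are illustrative, not the real pipeline.)
    return (1.0 - alpha) * history + alpha * current

rng = np.random.default_rng(0)
history = np.zeros(1)                      # camera just moved; history is useless
for frame in range(120):
    noisy = 1.0 + rng.normal(0.0, 0.5, 1)  # ~1 ray/pixel: true value 1.0, heavy noise
    history = temporal_accumulate(history, noisy)
print(history)                             # settles near 1.0 only after ~1/alpha frames
```

That small alpha is exactly why the still image looks great and the moving image smears: the history buffer is only trustworthy once it has been staring at the same geometry for dozens of frames.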
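And for region B, the dither hack is classic screen-door transparency: instead of a real blend, the surface is simply drawn on a stippled subset of pixels chosen by a threshold matrix. A toy version, with an assumed 2x2 Bayer matrix and 50% opacity:

```python
import numpy as np

# Per-pixel thresholds in [0, 1); the 2x2 matrix is the smallest classic Bayer pattern.
BAYER_2X2 = np.array([[0.0, 2.0],
                      [3.0, 1.0]]) / 4.0

def screen_door_mask(width, height, opacity):
    # True where the 'transparent' surface gets drawn at all; no blending happens.
    thresholds = np.tile(BAYER_2X2, (height // 2, width // 2))
    return thresholds < opacity

print(screen_door_mask(8, 8, opacity=0.5).astype(int))  # 50% opacity -> checkerboard stipple
```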
So, combine those two things together. First, we have very low ray density, used to define regions that are then approximated per frame from a library of tile-based patterns. That isn't real raytracing, just more fakery stamped out as a best guess from the very low ray coverage for that geometry region. If I were to pick a rough ballpark figure, I'd say maybe 3% of the frame data in that last image is raytraced samples and 97% of it is faked interpolation between regions, potato-stamped to fill in the gaps with an approximation. This works fine as long as you just want an approximation, because the human brain does great work filling in the gaps, especially when it's all in motion. Anyway, once it's tile-stamped a best-guess frame together out of those few ray samples, each of those barely-raytraced frames is blurred together in a buffer over the course of several hundred frames. There will be visual artifacts like in my first point anywhere you have new data on screen, because temporal filtering only covers on-screen data: anything that has appeared from offscreen is a very low-resolution, mostly fake mess for the first few dozen frames.
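To make that 3%/97% split concrete, here's a toy version of the fill-in step. The percentages are my eyeballed figure from above, and the 'baked tile' is a stand-in I invented; the real renderer draws from a pattern library rather than one random tile:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 64, 64

# ~3% of pixels get a real traced ray this frame (my eyeballed figure).
ray_mask = rng.random((H, W)) < 0.03
frame = np.zeros((H, W))
frame[ray_mask] = 0.7 + rng.normal(0.0, 0.1, ray_mask.sum())  # noisy real samples

# The other ~97% is 'potato-stamped': a pre-baked noise tile, scaled by
# the average of whatever real samples landed in this region.
baked_tile = 0.9 + 0.2 * rng.random((8, 8))                   # made-up pattern tile
stamp = np.tile(baked_tile, (H // 8, W // 8))
frame[~ray_mask] = (stamp * frame[ray_mask].mean())[~ray_mask]

print(f"real ray samples: {ray_mask.mean():.1%} of pixels")   # ~3%
```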
Don't get me wrong, Quake II RTX is a technological marvel - it's truly incredible how close to a realtime raytraced game we can get with so many hacks and so much fakery to spread that absolutely minimal amount of true raytracing around. Focus on the bits that matter, do it at a fraction of the game resolution and only in areas that are visibly detailed. Blur the hell out of the rest using tens of previous frames and a library of pre-baked ray tiles to approximate a raytraced result until you have hundreds of frames of data to use for a real result.
We're just not at a level where we can afford to do it at full resolution, for the whole screen at once, and for regions offscreen so that movement doesn't introduce weird visual artifacts. 10x faster than a 2080Ti might satisfy those constraints, and another couple of orders of magnitude might let us bring the temporal filter down from the hundred-odd frames currently needed for a useful image to single-digit numbers of frames. That's still not realtime, but if people can run games at 100fps, 25fps of raytraced data with temporal interpolation should be very hard to notice.
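For what it's worth, here's the back-of-envelope arithmetic behind that; every figure in it is an assumption of mine, not a measured spec:

```python
pixels_4k = 3840 * 2160

# Today: ~1 ray/pixel at roughly 1/4 resolution, onscreen only.
rays_today = (pixels_4k // 4) * 1

# Wanted: ~100 rays/pixel at full resolution, plus ~2.5x extra coverage
# for offscreen regions so camera movement doesn't reveal stale data.
rays_wanted = pixels_4k * 100 * 2.5

print(f"~{rays_wanted / rays_today:.0f}x more ray throughput needed")  # ~1000x
```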
So yeah, real raytracing is going to need 1000x more power than a 2080Ti, but even what we have right now is enough to get the ball rolling, as long as you don't look too closely and you hide the raytracing behind lots of shader-based lies too. Let's face it, shader-based lies get us 90% of the way there for almost free, and if the limited raytracing can get us 95% of the way there without hurting performance, people are going to be happy that there's a noticeable improvement without really caring about how it happened - they'll just see DXR on/off side by side and go "yeah, DXR looks nicer".