The RT Core
After reading the previous pages, you should have a good understanding of how ray tracing works and what its biggest challenges are, especially when performance is a focus.
To address these challenges, NVIDIA has engineered a new kind of GPU component called the "RT Core". Each SM in every Turing GPU has one RT Core, a fixed-function circuit that accelerates the two most important operations in ray tracing: bounding volume hierarchy (BVH) traversal and ray-triangle intersection testing.
We previously described how a BVH works. Instead of running BVH traversal on the CUDA cores, the SM can now issue a "ray probe" command to the RT Core, which runs the traversal independently, without putting any load on the CUDA cores. When a hit is found during traversal, the RT Core automatically drills down through the bounding boxes until it reaches the triangle the ray intersects; a ray miss can also be returned as a result. This data is passed back to the SM, which then executes a piece of game code, similar to a shader, that decides what to do with the result, e.g., whether additional shadow or reflection rays should be cast.
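The operation the RT Core performs can be sketched in software. The following is an illustrative Python sketch, not NVIDIA's implementation: node layout, function names, and the traversal order are all hypothetical, since the actual hardware format is not public. It walks a BVH, testing the ray against bounding boxes at inner nodes and against triangles at the leaves.

```python
# Illustrative sketch of a "ray probe": walk a BVH, pruning subtrees
# whose bounding box the ray misses, and test triangles at the leaves.
# All names and data layouts here are hypothetical.

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

class Node:
    def __init__(self, lo, hi, children=(), triangles=()):
        self.lo, self.hi = lo, hi        # AABB corners
        self.children = children         # child nodes (inner node)
        self.triangles = triangles       # (v0, v1, v2) tuples (leaf node)

def ray_aabb(origin, inv_dir, lo, hi):
    """Slab test: does the ray enter the axis-aligned bounding box?"""
    tmin, tmax = 0.0, float("inf")
    for i in range(3):
        t1 = (lo[i] - origin[i]) * inv_dir[i]
        t2 = (hi[i] - origin[i]) * inv_dir[i]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def ray_triangle(origin, d, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: distance t along the ray, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:                   # ray parallel to triangle plane
        return None
    inv = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(tvec, e1)
    v = dot(d, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def ray_probe(root, origin, d):
    """Return the closest hit distance, or None for a ray miss."""
    inv_dir = tuple(1.0 / c if c != 0.0 else float("inf") for c in d)
    best, stack = None, [root]
    while stack:
        node = stack.pop()
        if not ray_aabb(origin, inv_dir, node.lo, node.hi):
            continue                     # prune this whole subtree
        for tri in node.triangles:
            t = ray_triangle(origin, d, *tri)
            if t is not None and (best is None or t < best):
                best = t
        stack.extend(node.children)
    return best
```

The key property, and the reason a fixed-function unit pays off, is the pruning in `ray_probe`: one cheap box test can skip thousands of triangles at once.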
That's all there is to RTX technology in hardware: a fixed-function BVH traversal and triangle-intersection unit, driven by fully programmable shaders. All further details are left to the developer, which gives amazing flexibility: the unit is not limited to classic ray tracing (shooting rays out through the screen into the scene), but can be used for any geometry task that involves following a straight line while computing intersections against lots of geometry. One example is software in the oil and gas industry, which uses similar algorithms for exploration and simulation.
The RT Cores are just the foundation; the software is equally important. RTX is supported by NVIDIA OptiX, Microsoft DirectX Raytracing (DXR), and Vulkan RT. All these APIs are similar in that they provide a framework for building ray-traced applications using a workflow not unlike DirectX game programming. Simplified: the game engine provides its object data, generates ray origins and directions, and gets back information on ray hits as the result. Everything else, such as memory management, GPU scheduling, shader dispatch, object traversal, and hardware optimizations, is handled by the library, which reduces the workload for developers immensely.
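That division of labor can be sketched as a miniature framework. This is a hypothetical illustration of the workflow's shape, not real DXR, OptiX, or Vulkan RT calls: the "engine" side supplies geometry lookups and shader-like callbacks, while the framework owns ray dispatch.

```python
# Hypothetical miniature of the RTX API workflow. The application
# provides callbacks; the framework traces rays and dispatches the
# right piece of code per result. No names here are real API calls.

class RayTracingFramework:
    def __init__(self, scene_intersect):
        # In a real API, acceleration-structure builds, memory
        # management, and GPU scheduling would live here, hidden
        # from the application.
        self.intersect = scene_intersect

    def dispatch_rays(self, ray_gen, closest_hit, miss, width, height):
        """For each pixel: generate a ray, trace it, run hit or miss code."""
        image = []
        for y in range(height):
            row = []
            for x in range(width):
                origin, direction = ray_gen(x, y)
                hit = self.intersect(origin, direction)
                row.append(closest_hit(hit) if hit is not None else miss())
            image.append(row)
        return image

# Engine-side callbacks for a toy "scene": rays pointing up (+z) hit.
def toy_scene(origin, direction):
    return 5.0 if direction[2] > 0 else None   # hit distance, or miss

framework = RayTracingFramework(toy_scene)
frame = framework.dispatch_rays(
    ray_gen=lambda x, y: ((0, 0, 0), (0, 0, 1 if y == 0 else -1)),
    closest_hit=lambda t: "hit",
    miss=lambda: "sky",
    width=2, height=2)
```

The point is the inversion of control: the engine never walks the scene itself; it only describes rays and reacts to hits and misses, which is what keeps the library free to optimize traversal on the hardware.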
To be clear, though, "RTX enabled" will not magically turn a rasterized game into a fully ray-traced one. Nearly all of the pixels you see on screen are still generated with rasterization. RTX simply provides a novel, hardware-accelerated way to create certain effects, including some that were too computationally expensive for traditional rendering pipelines. Game developers pick specific effects they want (shadows, area lighting, ambient occlusion, reflections) and implement them on a very small subset of the whole scene, using very few rays; the voids between rays are then filled in by NVIDIA's denoiser. This is not full ray tracing, and certainly not Hollywood's photo-realistic path tracing. It is an extremely smart compromise that puts real-time ray tracing in the hands of developers with today's hardware.
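The "fill the voids" step can be illustrated with a toy stand-in. The real pipeline uses NVIDIA's AI-based denoiser; here each untraced pixel simply copies the nearest traced sample, just to show the shape of the problem: a handful of rays in, a full image out.

```python
# Toy stand-in for reconstructing a full image from sparse ray samples.
# This nearest-sample fill is purely illustrative; NVIDIA's actual
# denoiser is far more sophisticated.

def fill_sparse(samples, width, height):
    """samples: {(x, y): value} for the few pixels that got a real ray."""
    def nearest(px, py):
        return min(samples, key=lambda s: (s[0] - px) ** 2 + (s[1] - py) ** 2)
    return [[samples[nearest(x, y)] for x in range(width)]
            for y in range(height)]

# Two traced samples are stretched into a complete 4x4 image.
image = fill_sparse({(0, 0): 0.2, (3, 3): 0.8}, 4, 4)
```

Even this crude fill shows why the approach scales: the cost of the expensive part (actual rays) is decoupled from the output resolution.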