NVIDIA today launched its GeForce RTX 20-series graphics card family based on its ambitious new "Turing" architecture. Arriving over two years after "Pascal," Turing comes at a time when advancements in silicon fabrication nodes are unable to keep pace with the roadmaps of major chipmakers, who traditionally brought out a new architecture on a new process every 18–24 months. In an ideal world, we would have gone sub-10 nm already, which NVIDIA could have leveraged to bring the "Volta" architecture to the consumer space for another serving of "more of everything." Instead, the "Turing" architecture packs a collection of innovations that were needed to build a new GPU on existing silicon fab processes.
At the heart of NVIDIA's effort is the RTX Technology, which brings what looks like real-time ray tracing to 3D games. Not everything on your screen is ray-traced, but some objects are, and so a hybrid of ray tracing and classic rasterization makes up what you see.
To ray trace even those few things on your screen, an enormous amount of compute power is needed, and so NVIDIA created specialized hardware for the task in the form of RT cores, which sit beside the all-purpose CUDA cores. The Tensor cores, which made their debut with "Volta," also feature here, lending a hand with deep-learning and AI tasks, including a few turnkey features game developers can integrate. The new architecture also keeps up with generational gains in memory bandwidth thanks to the new GDDR6 memory standard. The display I/O is revamped with support for the latest DisplayPort and HDMI standards, and a revolutionary new connector called VirtualLink.
Among its long list of firsts, "Turing" also sees NVIDIA debut the architecture not with two SKUs based on the second-biggest chip (as it did with the GTX 1080 and GTX 1070), but with the flagship SKU based on the "big chip" alongside the top SKU based on the second-biggest chip: the GeForce RTX 2080 Ti and GeForce RTX 2080. The GeForce RTX 2070 is also on the horizon, but isn't launching today; NVIDIA is saving it for next month.
The GeForce RTX 20-series is launching at unusually high prices, with generational price increases ranging from 15% to 70%. NVIDIA's justification is that these cards are "more than GeForce GTX," and it has made a few tweaks to its product stack. The RTX 2070, which starts at $500, is the cheapest SKU for now, followed by the RTX 2080 at $700 and the flagship RTX 2080 Ti starting at $1,000.
These baseline prices don't apply to "reference design" cards, which don't quite exist this generation. Cards designed entirely by NVIDIA are sold as "Founders Edition" products, which pair a premium product design with higher-than-reference clock speeds to justify 10%–15% price premiums.
In this review, we take a look at the MSI GeForce RTX 2080 Ti Gaming X Trio, the company's flagship RTX 2080 Ti offering for now. MSI has thrown everything but the kitchen sink at this card, including a powerful 2nd-generation Tri-Frozr cooling solution with three fans, each with independent fan control via the MSI Gaming app, RGB LED logo lighting that speaks Mystic Light, and one of the highest factory overclocks we've seen on an RTX 2080 Ti. The GPU core ticks at 1350 MHz, with maximum GPU Boost set at 1755 MHz, a 14 percent overclock right off the bat. The memory is left untouched at 14 Gbps. A strong custom VRM design keeps this monstrosity well fed.
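As a quick sanity check on that figure, the boost-clock uplift works out roughly as follows. This is a back-of-the-envelope calculation, assuming the RTX 2080 Ti's 1545 MHz reference boost clock (a figure not stated in this review):

```python
# Rough factory-overclock math for the MSI Gaming X Trio.
# The 1545 MHz reference boost clock is an assumption on our part.
reference_boost_mhz = 1545
msi_boost_mhz = 1755

uplift = (msi_boost_mhz - reference_boost_mhz) / reference_boost_mhz
print(f"Boost-clock uplift: {uplift:.1%}")  # ~13.6%, i.e. roughly 14 percent
```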
On the 14th of September, we published a comprehensive NVIDIA "Turing" architecture deep-dive article including coverage of its three new silicon implementations and the new RTX Technology. Be sure to catch that article for more technical details.
The "Turing" architecture caught many of us by surprise because it wasn't visible on GPU architecture roadmaps until a few quarters ago. NVIDIA took this roadmap detour over carving out client-segment variants of "Volta" as it realized it had achieved sufficient compute power to bring its ambitious RTX Technology to the client segment. NVIDIA RTX is an all-encompassing, real-time ray-tracing model for consumer graphics, which seeks to bring a semblance of real-time ray tracing to 3D games.
To enable RTX, NVIDIA has developed an all-new hardware component that sits next to the CUDA cores, called the RT core. An RT core is fixed-function hardware that does what the spiritual ancestor of RTX, NVIDIA OptiX, did over CUDA cores: you input the mathematical representation of a ray, and it traverses the scene to calculate the point of intersection with any triangle in it. This is a computationally heavy task that would have otherwise bogged down the CUDA cores.
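To illustrate the kind of work being offloaded, here is a purely conceptual sketch of ray-scene intersection testing. The names and the naive loop are our own illustration, not NVIDIA's implementation; the actual hardware traverses a bounding volume hierarchy rather than brute-forcing every triangle:

```python
# Conceptual sketch only: a naive ray-triangle intersection search of the kind
# the RT core accelerates in fixed-function hardware. Real implementations
# traverse a bounding volume hierarchy (BVH) instead of testing every triangle.
from dataclasses import dataclass

@dataclass
class Ray:
    origin: tuple      # (x, y, z) starting point of the ray
    direction: tuple   # (x, y, z) normalized direction

def closest_hit(ray, triangles, intersect):
    """Return the nearest triangle hit by the ray, or None if it misses.

    `intersect(ray, tri)` is a hypothetical ray-triangle test returning the
    hit distance or None. On Turing, this whole search is what the RT core
    performs in hardware, freeing the CUDA cores for shading work.
    """
    best_distance, best_tri = float("inf"), None
    for tri in triangles:
        t = intersect(ray, tri)
        if t is not None and t < best_distance:
            best_distance, best_tri = t, tri
    return best_tri
```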
The other major introduction is the Tensor core, which made its debut with the "Volta" architecture. These, too, are specialized components, tasked with 4×4 matrix multiply-accumulate operations that speed up the training and inference of deep-learning neural networks. Their relevance to gaming is limited at this time, but NVIDIA is introducing a few AI-accelerated image-quality enhancements that could leverage Tensor operations.
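The basic primitive is easy to picture: a fused multiply-accumulate on small matrices, D = A × B + C. A minimal sketch of the arithmetic follows; this is plain Python to show the math, not NVIDIA's API (on the hardware, the multiply happens in FP16 with FP32 accumulation):

```python
# Conceptual sketch of the tensor-core primitive: D = A x B + C on 4x4 matrices.
def matmul_accumulate(A, B, C):
    n = 4
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = C[i][j]            # start from the accumulator matrix
            for k in range(n):
                acc += A[i][k] * B[k][j]
            D[i][j] = acc
    return D
```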
The component hierarchy of a "Turing" GPU isn't much different from its predecessors, but the new-generation Streaming Multiprocessor is significantly different. It packs 64 CUDA cores, 8 Tensor Cores, and a single RT core.
TU102 Graphics Processor
The TU102 is the largest silicon based on the "Turing" architecture and powers the GeForce RTX 2080 Ti. It's also the biggest GPU die NVIDIA has ever built for the consumer segment, with 18.6 billion transistors on a 754 mm² die fabricated on TSMC's 12 nanometer process. As we mentioned earlier, the essential component hierarchy of the "Turing" architecture hasn't changed. What has changed, however, is that the Streaming Multiprocessor (SM), the indivisible sub-unit of the GPU, now packs CUDA cores, RT cores, and Tensor cores, orchestrated by a new warp scheduler that supports concurrent INT and FP32 operations, which should improve the GPU's asynchronous compute performance.
At the topmost level, the GPU takes host connectivity from PCI-Express 3.0 x16, features an NVLink interface, and connects to GDDR6 memory across a 384-bit wide memory bus. On the RTX 2080 Ti, the memory interface is narrowed to 352-bit and wired to 11 GB of memory. The GigaThread engine marshals load between six GPCs (graphics processing clusters). Each GPC has a dedicated raster engine and six TPCs (texture processing clusters). Each TPC shares a PolyMorph engine between two SMs, and each SM packs 64 CUDA cores, 8 Tensor cores, and an RT core.
There are, hence, 768 CUDA cores, 96 Tensor cores, and 12 RT cores per GPC, and a grand total of 4,608 CUDA cores, 576 Tensor cores, and 72 RT cores across the TU102 silicon. The GeForce RTX 2080 Ti is carved out of the TU102 by disabling four SMs, resulting in 4,352 CUDA cores, 544 Tensor cores, and 68 RT cores. The card is endowed with 272 TMUs and 88 ROPs.
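The arithmetic behind those unit counts, laid out explicitly:

```python
# Unit counts for the full TU102 and the cut-down RTX 2080 Ti configuration.
gpcs, tpcs_per_gpc, sms_per_tpc = 6, 6, 2
cuda_per_sm, tensor_per_sm, rt_per_sm = 64, 8, 1

sms_full = gpcs * tpcs_per_gpc * sms_per_tpc   # 72 SMs on the full TU102
print(sms_full * cuda_per_sm)                  # 4608 CUDA cores
print(sms_full * tensor_per_sm)                # 576 Tensor cores
print(sms_full * rt_per_sm)                    # 72 RT cores

sms_2080ti = sms_full - 4                      # four SMs disabled on the RTX 2080 Ti
print(sms_2080ti * cuda_per_sm)                # 4352 CUDA cores
print(sms_2080ti * tensor_per_sm)              # 544 Tensor cores
print(sms_2080ti * rt_per_sm)                  # 68 RT cores
```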
As we mentioned, the memory bus is narrowed down slightly to 352-bit, which holds 11 GB of GDDR6 memory, clocked at 14 Gbps, resulting in a memory bandwidth of 616 GB/s.
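That bandwidth figure follows directly from the bus width and data rate:

```python
# Memory bandwidth = bus width (bits) x data rate (Gbps) / 8 bits per byte.
bus_width_bits = 352
data_rate_gbps = 14
bandwidth_gbs = bus_width_bits * data_rate_gbps / 8
print(bandwidth_gbs)  # 616.0 GB/s
```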
Features
Again, we highly recommend you read our article from the 14th of September for intricate technical details about the "Turing" architecture feature set, which we are going to briefly summarize here.
NVIDIA RTX is a brave new feature that has triggered a leap in GPU compute power, just like other killer real-time consumer graphics features before it, such as anti-aliasing, programmable shading, and tessellation. It provides a programming model for 3D scenes with ray-traced elements that improve realism. RTX introduces several turnkey effects that game developers can implement for specific sections of their 3D scenes rather than ray-tracing everything on the screen (we're not quite there yet). A plethora of next-generation GameWorks effects could leverage RTX.
The architectural features perhaps more relevant to gamers come in the form of improvements to the GPU's shaders. In addition to concurrent INT and FP32 operations in the SM, "Turing" introduces Mesh Shading, Variable Rate Shading, Content-Adaptive Shading, Motion-Adaptive Shading, Texture-Space Shading, and Foveated Rendering.
Deep Learning Super Sampling (DLSS) is an ingenious new anti-aliasing method that leverages deep neural networks trained specifically to guess how an image should look when upscaled. Inference runs on the GPU, accelerated by the Tensor cores, while ground-truth data on how objects in the most common games should ideally look upscaled is fed via driver updates or GeForce Experience. The DNN then uses this ground-truth data to reconstruct detail in 3D objects. NVIDIA claims DLSS 2X image quality is comparable to 64x "classic" supersampling.
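Conceptually, DLSS boils down to rendering fewer pixels and letting a trained network reconstruct the rest. A deliberately simplified sketch of that flow follows; the function names are illustrative placeholders of ours, not NVIDIA's API:

```python
# Highly simplified view of the DLSS idea: render at a lower internal
# resolution, then let a game-specific neural network (trained offline against
# supersampled "ground truth" frames) reconstruct the final output.
# All names here are hypothetical, purely for illustration.
def dlss_frame(scene, render, upscale_network, internal_res, target_res):
    low_res_frame = render(scene, resolution=internal_res)    # cheaper raster pass
    return upscale_network(low_res_frame, target=target_res)  # Tensor-core-accelerated inference
```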