At the heart of the GeForce GTX 980 Ti is the 28 nm GM200 silicon. On paper, it is quite an engineering feat: it packs a gargantuan 8 billion transistors into a 601 mm² die on the existing 28 nm process, which appeared to have reached its thermal limits with the previous-generation NVIDIA GK110 and AMD "Hawaii".
The GM200 is based on the "Maxwell" architecture. It features the same component hierarchy as the GM204, but is a 50% upscale in every respect. It features six graphics processing clusters (GPCs) as opposed to the four on the GM204, which makes for 3,072 CUDA cores, a 50% wider 384-bit memory bus, and a 50% larger 3 MB L2 cache. In the GeForce GTX 980 Ti, NVIDIA disabled 2 of the 24 streaming multiprocessors (SMMs) on the silicon, which results in 2,816 CUDA cores. At 176, the texture mapping unit (TMU) count is proportionately lower. The ROP count is 96. The card features 6 GB of memory, half that of the GTX Titan X, but at 336 GB/s, the memory bandwidth is the same. The GPU can address the entire 6 GB of memory at a consistent speed.
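The unit counts above follow from simple per-SMM arithmetic, which the short Python sketch below reproduces. It is illustrative only: the 8 TMUs per SMM is inferred here from 176 TMUs across 22 active SMMs, and the GM204 reference figures (2,048 cores, 256-bit bus, 2 MB L2 cache) are the GTX 980's specifications rather than numbers quoted on this page. All function and variable names are made up for this example.

```python
# Back-of-the-envelope check of the GM200 / GTX 980 Ti unit counts.
CORES_PER_SMM = 128  # CUDA cores per Maxwell SMM
TMUS_PER_SMM = 8     # assumption: inferred from 176 TMUs / 22 active SMMs


def unit_counts(total_smms: int, disabled_smms: int = 0) -> dict:
    """Return SMM, CUDA core, and TMU counts for a chip with some SMMs disabled."""
    active = total_smms - disabled_smms
    if active < 0:
        raise ValueError("cannot disable more SMMs than the chip has")
    return {
        "smms": active,
        "cuda_cores": active * CORES_PER_SMM,
        "tmus": active * TMUS_PER_SMM,
    }


def percent_increase(base: float, new: float) -> float:
    """Percentage by which `new` exceeds `base`."""
    return (new - base) / base * 100.0


full_gm200 = unit_counts(24)                   # fully enabled silicon
gtx_980_ti = unit_counts(24, disabled_smms=2)  # as shipped on the GTX 980 Ti

# Assumed GM204 (GTX 980) reference figures, not taken from this page.
gm204 = {"gpcs": 4, "cuda_cores": 2048, "bus_bits": 256, "l2_mb": 2}
gm200 = {"gpcs": 6, "cuda_cores": full_gm200["cuda_cores"], "bus_bits": 384, "l2_mb": 3}

# Percentage upscale of each GM200 figure over its GM204 counterpart.
upscale = {key: percent_increase(gm204[key], gm200[key]) for key in gm204}

if __name__ == "__main__":
    print("Full GM200:", full_gm200)
    print("GTX 980 Ti:", gtx_980_ti)
    print("Upscale over GM204 (%):", upscale)
```

Under these assumptions, every entry in `upscale` works out to 50%, and disabling two SMMs accounts for both the 2,816 CUDA cores and the 176 TMUs.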
The GM200 features 900 million more transistors than its predecessor, the GK110, although in its GTX Titan X avatar, it features the same 250 W TDP rating. That's both impressive and unnerving. The GM204, despite its 5.2 billion transistors, was rated at a 165 W TDP on the GTX 980, indicating that with Maxwell, NVIDIA may have finally reached the thermal limits of the 28 nm process.
At the heart of the Maxwell architecture is a redesigned streaming multiprocessor (SMM), the tertiary subunit of the GPU. The chip begins with a PCI-Express 3.0 x16 bus interface, a 384-bit wide GDDR5 memory interface, and a display controller that supports as many as three Ultra HD displays, or five physical displays in total. This display controller introduces support for HDMI 2.0, which has enough bandwidth to drive Ultra HD displays at 60 Hz refresh rates. The controller is also ready for 5K (5120×2880, four times the pixels of Quad HD). The 384-bit wide memory interface holds 6 GB of memory.
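To put the resolution claims in numbers, here is a minimal Python sketch that compares pixel counts and estimates the uncompressed data rate of an Ultra HD signal at 60 Hz. It assumes 24 bits per pixel and ignores blanking intervals, so real link requirements are somewhat higher. The 14.4 Gbps figure is HDMI 2.0's usable data rate after 8b/10b encoding of its 18 Gbps link; it is not a number given in this article.

```python
# Rough display arithmetic: pixel counts and uncompressed video data rates.
QUAD_HD = (2560, 1440)
ULTRA_HD = (3840, 2160)
FIVE_K = (5120, 2880)

HDMI_2_0_DATA_GBPS = 14.4  # assumption: 18 Gbps link minus 8b/10b overhead


def pixels(width: int, height: int) -> int:
    """Total number of pixels in one frame."""
    return width * height


def raw_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int = 24) -> float:
    """Uncompressed active-pixel data rate in Gbps (blanking ignored)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


ratio_5k_to_qhd = pixels(*FIVE_K) / pixels(*QUAD_HD)
uhd_60hz_gbps = raw_gbps(*ULTRA_HD, refresh_hz=60)

if __name__ == "__main__":
    print(f"5K has {ratio_5k_to_qhd:.1f}x the pixels of Quad HD")
    print(f"Ultra HD at 60 Hz needs about {uhd_60hz_gbps:.2f} Gbps of pixel data")
    print("Fits within HDMI 2.0:", uhd_60hz_gbps < HDMI_2_0_DATA_GBPS)
```

Under these simplifying assumptions, an Ultra HD stream at 60 Hz needs roughly 12 Gbps of pixel data, which fits within the assumed HDMI 2.0 budget.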
The GigaThread Engine splits workloads between six graphics processing clusters (GPCs). The L2 cache cushions transfers between these GPCs. Each GPC holds four streaming multiprocessors (SMMs) and a common raster engine between them. Each SMM holds a third-generation PolyMorph Engine, a component that performs a host of rendering tasks, such as fetch, transform, setup, tessellation, and output. The SMM has 128 CUDA cores, the number-crunching components of NVIDIA GPUs, spread across four subdivisions with dedicated warp-schedulers, registers, and caches. NVIDIA claims that the SMM delivers twice the performance per watt of "Kepler" SMX units.
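The component hierarchy can be modeled as nested counts, rolling up from an SMM subdivision to the whole chip. The Python sketch below is purely illustrative: the class and field names are invented for this example, it uses the six-GPC layout given for the GM200 earlier on this page, and the 32 cores per subdivision is simply 128 cores divided across the four subdivisions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SMM:
    """One Maxwell streaming multiprocessor (hypothetical model)."""
    partitions: int = 4            # subdivisions with their own warp scheduler
    cores_per_partition: int = 32  # assumption: 128 cores / 4 subdivisions
    polymorph_engines: int = 1     # one PolyMorph Engine per SMM

    @property
    def cuda_cores(self) -> int:
        return self.partitions * self.cores_per_partition


@dataclass(frozen=True)
class GPC:
    """One graphics processing cluster: four SMMs sharing a raster engine."""
    smms: int = 4
    raster_engines: int = 1
    smm: SMM = field(default_factory=SMM)


@dataclass(frozen=True)
class GPU:
    """Whole-chip roll-up of the GPC -> SMM -> CUDA core hierarchy."""
    gpcs: int
    gpc: GPC = field(default_factory=GPC)

    @property
    def smm_count(self) -> int:
        return self.gpcs * self.gpc.smms

    @property
    def cuda_cores(self) -> int:
        return self.smm_count * self.gpc.smm.cuda_cores

    @property
    def polymorph_engines(self) -> int:
        return self.smm_count * self.gpc.smm.polymorph_engines

    @property
    def raster_engines(self) -> int:
        return self.gpcs * self.gpc.raster_engines


gm200_chip = GPU(gpcs=6)

if __name__ == "__main__":
    print("SMMs:", gm200_chip.smm_count)
    print("CUDA cores:", gm200_chip.cuda_cores)
    print("Raster engines:", gm200_chip.raster_engines)
    print("PolyMorph Engines:", gm200_chip.polymorph_engines)
```

Modeling each level as its own small class keeps the per-level counts visible, so changing one figure (for example, the number of GPCs) propagates to the chip-wide totals.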