NVIDIA announced the GeForce GTX 1070 Ti performance-segment graphics card last week; today, the reviews go live. The GTX 1070 Ti is NVIDIA's latest (and probably final) implementation of "Pascal". It has been close to 18 months since the "Pascal" GPU architecture made its debut with the GeForce GTX 1080, back in May 2016. For the most part, it enjoyed virtually zero competition from AMD, which took another 14 months to come up with something that could compete with the GTX 1080 and GTX 1070: the RX Vega 64 and RX Vega 56, respectively. The enthusiast-segment GTX 1080 Ti and TITAN Xp remain unchallenged. NVIDIA may have erred, however, in cutting the GTX 1070 down too far from its bigger sibling, leaving room for the RX Vega 56 to challenge it.
The GeForce GTX 1070 Ti is designed to fill the performance gap between the GTX 1070 and GTX 1080, but it sits closer to the GTX 1080 than to the halfway point. This is probably necessary for it to outperform the RX Vega 56. While the GTX 1070 has a quarter of the GP104's 20 "Pascal" streaming multiprocessors (each worth 128 CUDA cores) disabled, the GTX 1070 Ti has just one disabled. This takes its CUDA core count all the way up to 2,432, just 128 fewer than the 2,560 of the GTX 1080 and a full 512 more than the 1,920 of the GTX 1070.
To not make the GTX 1070 Ti "too good," NVIDIA carried over the memory setup of the GTX 1070. You get 8 GB of older GDDR5 memory ticking at 8.00 GHz (GDDR5-effective), which churns out 256 GB/s of memory bandwidth, in contrast to the newer 10 GHz GDDR5X memory on the GTX 1080 (320 GB/s) and the faster 11 GHz memory on the GTX 1080 refresh (352 GB/s). The clock speeds are another interesting mix: the GTX 1070 Ti has the base clock of the GTX 1080 but the boost clock of the GTX 1070, so the GPU Boost multipliers are rather restrained. These choices, coupled with the inherently better energy efficiency of the "Pascal" architecture compared to AMD's "Vega," make for an interesting answer by NVIDIA to AMD's latest challenge.
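How restrained those multipliers are is easy to quantify: the guaranteed out-of-the-box boost headroom is simply the ratio of boost clock to base clock. A quick sketch using the reference clocks from the spec table (Python used purely for illustration):

```python
# Reference base/boost clocks (MHz) from the spec table. The boost-to-base
# ratio shows how much GPU Boost headroom each card guarantees out of the box.
specs = {
    "GTX 1070":    (1506, 1683),
    "GTX 1070 Ti": (1607, 1683),
    "GTX 1080":    (1607, 1733),
}

for card, (base, boost) in specs.items():
    headroom = (boost / base - 1) * 100
    print(f"{card}: base {base} MHz, boost {boost} MHz (+{headroom:.1f}%)")
# GTX 1070 guarantees +11.8%, GTX 1080 +7.8%, but the GTX 1070 Ti only +4.7%
```

The GTX 1070 Ti's narrow base-to-boost window is what the paragraph above means by "restrained."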
In this review, we're taking a look at the Colorful GeForce GTX 1070 Ti iGame Vulcan X TOP, which, besides its long name, comes with a reworked PCB design and an extra-large triple-slot, triple-fan GPU cooler. Colorful is mostly known in Asian markets. The price, according to Colorful, is $650, which is much higher than that of any other GTX 1070 Ti we've seen to date.
GeForce GTX 1070 Ti Market Segment Analysis

| Card | Price | Shader Units | ROPs | Core Clock | Boost Clock | Memory Clock | GPU | Transistors | Memory |
|---|---|---|---|---|---|---|---|---|---|
| GTX 980 Ti | $390 | 2816 | 96 | 1000 MHz | 1075 MHz | 1750 MHz | GM200 | 8000M | 6 GB, GDDR5, 384-bit |
| R9 Fury X | $380 | 4096 | 64 | 1050 MHz | N/A | 500 MHz | Fiji | 8900M | 4 GB, HBM, 4096-bit |
| GTX 1070 | $400 | 1920 | 64 | 1506 MHz | 1683 MHz | 2002 MHz | GP104 | 7200M | 8 GB, GDDR5, 256-bit |
| RX Vega 56 | $400 | 3584 | 64 | 1156 MHz | 1471 MHz | 800 MHz | Vega 10 | 12500M | 8 GB, HBM2, 2048-bit |
| GTX 1070 Ti | $450 | 2432 | 64 | 1607 MHz | 1683 MHz | 2000 MHz | GP104 | 7200M | 8 GB, GDDR5, 256-bit |
| Colorful GTX 1070 Ti iGame Vulcan X TOP | $650 | 2432 | 64 | 1607 MHz | 1683 MHz | 2000 MHz | GP104 | 7200M | 8 GB, GDDR5, 256-bit |
| GTX 1080 | $500 | 2560 | 64 | 1607 MHz | 1733 MHz | 1251 MHz | GP104 | 7200M | 8 GB, GDDR5X, 256-bit |
| RX Vega 64 | $500 | 4096 | 64 | 1247 MHz | 1546 MHz | 953 MHz | Vega 10 | 12500M | 8 GB, HBM2, 2048-bit |
| GTX 1080 Ti | $720 | 3584 | 88 | 1481 MHz | 1582 MHz | 1376 MHz | GP102 | 12000M | 11 GB, GDDR5X, 352-bit |
Architecture
The GeForce GTX 1070 Ti is based on NVIDIA's GP104 silicon, which implements the "Pascal" architecture. The biggest "Pascal" GPU is the GP100, which drives the Tesla P100 HPC processor. The GP104 succeeds the GM204 (GTX 980 and GTX 970), and despite a smaller die at 314 mm² compared to the 398 mm² of the GM204, it features a significantly higher transistor count at 7.2 billion versus 5.2 billion. This is due to NVIDIA's big move to the 16 nm FinFET process.
With each successive architecture since "Fermi," NVIDIA has been enriching the streaming multiprocessor (SM) by adding more dedicated resources and reducing shared resources within the graphics processing cluster (GPC), which leads to big performance gains. The story continues with "Pascal." Like the GM204 before it, the GP104 features four GPCs, specialized subunits of the GPU that share the PCI-Express 3.0 x16 host interface and the 256-bit memory interface, which is made up of eight controllers that support both GDDR5X and GDDR5 memory.
Workload across the four GPCs is shared by the GigaThread Engine cushioned by 2 MB of cache. Each GPC holds five streaming multiprocessors (SMs), which is an increase from the four SMs each GPC held on the GM204. On the GTX 1070 Ti, one of these 20 SMs is disabled. The GPC shares a raster engine between these five SMs. The "Pascal" streaming multiprocessor features a 4th generation PolyMorph Engine, a component for key render setup operations. With "Pascal," the PolyMorph Engine includes specialized hardware for the new Simultaneous MultiProjection feature. Each SM also holds a block of eight TMUs.
Each SM continues to feature 128 CUDA cores. With 19 of its 20 SMs enabled, the GTX 1070 Ti hence features a total of 2,432 CUDA cores. Other vital specifications include 152 TMUs and 64 ROPs. NVIDIA claims to have reworked the GPU's internal circuit design and board channel paths to facilitate significantly higher clock speeds than the GM204 is capable of. The GeForce GTX 1070 Ti ships with a 1607 MHz base clock and a 1683 MHz GPU Boost frequency.
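The headline numbers all fall out of the per-SM resources described above. A minimal sketch (Python for illustration only, using the counts from this article):

```python
# GP104 per-SM resources on the GTX 1070 Ti, per the article:
# 20 SMs on the full GP104, one disabled, 128 CUDA cores and 8 TMUs per SM.
SMS_TOTAL = 20
SMS_DISABLED = 1
CORES_PER_SM = 128
TMUS_PER_SM = 8

active_sms = SMS_TOTAL - SMS_DISABLED   # 19 enabled SMs
cuda_cores = active_sms * CORES_PER_SM  # 19 * 128 = 2432 CUDA cores
tmus = active_sms * TMUS_PER_SM         # 19 * 8 = 152 TMUs

print(f"{active_sms} SMs -> {cuda_cores} CUDA cores, {tmus} TMUs")
# 19 SMs -> 2432 CUDA cores, 152 TMUs
```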
As we mentioned earlier, the GeForce GTX 1070 Ti carries over the memory subsystem of the GTX 1070 untouched. It features 8 GB of GDDR5 memory across the chip's 256-bit wide memory interface, clocked at 8.00 GHz (GDDR5-effective), which works out to a memory bandwidth of 256 GB/s. This is significantly lower than the 320 GB/s of the GTX 1080 and the 352 GB/s of the GTX 1080 refresh featuring 11 Gbps GDDR5X memory.
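Those bandwidth figures follow directly from bus width and effective data rate. A small sketch (Python purely for illustration; the 352 GB/s figure corresponds to the GTX 1080 refresh's 11 Gbps GDDR5X on the same 256-bit bus):

```python
# Peak memory bandwidth (GB/s) = (bus width in bits / 8 bits per byte)
# * effective data rate in GT/s.
def bandwidth_gbps(bus_width_bits: int, data_rate_gtps: float) -> float:
    return bus_width_bits / 8 * data_rate_gtps

print(bandwidth_gbps(256, 8.0))   # GTX 1070 Ti (8 Gbps GDDR5): 256.0 GB/s
print(bandwidth_gbps(256, 10.0))  # GTX 1080 (10 Gbps GDDR5X): 320.0 GB/s
print(bandwidth_gbps(256, 11.0))  # GTX 1080 refresh (11 Gbps GDDR5X): 352.0 GB/s
```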
The "Pascal" architecture supports Asynchronous Compute as standardized by Microsoft, and adds its own take on the concept with "Dynamic Load Balancing."
The New Age of Multi-GPU
Microsoft DirectX 12 introduced a standardized mixed multi-GPU mode in which a DirectX 12 (or later) 3D app can take advantage of any number and type of GPUs, as long as they support the API features the app needs; multi-GPU has changed forever. Instead of steering its GPU lineup toward that future, NVIDIA has spent some R&D on its proprietary SLI technology. With increasing resolutions and refresh rates straining the bandwidth of display connectors and of inter-GPU communication in multi-GPU modes, NVIDIA decided that SLI needed added bandwidth. One way it found to do so was to task both SLI contact points on the graphics card in a 2-way configuration. Enter the SLI HB (high-bandwidth) bridge, a rigid SLI bridge that comes in 1U, 2U, and 3U slot spacings and links two GeForce "Pascal" graphics cards along both their SLI "fingers" (contact points). This allows an SLI duo to more reliably render at resolutions such as 4K at 60 Hz or 120 Hz, 5K, or HDR-enabled resolutions. SLI still works with a classic 2-way bridge at any resolution, but that could adversely affect performance scaling, and the output won't be as smooth as with an SLI HB bridge. This also appears to be why NVIDIA discontinued official support for 3-way and 4-way SLI.
The GTX 1070 Ti still supports 3-way and 4-way SLI over the classic bridges that come with motherboards, but only in a few benchmarks. NVIDIA's regular driver updates will only optimize for 2-way SLI. NVIDIA "Pascal" GPUs do support Microsoft DirectX 12's multi-display adapter (MDA) mode, but NVIDIA will not provide game-specific optimizations through driver updates for MDA. That would become the game developer's responsibility. The same applies to "explicit" LDA (linked display adapter).
New Display Connectors
The "Pascal" architecture features DisplayPort 1.4 hardware, even though it is certified only for up to DisplayPort 1.2; features of DisplayPort 1.3 and 1.4, such as HDR metadata transport, should still work. The GPU also supports HDMI 2.0b, the latest HDMI standard, with support for HDR video. In the entire course of its presentation, NVIDIA did not mention whether "Pascal" supports VESA AdaptiveSync, which AMD is co-branding as FreeSync. All you need for it to work is a GPU that supports HDMI 2.0a or DisplayPort 1.2a (both satisfied by NVIDIA's support for HDMI 2.0b and DisplayPort 1.4); the rest is down to the driver. The GeForce GTX 1070 Ti features one HDMI 2.0b, one dual-link DVI-D, and three DisplayPort 1.4 connectors. The DVI connector lacks analog wiring, so the GTX 1070 Ti does not support D-Sub monitors through dongles.
Fast Sync
With each new architecture over the past three generations, NVIDIA has tinkered with display sync. With "Kepler," it introduced Adaptive V-Sync; by the time "Maxwell" came along, you had G-SYNC; and with "Pascal," the company is introducing a new feature called Fast Sync. NVIDIA describes Fast Sync as a low-latency alternative to V-Sync that eliminates frame-tearing (normally caused when the GPU's output frame rate exceeds the display's refresh rate) while letting the GPU render unrestrained by V-Sync, which reduces input latency. This works by decoupling the display pipeline from the render output, making it possible to temporarily store excess rendered frames in the frame buffer. The result is an experience with low input lag (from V-Sync "off") and no frame-tearing (from V-Sync "on"). You will be able to enable Fast Sync for a 3D app by editing its profile in NVIDIA Control Panel; simply force the Vertical Sync mode to "Fast."
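The decoupling described above can be illustrated with a toy model (this is a sketch of the general idea, not NVIDIA's implementation): the renderer produces frames as fast as it can, and at each display refresh, scan-out picks the newest frame that has finished rendering, silently dropping the rest.

```python
# Toy model of decoupled render/scan-out in the spirit of Fast Sync.
# The renderer runs unrestrained; at each refresh tick the display shows
# the newest *completed* frame, and older queued frames are dropped,
# so there is no tearing and no render throttling.
def pick_displayed_frames(frame_done_times, refresh_times):
    shown = []
    for t in refresh_times:
        # newest frame that finished rendering before this refresh
        done = [f for f in frame_done_times if f <= t]
        shown.append(done[-1] if done else None)
    return shown

# Renderer at ~240 fps (a frame every ~4.2 ms), display at 60 Hz (~16.7 ms):
frames = [i * 4.2 for i in range(1, 20)]
refreshes = [i * 16.7 for i in range(1, 5)]
print(pick_displayed_frames(frames, refreshes))
```

Roughly three of every four rendered frames are discarded in this scenario; the one that survives each refresh is always the freshest available, which is where the input-lag benefit over classic V-Sync comes from.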