NVIDIA today released the GeForce GTX 1660 Ti and with it splits its client-segment discrete graphics lineup into the GeForce GTX series and GeForce RTX series. The RTX 20-series starts at the $350-mark with the RTX 2060, while models below it are relegated to the GTX brand. The best part? Both are based on NVIDIA's latest 12 nm "Turing" architecture. What sets the two apart is right in the name—RTX real-time raytracing technology.
NVIDIA probably figured that getting RTX to work even at 1080p requires a minimum number of RT cores and a certain amount of CUDA-core horsepower, which cannot be scaled down beyond a point: enabling RTX features already exacts a roughly 30 percent performance tax, and NVIDIA wouldn't want $200–$300 graphics cards that can't play RTX-enabled games at 1080p at acceptable frame rates. The RTX 2060 appears to sit right at that limit. In games without raytracing, the RTX 2060 has enough muscle for 1440p, but with RTX enabled, playability lands halfway between 1080p and 1440p.
The easiest way out of this problem for NVIDIA would be to not bother with RTX below the $350 mark and instead focus on making the GPU as cost-efficient as possible. With RTX out of the way, NVIDIA could physically remove the RT cores, which add billions of transistors to the silicon, making the chips smaller. Interestingly, NVIDIA also decided to axe the tensor cores, specialized hardware that accelerates the building and training of deep-learning neural networks, shedding even more transistors. The remaining CUDA cores are very much of the "Turing" architecture and benefit from its increased IPC and the higher clock-speed headroom that came with the switch to 12 nm. The largest such GTX "Turing" chip is the new "TU116."
The GeForce GTX 1660 Ti is the largest implementation of the TU116 and is offered at US$279, roughly $60 more than what the GTX 1060 6 GB "Pascal" currently sells for; in that sense, it is not a direct successor. It is endowed with 1,536 "Turing" CUDA cores, 96 TMUs, 48 ROPs, and a 192-bit wide memory interface, but the memory itself is 50% faster: NVIDIA uses 12 Gbps GDDR6, which belts out 288 GB/s of bandwidth. The memory amount is still 6 GB.
With this endeavor, NVIDIA is targeting two very distinct classes of PC gamers across its lineup. The GTX "Turing" series products, such as the GTX 1660 Ti, are intended for gamers who play online multiplayer titles such as "Anthem," "Fortnite," or even "Battlefield V" with the eye candy dialed down in favor of responsiveness and agility. The RTX 20-series, on the other hand, targets people who play AAA games rich in eye candy and real-time raytracing at resolutions between 1440p and 4K.
In this review, we're testing the EVGA GTX 1660 Ti XC Black, a cost-optimized variant that targets the $279 MSRP. Unlike other vendors, EVGA uses only a single fan on the card and includes no backplate. The card doesn't come with a factory overclock, but its power limit is slightly increased, which should yield a little extra performance.
GeForce GTX 1660 Ti Market Segment Analysis

|   | Price | Shader Units | ROPs | Core Clock | Boost Clock | Memory Clock | GPU | Transistors | Memory |
|---|---|---|---|---|---|---|---|---|---|
| RX 570 | $150 | 2048 | 32 | 1168 MHz | 1244 MHz | 1750 MHz | Ellesmere | 5700M | 4 GB, GDDR5, 256-bit |
| RX 580 | $185 | 2304 | 32 | 1257 MHz | 1340 MHz | 2000 MHz | Ellesmere | 5700M | 8 GB, GDDR5, 256-bit |
| GTX 1060 3 GB | $185 | 1152 | 48 | 1506 MHz | 1708 MHz | 2002 MHz | GP106 | 4400M | 3 GB, GDDR5, 192-bit |
| GTX 1060 | $200 | 1280 | 48 | 1506 MHz | 1708 MHz | 2002 MHz | GP106 | 4400M | 6 GB, GDDR5, 192-bit |
| RX 590 | $260 | 2304 | 32 | 1469 MHz | 1545 MHz | 2000 MHz | Polaris 30 | 5700M | 8 GB, GDDR5, 256-bit |
| GTX 1070 | $310 | 1920 | 64 | 1506 MHz | 1683 MHz | 2002 MHz | GP104 | 7200M | 8 GB, GDDR5, 256-bit |
| RX Vega 56 | $370 | 3584 | 64 | 1156 MHz | 1471 MHz | 800 MHz | Vega 10 | 12500M | 8 GB, HBM2, 2048-bit |
| GTX 1660 Ti | $280 | 1536 | 48 | 1500 MHz | 1770 MHz | 1500 MHz | TU116 | 6600M | 6 GB, GDDR6, 192-bit |
| EVGA GTX 1660 Ti XC Black | $280 | 1536 | 48 | 1500 MHz | 1770 MHz | 1500 MHz | TU116 | 6600M | 6 GB, GDDR6, 192-bit |
| GTX 1070 Ti | $450 | 2432 | 64 | 1607 MHz | 1683 MHz | 2000 MHz | GP104 | 7200M | 8 GB, GDDR5, 256-bit |
| RTX 2060 FE | $350 | 1920 | 48 | 1365 MHz | 1680 MHz | 1750 MHz | TU106 | 10800M | 6 GB, GDDR6, 192-bit |
| GTX 1080 | $500 | 2560 | 64 | 1607 MHz | 1733 MHz | 1251 MHz | GP104 | 7200M | 8 GB, GDDR5X, 256-bit |
| RX Vega 64 | $400 | 4096 | 64 | 1247 MHz | 1546 MHz | 953 MHz | Vega 10 | 12500M | 8 GB, HBM2, 2048-bit |
| GTX 1080 Ti | $700 | 3584 | 88 | 1481 MHz | 1582 MHz | 1376 MHz | GP102 | 12000M | 11 GB, GDDR5X, 352-bit |
| RTX 2070 | $490 | 2304 | 64 | 1410 MHz | 1620 MHz | 1750 MHz | TU106 | 10800M | 8 GB, GDDR6, 256-bit |
Architecture
As we mentioned earlier, the GeForce GTX 1660 Ti is very much based on the "Turing" architecture, while lacking its two killer features, RT cores and tensor cores. Much of NVIDIA's efforts to woo buyers in the competitive sub-$300 market are hence directed at reiterating the benefits of CUDA cores from the "Turing" architecture, which by the way are the same CUDA cores you'd find on an RTX 20-series GPU. At the heart of the GTX 1660 Ti is the new 12 nm "TU116" GPU.
NVIDIA has significantly re-engineered the silicon's Graphics Processing Clusters (GPCs) to do away with RT cores and tensor cores. The chip's hierarchy is otherwise similar to other "Turing" GPUs. The GigaThread Engine and L2 cache are the town-square of the GPU, binding the three GPCs to the chip's PCI-Express 3.0 x16 host interface and 192-bit GDDR6 memory interface. Each GPC holds four indivisible TPCs (Texture Processing Clusters), each of which shares a PolyMorph Engine between two streaming multiprocessors (SMs). Each "Turing" SM packs 64 CUDA cores, which works out to 128 CUDA cores per TPC, 512 per GPC, and 1,536 across the silicon.
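As a quick sanity check of those counts, here is a trivial Python sketch (purely illustrative) that tallies the hierarchy described above:

```python
# CUDA-core tally for the TU116 hierarchy described above (illustrative only)
cores_per_sm = 64      # per "Turing" SM
sms_per_tpc  = 2       # two SMs share one PolyMorph Engine in each TPC
tpcs_per_gpc = 4
gpcs         = 3

cores_per_tpc = cores_per_sm * sms_per_tpc      # 128
cores_per_gpc = cores_per_tpc * tpcs_per_gpc    # 512
total_cores   = cores_per_gpc * gpcs            # 1,536

print(cores_per_tpc, cores_per_gpc, total_cores)  # 128 512 1536
```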
Much of NVIDIA's CUDA-core-specific innovation in "Turing" centers on improving the architecture's concurrent-execution capabilities. This is not the same as asynchronous compute, though the two concepts aren't far removed from each other. "Turing" CUDA cores can execute integer and floating-point instructions in parallel in each clock cycle, whereas older architectures, such as "Pascal," can only handle one kind of execution at a time. Asynchronous compute is a more macro-level concept and concerns the GPU's ability to handle various graphics and compute workloads in tandem.
Cushioning the CUDA cores is an improved L1 cache subsystem. The L1 caches are enlarged three-fold, with a four-fold increase in load/store bandwidth. The caches are configurable on the fly as either two 32 KB partitions per SM or a unified 64 KB block per TPC. NVIDIA has also substituted the tensor cores with dedicated FP16 cores in each SM to execute FP16 operations. These are physically separate from the 64 FP32 and 64 INT32 cores per SM and execute FP16 at double the rate of the FP32 cores. On the RTX 2060, for example, there are no dedicated FP16 cores per SM; instead, its tensor cores are configured to handle FP16 ops at an enormous rate.
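As a back-of-the-envelope illustration of what "double the rate of FP32" means for peak throughput, the sketch below uses the GTX 1660 Ti's 1,536 cores and 1,770 MHz boost clock from the table above and the usual convention of counting an FMA as two FLOPs; real-world throughput will of course differ:

```python
# Peak-throughput estimate (illustrative; FMA counted as 2 FLOPs per core per clock)
cuda_cores  = 1536
boost_clock = 1.770e9          # Hz

fp32_tflops = cuda_cores * boost_clock * 2 / 1e12   # ~5.4 TFLOPS
fp16_tflops = fp32_tflops * 2                       # ~10.9 TFLOPS via the dedicated FP16 units

print(round(fp32_tflops, 1), round(fp16_tflops, 1))  # 5.4 10.9
```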
NVIDIA has deployed the latest GDDR6 memory on the GTX 1660 Ti, although it ticks at a slower 12 Gbps data rate, compared to 14 Gbps on the RTX 2060. This is still a massive 50 percent increase in memory bandwidth compared to the GTX 1060 6 GB (288 GB/s vs. 192 GB/s). The memory amount is unchanged at 6 GB.
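Those bandwidth figures follow directly from the per-pin data rate and the 192-bit bus; a quick sanity check (illustrative, using the GTX 1060's 8 Gbps GDDR5 for comparison):

```python
# Memory bandwidth = per-pin data rate (Gbps) * bus width (bits) / 8 (bits per byte)
def bandwidth_gbs(data_rate_gbps, bus_width_bits):
    return data_rate_gbps * bus_width_bits / 8

print(bandwidth_gbs(12, 192))   # 288.0 GB/s -> GTX 1660 Ti (12 Gbps GDDR6)
print(bandwidth_gbs(14, 192))   # 336.0 GB/s -> RTX 2060 (14 Gbps GDDR6)
print(bandwidth_gbs(8, 192))    # 192.0 GB/s -> GTX 1060 6 GB (8 Gbps GDDR5)
```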
Features
Let's talk about the two elephants in the room first. The GTX 1660 Ti will not give you real-time raytracing because it lacks RT cores, and it won't give you DLSS for want of tensor cores. What you do get is variable-rate shading: the Adaptive Shading feature introduced with "Turing" is carried over to the GTX 1660 Ti. Both of its key algorithms, content-adaptive shading (CAS) and motion-adaptive shading (MAS), are available. CAS senses color or spatial coherence in a scene to minimize repetitive shading of detail, freeing resources to increase detail where it matters. MAS senses high motion in a scene (e.g., racing simulators) and reduces the shading of details in favor of performance.
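To make the idea concrete, here is a purely conceptual sketch of how a per-tile shading rate might be chosen from motion (the gist of MAS) and color variance (the gist of CAS). This is not NVIDIA's implementation and not a real graphics API; the thresholds are made up for illustration:

```python
# Conceptual variable-rate shading heuristic (illustrative; thresholds are invented)
def pick_shading_rate(tile_motion: float, tile_color_variance: float) -> str:
    """Return a coarse or fine shading rate for a screen tile.

    tile_motion         -- screen-space motion magnitude, in pixels per frame
    tile_color_variance -- how much the tile's colors vary (0 = flat color)
    """
    if tile_motion > 8.0:            # fast-moving content (MAS): detail is blurred anyway
        return "2x2"                 # shade once per 2x2 pixel block
    if tile_color_variance < 0.01:   # visually flat region (CAS): extra samples add little
        return "2x2"
    return "1x1"                     # full-rate shading where detail is visible

# Example: a fast-moving tile gets coarse shading even though it is detailed
print(pick_shading_rate(tile_motion=12.0, tile_color_variance=0.5))  # "2x2"
```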
Packaging and Contents
You will receive:
Graphics card
Documentation
The Card
The GTX 1660 Ti XC Black doesn't use the transparent plastic design we've seen on other Turing cards from the company. Rather, it sticks with a dark design that uses a mix of glossy and matte surfaces to drive the looks of the card. A backplate is not included. Dimensions of the card are 19.5 x 11.5 cm.
Installation requires three slots in your system.
Display connectivity options include a DVI port, one HDMI 2.0b, and one DisplayPort 1.4a.
NVIDIA has updated its display engine with the "Turing" microarchitecture, which now supports DisplayPort 1.4a along with VESA's nearly lossless Display Stream Compression (DSC). This enables 8K@30 Hz over a single cable, or 8K@60 Hz with DSC turned on. For context, DisplayPort 1.4a is the latest version of the standard, published in April 2018.
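To see why DSC is what unlocks 8K@60 Hz, a rough bandwidth estimate helps. The sketch below counts pixel data only and ignores blanking overhead, so real requirements are somewhat higher; the link figure assumes four HBR3 lanes with 8b/10b encoding:

```python
# Rough DisplayPort 1.4a bandwidth check (pixel data only; blanking overhead ignored)
link_payload_gbps = 4 * 8.1 * 8 / 10   # 4 lanes x 8.1 Gbps HBR3, 8b/10b encoding ~= 25.9 Gbps

def video_gbps(width, height, refresh_hz, bits_per_pixel=24):
    return width * height * refresh_hz * bits_per_pixel / 1e9

print(video_gbps(7680, 4320, 30))   # ~23.9 Gbps -> fits within ~25.9 Gbps without DSC
print(video_gbps(7680, 4320, 60))   # ~47.8 Gbps -> needs DSC compression to fit
```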
At CES 2019, NVIDIA announced that all of its graphics cards now support VESA Adaptive Sync (aka FreeSync). While only a small number of FreeSync monitors have been fully qualified for G-SYNC, users can enable the feature in NVIDIA's control panel regardless of whether the monitor is certified.
The board uses a single 8-pin power connector. This input configuration is specified for up to 225 watts of power draw.
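That 225 W figure is simply the sum of what the connectors are rated for under the PCI Express specification:

```python
# Power budget from connector ratings (PCI Express spec values)
pcie_slot_w = 75    # PCI-Express x16 slot
eight_pin_w = 150   # single 8-pin PCIe power connector

print(pcie_slot_w + eight_pin_w)   # 225 W maximum board power draw by spec
```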