As we mentioned earlier, the GeForce GTX 1660 Ti is very much based on the "Turing" architecture, while lacking its two killer features, RT cores and tensor cores. Much of NVIDIA's effort to woo buyers in the competitive sub-$300 market is hence directed at reiterating the benefits of CUDA cores from the "Turing" architecture, which, by the way, are the same CUDA cores you'd find on an RTX 20-series GPU. At the heart of the GTX 1660 Ti is the new 12 nm "TU116" GPU.
NVIDIA has significantly re-engineered the Graphics Processing Clusters (GPCs) of the silicon to lack RT cores and tensor cores. The chip's hierarchy is otherwise similar to other "Turing" GPUs. The GigaThread Engine and L2 cache form the town square of the GPU, binding three GPCs with the chip's PCI-Express 3.0 x16 host interface and 192-bit GDDR6 memory interface. Each GPC has four indivisible TPCs (Texture Processing Clusters), and each TPC shares a Polymorph Engine between two streaming multiprocessors (SMs). Each Turing SM packs 64 CUDA cores, and thus, we end up with 128 CUDA cores per TPC, 512 per GPC, and 1,536 across the silicon.
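To make the arithmetic explicit, here is a small Python sketch that multiplies out the unit counts quoted above. The constant and function names are our own labels for illustration, not anything from NVIDIA's documentation.

```python
# Unit counts for TU116 as quoted in the text above.
GPCS = 3                # Graphics Processing Clusters on the die
TPCS_PER_GPC = 4        # Texture Processing Clusters per GPC
SMS_PER_TPC = 2         # streaming multiprocessors per TPC
CUDA_CORES_PER_SM = 64  # CUDA cores per Turing SM


def cuda_core_totals():
    """Multiply out the hierarchy from SM up to the whole chip."""
    per_tpc = SMS_PER_TPC * CUDA_CORES_PER_SM
    per_gpc = TPCS_PER_GPC * per_tpc
    total = GPCS * per_gpc
    sm_count = GPCS * TPCS_PER_GPC * SMS_PER_TPC
    return {
        "per_tpc": per_tpc,
        "per_gpc": per_gpc,
        "total": total,
        "sm_count": sm_count,
    }


if __name__ == "__main__":
    for name, value in cuda_core_totals().items():
        print(f"{name}: {value}")
```

The same multiplication also gives the SM count for the chip, which is the number the per-SM cache and FP16 figures further down apply to.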
Much of NVIDIA's CUDA core-specific innovation for Turing centers on improving the architecture's concurrent execution capabilities. This is not the same as asynchronous compute, but the two concepts aren't too far removed from each other. Turing CUDA cores are designed to execute integer and floating-point instructions in parallel each clock cycle, while older architectures, such as Pascal, can only handle one kind of execution at a time. Asynchronous compute is a more macro concept and concerns the GPU's ability to handle various graphics and compute workloads in tandem.
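A rough way to picture the benefit is to count issue slots for a mixed instruction stream. The sketch below is a deliberately idealized model of our own: it ignores dependencies, scheduling, and memory stalls, and the 100:36 floating-point to integer mix is an assumed example workload, not a measurement.

```python
def issue_slots(fp_ops, int_ops, concurrent):
    """Idealized number of issue slots needed for a mix of instructions.

    concurrent=False models a design that handles one kind of instruction
    at a time, so the two kinds queue behind each other.
    concurrent=True models separate FP and INT paths running side by side,
    so the longer of the two streams sets the total.
    """
    if concurrent:
        return max(fp_ops, int_ops)
    return fp_ops + int_ops


if __name__ == "__main__":
    fp_ops, int_ops = 100, 36  # assumed example mix, for illustration only
    serial = issue_slots(fp_ops, int_ops, concurrent=False)
    parallel = issue_slots(fp_ops, int_ops, concurrent=True)
    print(f"one kind at a time: {serial} slots")
    print(f"concurrent INT+FP:  {parallel} slots")
    print(f"idealized speedup:  {serial / parallel:.2f}x")
```

In this model the gain depends entirely on how integer-heavy the shader is: a pure floating-point stream sees no benefit, while an even split would approach 2x. Real-world gains will be smaller than the idealized figure.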
Cushioning the CUDA cores is an improved L1 cache subsystem. The L1 caches are enlarged three-fold, with a four-fold increase in load/store bandwidth. The caches are configurable on the fly as either two 32 KB partitions per SM or a unified 64 KB block per TPC. NVIDIA has also substituted tensor cores with dedicated FP16 cores per SM to execute FP16 operations. These are physically separate from the 64 FP32 and 64 INT32 cores per SM and execute FP16 at double the speed of the FP32 cores. On the RTX 2060, for example, there are no dedicated FP16 cores per SM; instead, the tensor cores are configured to handle FP16 ops at an enormous rate.
NVIDIA has deployed the latest GDDR6 memory on the GTX 1660 Ti, although it ticks at a slower 12 Gbps data rate, compared to 14 Gbps on the RTX 2060. This is still a massive 50 percent increase in memory bandwidth compared to the GTX 1060 6 GB (288 GB/s vs. 192 GB/s). The memory amount is unchanged at 6 GB.
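The bandwidth figures follow directly from bus width and per-pin data rate. The Python sketch below reproduces them; the function name is our own, and the 8 Gbps data rate for the GTX 1060 6 GB is not stated above but is the value its 192 GB/s figure implies on a 192-bit bus.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s.

    Each of the bus_width_bits pins moves data_rate_gbps gigabits per
    second; dividing by 8 converts gigabits to gigabytes.
    """
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    gtx_1660_ti = memory_bandwidth_gbs(192, 12)  # GDDR6 at 12 Gbps
    gtx_1060_6gb = memory_bandwidth_gbs(192, 8)  # assumed 8 Gbps, see lead-in
    increase = (gtx_1660_ti / gtx_1060_6gb - 1) * 100
    print(f"GTX 1660 Ti:   {gtx_1660_ti:.0f} GB/s")
    print(f"GTX 1060 6 GB: {gtx_1060_6gb:.0f} GB/s")
    print(f"Increase:      {increase:.0f} percent")
```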
Features
Let's talk about the two elephants in the room first. The GTX 1660 Ti will not give you real-time raytracing because it lacks RT cores, and won't give you DLSS for want of tensor cores. What you will get is Adaptive Shading (aka variable-rate shading), a feature introduced with Turing that is carried over to the GTX 1660 Ti. Both its key algorithms, content-adaptive shading (CAS) and motion-adaptive shading (MAS), are available. CAS senses color or spatial coherence in scenes to minimize repetitive shading of details, freeing up resources to increase detail where it matters. MAS senses high motion in a scene (e.g., racing simulators) and minimizes shading of details in favor of performance.
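As a conceptual illustration only, the two heuristics can be thought of as a per-tile decision about how coarsely to shade. The Python sketch below is our own toy model: the function name, the normalized motion and contrast inputs, and both thresholds are hypothetical, and it does not represent how NVIDIA's driver or any game actually implements CAS or MAS.

```python
def pick_shading_rate(motion, contrast,
                      motion_threshold=0.5, contrast_threshold=0.2):
    """Choose a coarse-shading block size for one screen tile.

    motion and contrast are hypothetical per-tile metrics normalized to
    the range 0..1, and the thresholds are arbitrary illustrative values.
    "1x1" shades every pixel; "2x2" and "4x4" reuse one shading result
    across a block of that many pixels.
    """
    fast_moving = motion > motion_threshold    # MAS-style cue
    uniform = contrast < contrast_threshold    # CAS-style cue
    if fast_moving and uniform:
        return "4x4"
    if fast_moving or uniform:
        return "2x2"
    return "1x1"


if __name__ == "__main__":
    tiles = {
        "blurred roadside at speed": (0.9, 0.1),
        "fast but detailed car": (0.9, 0.8),
        "static clear sky": (0.1, 0.05),
        "static dashboard text": (0.1, 0.8),
    }
    for name, (motion, contrast) in tiles.items():
        print(f"{name}: {pick_shading_rate(motion, contrast)}")
```

The point of the sketch is simply that shading work is spent where the viewer is likely to notice it, and saved where motion or uniform content would hide the difference.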