Architecture
The GeForce GTX 1060 succeeds the GeForce GTX 960 of the previous generation, and is hence based on a smaller ASIC, the GP106, which in turn succeeds the GM206 silicon the GTX 960 is based on. Leveraging the 16 nm FinFET process, the GP106 is tiny, with a die-area of just 200 mm². The transistor count is 4.4 billion, which is a significant increase from the 2.94 billion on the GM206.
With each successive architecture since "Fermi," NVIDIA has been enriching the streaming multiprocessor (SM) by adding more dedicated resources and reducing shared resources within the graphics processing cluster (GPC), which leads to big performance gains. The story continues with "Pascal." Like the GM206 before it, the GP106 features two GPCs, super-specialized subunits of the GPU that share the PCI-Express 3.0 x16 host interface and the 192-bit GDDR5 memory interface through six controllers.
Workload across the two GPCs is shared by the GigaThread Engine cushioned by an L2 cache. Each GPC holds five streaming multiprocessors (SMs), which is an increase from the four SMs each GPC held on the GM206. The GPC shares a raster engine between these five SMs. The "Pascal" streaming multiprocessor features a 4th generation PolyMorph Engine, a component for key render setup operations. With "Pascal," the PolyMorph Engine includes specialized hardware for the new Simultaneous MultiProjection feature. Each SM also holds a block of eight TMUs.
Each SM continues to feature 128 CUDA cores. The GP106 hence features a total of 1,280 CUDA cores. Other vital specifications include 80 TMUs and 48 ROPs. You'll notice that while the SIMD components are half that of the GP104, the GP106 features 75% of the GP104's raster operations machinery and memory interface. 6 GB is the standard memory amount, a three-fold increase from the GTX 960 (which launched with 2 GB as a standard memory amount as 4 GB variants were added by board partners much later).
The GeForce GTX 1060 features 8 Gbps GDDR5 memory. Across a 192-bit memory interface, you're looking at a memory bandwidth of 192 GB/s. NVIDIA has improved the delta color compression technology with the "Pascal" architecture, and in the best-case scenario, this should provide a memory bandwidth utilization uplift of 20 percent (effectively 230.4 GB/s).
The "Pascal" architecture supports Asynchronous Compute as standardized by Microsoft. It adds to that with its own variation of the concept with "Dynamic Load Balancing."