Architecture
The GeForce GTX 1050 Ti and its sibling, the GTX 1050, are based on NVIDIA's smallest silicon from the "Pascal" family, the GP107. With a die area of 132 mm² and a transistor count of 3.3 billion, this chip is tiny and was clearly built with strict cost objectives in mind.
Even so, NVIDIA has made sure that as long as a design choice doesn't deviate substantially from those cost objectives, it gets implemented. A good example is that the GP107 silicon, despite featuring just 768 CUDA cores spread across six streaming multiprocessors (SMs), is split into two graphics processing clusters (GPCs) of three SMs each.
The decision to spread six streaming multiprocessors across two GPCs means that each group of three SMs shares one Raster Engine, a specialized unit with geometry/tessellation hardware. The streaming multiprocessor, the indivisible subunit of the GPU, is identical in design to those featured on NVIDIA's fastest TITAN X Pascal graphics card. Each SM packs 128 CUDA cores, a PolyMorph Engine, and dedicated geometry processing components. The two GPCs are cushioned by a 1 MB L2 cache wired to a new-generation GigaThread Engine - the GPU's traffic cop - and a 128-bit wide GDDR5 memory interface.
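The unit counts above multiply out to the quoted CUDA core total. The short Python sketch below simply restates that arithmetic; the `GpuLayout` class and its field names are made up for illustration and do not correspond to any NVIDIA tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuLayout:
    """Hypothetical helper describing a GPU's shader hierarchy."""
    gpcs: int          # graphics processing clusters
    sms_per_gpc: int   # streaming multiprocessors in each GPC
    cores_per_sm: int  # CUDA cores in each SM

    @property
    def sm_count(self) -> int:
        return self.gpcs * self.sms_per_gpc

    @property
    def cuda_cores(self) -> int:
        return self.sm_count * self.cores_per_sm


# Figures taken from the text: two GPCs, three SMs each, 128 cores per SM.
GP107 = GpuLayout(gpcs=2, sms_per_gpc=3, cores_per_sm=128)

print(f"SMs: {GP107.sm_count}, CUDA cores: {GP107.cuda_cores}")
# prints "SMs: 6, CUDA cores: 768"
```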
At its reference clock speeds, the GPU has 112 GB/s of memory bandwidth at its disposal. This is bolstered by NVIDIA's lossless memory compression tech, which should increase effective bandwidth in a small but significant way. On the GTX 1080, NVIDIA claims a gain of up to 20 percent in the best-case scenario. Something like that would certainly come in handy for the GTX 1050 Ti.
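As a rough sanity check on the 112 GB/s figure, peak bandwidth is the bus width multiplied by the per-pin data rate, divided by eight bits per byte. The sketch below assumes a 7 Gbps effective GDDR5 data rate, which the text above does not state; it is used here because it reproduces the quoted number on a 128-bit bus. The function name is illustrative, and applying the GTX 1080's best-case 20 percent compression gain to this card is purely hypothetical.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: (bits per transfer * Gbps per pin) / 8."""
    return bus_width_bits * data_rate_gbps / 8


BUS_WIDTH_BITS = 128          # from the text
ASSUMED_DATA_RATE_GBPS = 7.0  # assumption: not stated in the text
BEST_CASE_GAIN = 0.20         # NVIDIA's best-case claim for the GTX 1080

raw = peak_bandwidth_gbs(BUS_WIDTH_BITS, ASSUMED_DATA_RATE_GBPS)
best_case = raw * (1 + BEST_CASE_GAIN)

print(f"Raw bandwidth: {raw:.1f} GB/s")
print(f"Hypothetical best-case effective bandwidth: {best_case:.1f} GB/s")
```

With these inputs the raw figure comes out at 112.0 GB/s, and a full 20 percent gain would put effective bandwidth at roughly 134 GB/s. Real-world gains depend on how compressible the frame data is.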
The "Pascal" architecture supports Asynchronous Compute as standardized by Microsoft. NVIDIA builds on that with its own variation of the concept, called "Dynamic Load Balancing."