At the heart of the Radeon RX 470 is the new "Ellesmere" (Polaris 10) GPU built on the 14 nanometer silicon fab process by Samsung and GlobalFoundries. The wafers are made in Upstate New York, USA, and are then bumped and packaged at a facility in Taiwan to be sent to the various graphics card manufacturers located there and across the straits.
This GPU is based on AMD's fourth generation Graphics CoreNext architecture codenamed "Polaris." According to AMD, Compute Units (CUs) based on Polaris are up to 15% more efficient at number crunching than CUs based on the older Graphics CoreNext 1.1 architecture (R9 390X). Pay attention to the numbers here. While the number-crunching machinery is up to 15% more efficient, the chip is claimed to have a 2.5x leap in overall energy efficiency over the previous generation. This is because AMD is cashing in on the immediate gains a new silicon fab process brings to the table, the 14 nm FinFET process in this case, to increase transistor counts and clock speeds.
The component hierarchy in the Polaris 10 "Ellesmere" silicon is similar to older-generation chips, although each of the components received major updates. At the front end, the chip features two hardware schedulers and introduces dedicated real-time asynchronous compute with spatial and temporal scheduling. It also features four async compute engines (ACEs), which AMD optimized with new quick-response queue tech.
There's a design focus on stepping up geometry processing performance and blunting the brute-tessellation advantage NVIDIA traditionally enjoyed over AMD. For a chip of this segment, Polaris 10 features four independent geometry processors. Their functionality is upgraded over the previous generation, featuring a primitive discard accelerator which culls (discards) triangles in the pipeline with zero area or no inclusive sample points. The geometry engine now features a tiny cache called the Index Cache, which cushions small instanced geometry and reduces data movement to improve primitive throughput during instancing.
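To make the idea of primitive discard concrete, here is a minimal software sketch of the two tests described above: a zero-area check and a "covers no sample point" check. It is an illustration only, not AMD's hardware logic. The function names are hypothetical, samples are assumed to sit at pixel centers (x + 0.5, y + 0.5), and the coverage test is a conservative bounding-box check rather than an exact one.

```python
import math

def signed_area2(a, b, c):
    """Return twice the signed area of the screen-space triangle a, b, c."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def bbox_misses_all_samples(a, b, c):
    """True if the triangle's bounding box contains no pixel-center sample.

    Assumption: one sample per pixel at (x + 0.5, y + 0.5). A triangle whose
    bounding box spans no sample cannot cover one, so it is safe to discard.
    """
    xs = (a[0], b[0], c[0])
    ys = (a[1], b[1], c[1])
    # Index of the first pixel center at or after the minimum edge, and of
    # the last pixel center at or before the maximum edge.
    first_x = math.ceil(min(xs) - 0.5)
    last_x = math.floor(max(xs) - 0.5)
    first_y = math.ceil(min(ys) - 0.5)
    last_y = math.floor(max(ys) - 0.5)
    return first_x > last_x or first_y > last_y

def should_discard(a, b, c):
    """Discard degenerate triangles and triangles that hit no sample."""
    return signed_area2(a, b, c) == 0 or bbox_misses_all_samples(a, b, c)
```

A collinear triangle such as (0, 0), (1, 1), (2, 2) fails the area test, while a tiny triangle tucked into the corner of a pixel fails the sample test. Either way, it never reaches the rasterizer, which is where the throughput gain would come from.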
The Polaris 10 silicon features 36 Compute Units (CUs). 32 of these are enabled on the RX 470, spread across four shader engines, each with a dedicated geometry processor, a raster engine, and two render backends. The four shader engines are supported by a large 2 MB L2 cache, which acts as the town-square for the GPU's various key components.
Most of the architecture-specific innovations are centered on the CU, which now features a hardware instruction prefetcher, a larger instruction buffer, and native half-precision (FP16/Int16) support, which should reliably crunch numbers for gaming applications with significantly reduced memory and register footprints while lowering power draw. Altogether, the "Polaris" CU is claimed to have up to 15% higher performance than CUs based on the GCN 1.1 architecture (R9 390X). Each CU features 64 stream processors, so the 32 enabled CUs add up to 2,048 stream processors. In summary, the RX 470 features 2,048 stream processors, 128 TMUs, and 32 ROPs.
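The unit counts follow directly from the CU and shader-engine layout, and a quick sketch makes the arithmetic explicit. The 64 stream processors per CU come from the text above. The figures of 4 TMUs per CU and 4 ROPs per render backend are assumptions, chosen because they are consistent with the 128 TMU and 32 ROP totals quoted for the card. The function name is hypothetical.

```python
STREAM_PROCESSORS_PER_CU = 64   # stated in the article
TMUS_PER_CU = 4                 # assumption: 128 TMUs / 32 CUs
ROPS_PER_RENDER_BACKEND = 4     # assumption: 32 ROPs / (4 engines x 2 backends)

def unit_counts(compute_units, shader_engines=4, backends_per_engine=2):
    """Derive headline unit counts from the CU and shader-engine layout."""
    return {
        "stream_processors": compute_units * STREAM_PROCESSORS_PER_CU,
        "tmus": compute_units * TMUS_PER_CU,
        "rops": shader_engines * backends_per_engine * ROPS_PER_RENDER_BACKEND,
    }

rx470 = unit_counts(32)         # 32 of 36 CUs enabled on the RX 470
full_polaris10 = unit_counts(36)  # all 36 CUs physically present on the die
```

With 32 CUs, this yields the 2,048 stream processors, 128 TMUs, and 32 ROPs listed for the RX 470. With all 36 CUs enabled, the same arithmetic gives 2,304 stream processors.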
The Radeon RX 470 features a 256-bit wide GDDR5 memory interface, holding 4 GB or 8 GB of memory, running at an effective data rate of 6.6 Gbps. The raw memory bandwidth of this interface at that data rate is rated at up to 211 GB/s, although its effective bandwidth could be higher thanks to an updated lossless delta color compression (DCC) tech with full 2/4/8:1 compression ratios. AMD claims that its new-generation DCC tech can provide an effective bandwidth uplift of a staggering 30 percent. The ASUS Radeon RX 470 STRIX we're reviewing today features 4 GB of memory.
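The 211 GB/s figure can be checked with a quick calculation: bus width in bits times the per-pin data rate, divided by eight to convert bits to bytes. The sketch below also applies AMD's claimed 30 percent DCC uplift as a best-case multiplier. That second number is an upper bound derived from AMD's claim, not a measured result, and the function names are hypothetical.

```python
def raw_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Raw memory bandwidth in GB/s: bits per transfer x Gbps per pin / 8."""
    return bus_width_bits * data_rate_gbps / 8

def effective_bandwidth_gbs(raw_gbs, dcc_uplift=0.30):
    """Best-case effective bandwidth, assuming AMD's claimed DCC uplift."""
    return raw_gbs * (1 + dcc_uplift)

raw = raw_bandwidth_gbs(256, 6.6)          # about 211.2 GB/s
best_case = effective_bandwidth_gbs(raw)   # about 274.6 GB/s
```

Real-world gains from DCC depend heavily on how compressible the render targets are, so the best-case value should be read as a ceiling rather than a typical figure.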
The multimedia accelerators receive a major update, now supporting H.265 Main10 decode hardware acceleration and 4K60 HEVC encode hardware acceleration. The other components with big updates are the display controllers, which now support DisplayPort 1.4 (DP 1.3 HBR3 and DP 1.4 HDR) and HDMI 2.0b. FreeSync is supported over both DP and HDMI. Resolutions as high as 5K60, 10-bit 4K96 HDR, and 4K120 SDR are supported.