The Radeon RX 6700 XT is the new kid on the block from AMD, and arguably its most important RX 6000 series graphics card launched to date as it's the most affordable (on paper) and targets the heart of the performance segment. The card is designed for high refresh-rate 1440p gaming and capable of real-time raytracing. It introduces the company's second discrete GPU based on the RDNA 2 graphics architecture, the 7 nm Navi 22. AMD also claims that the RX 6700 XT should disrupt the sub-$500 graphics market, taking the fight to two of NVIDIA's popular Ampere products, the GeForce RTX 3060 Ti and RTX 3070.
The new RDNA 2 graphics architecture from AMD breathed life back into the consumer graphics market by competing with NVIDIA at the highest market segments with the RX 6800 series and the flagship RX 6900 XT Big Navi. It offers full DirectX 12 Ultimate readiness, including real-time raytracing, variable-rate shading, mesh shaders, and sampler feedback. Raytracing is the holy grail of 3D graphics, and while fully raytraced interactive 3D is beyond the capabilities of consumer hardware, it's possible to combine conventional raster 3D graphics with certain real-time raytraced elements, such as lighting, shadows, reflections, global illumination, and so on, to significantly increase realism.
Even this much raytracing requires an enormous amount of compute power. The most compute-intensive task, ray intersection, is handled by special hardware AMD calls Ray Accelerators, while shaders handle other related tasks, such as denoising. A side effect of this approach is that AMD has had to boost shader performance significantly over the past generation, which means most games that only use raster 3D graphics should see enormous performance gains over the previous RDNA generation.
The Radeon RX 6700 XT debuts the Navi 22 silicon, which is leaner and more space efficient than the Big Navi silicon powering the larger RX 6000 series cards. The chip physically packs 40 RDNA 2 compute units, working out to 2,560 stream processors and 40 Ray Accelerators. The number of stream processors is identical to that of the Navi 10-based RX 5700 XT, so the performance uplift comes from the higher IPC of the RDNA 2 compute unit and from much higher engine clocks: 2,424 MHz vs. 1,755 MHz (game clocks).
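To put the clock difference in perspective, here is a minimal sketch that works out the relative game-clock uplift from the two figures quoted above. The function and constant names are illustrative, and the result says nothing about IPC, which the review treats as a separate source of gains.

```python
# Game clocks quoted in the review, in MHz.
RX_6700_XT_GAME_CLOCK_MHZ = 2424
RX_5700_XT_GAME_CLOCK_MHZ = 1755


def clock_uplift_percent(new_mhz: float, old_mhz: float) -> float:
    """Return the relative clock increase of new over old, in percent."""
    return (new_mhz - old_mhz) / old_mhz * 100.0


uplift = clock_uplift_percent(RX_6700_XT_GAME_CLOCK_MHZ, RX_5700_XT_GAME_CLOCK_MHZ)
print(f"Game clock uplift: {uplift:.1f}%")  # prints "Game clock uplift: 38.1%"
```

With the stream processor count unchanged, this roughly 38% clock increase is the part of the uplift that can be read straight off the spec sheet.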
AMD has made a significant yet frugal change to the memory setup. You now get 12 GB of GDDR6 memory, which is 50% more than the 8 GB of the RX 5700 XT, but at 192 bits, the memory bus is 25% narrower. AMD has tried to make up for this by using the fastest JEDEC-standard 16 Gbps GDDR6 memory chips, resulting in 384 GB/s of bandwidth. This is still about 14% lower than the 448 GB/s of the RX 5700 XT. To cushion the deficit, the company deployed the Infinity Cache technology it debuted with Big Navi: a fast 96 MB cache on the GPU die that buffers memory accesses and operates at 1.5 TB/s.
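The bandwidth figures above follow directly from bus width and per-pin data rate. The sketch below reproduces them; the helper name is made up for illustration, and the 256-bit, 14 Gbps configuration for the RX 5700 XT is the one implied by the 25% narrower bus and 448 GB/s quoted in this review.

```python
def gddr6_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: pins times per-pin Gbps, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


rx_6700_xt = gddr6_bandwidth_gbs(192, 16)  # 192-bit bus, 16 Gbps GDDR6
rx_5700_xt = gddr6_bandwidth_gbs(256, 14)  # 256-bit bus, 14 Gbps GDDR6

deficit_percent = (rx_5700_xt - rx_6700_xt) / rx_5700_xt * 100.0

print(f"RX 6700 XT: {rx_6700_xt:.0f} GB/s")
print(f"RX 5700 XT: {rx_5700_xt:.0f} GB/s")
print(f"Raw bandwidth deficit: {deficit_percent:.1f}%")
```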
AMD is pricing the Radeon RX 6700 XT at US$479 for the reference design, undercutting the $499 price of the GeForce RTX 3070, but $479 is higher than the $399 starting price of the RTX 3060 Ti, the card it is extensively compared against in AMD's marketing materials. It also faces some internal competition from the $100 pricier RX 6800, which AMD is marketing as a 4K-capable 1440p card. All these prices are pure fiction; real-world graphics card pricing is completely whack right now. In this review, we'll focus on how the card competes with others in its vicinity on our swanky new March 2021 test system.
For AMD, a lot is riding on the success of the new RDNA 2 graphics architecture as it powers not just the Radeon RX 6000 series graphics cards, but also the GPU inside next-generation game consoles designed for 4K Ultra HD gaming with raytracing, which is a really tall engineering goal. AMD was first to market with a 7 nm GPU more than 15 months ago, using the original RDNA architecture and Navi. The company hasn't changed its process node, but implemented a host of new technologies after having acquired experience with the node.
The Radeon RX 6700 XT is powered by AMD's new Navi 22 silicon, built on the same TSMC 7 nm silicon fabrication node as Big Navi. The chip measures 336 mm² and crams in 17.2 billion transistors, putting it in the same league as NVIDIA's 8 nm GA104 silicon that powers the RTX 3070. The die talks to the outside world with a 192-bit wide GDDR6 memory interface, a PCI-Express 4.0 x16 host interface, and display I/O that's good for multiple 4K or 8K displays thanks to DSC.
New design methodologies and component-level optimization throughout the silicon, along with new power-management features, allowed AMD to achieve two breakthroughs that enabled it to double compute unit counts on Big Navi over the previous generation while staying within a reasonable power envelope: the company managed to halve the power draw per CU and to add a 30% increase in engine clocks, both of which can be redeemed for performance gain per CU.
The RDNA 2 compute unit is where a bulk of the magic happens. Arranged in groups of two called Dual Compute Units, which share instruction and data caches, the RDNA 2 compute unit still packs 64 stream processors (128 per Dual CU) and has been optimized for increased frequencies, new kinds of math precision, new hardware that enables the Sampler Feedback feature, and the all-important Ray Accelerator, a fixed-function hardware component that calculates up to one triangle or four box ray intersections per clock cycle. AMD claims the Ray Accelerator makes intersection performance up to ten times faster than if it were executed with compute shaders. AMD also redesigned the render backends of the GPU from the ground up, towards enabling features such as Variable Rate Shading (both tier 1 and tier 2). At 64, the ROP count remains the same as for the previous-generation Navi 10.
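As a rough illustration of what the Ray Accelerator rates mean at chip level, the sketch below multiplies the per-clock figures above by the 40 Ray Accelerators and the 2,424 MHz game clock quoted elsewhere in this review. These are theoretical peaks under the assumption that every unit is busy on every clock, not measured throughput, and the names are illustrative.

```python
RAY_ACCELERATORS = 40        # one per CU on the RX 6700 XT
GAME_CLOCK_HZ = 2424e6       # 2,424 MHz game clock
TRIANGLE_TESTS_PER_CLOCK = 1 # per Ray Accelerator
BOX_TESTS_PER_CLOCK = 4      # per Ray Accelerator


def peak_tests_per_second(units: int, tests_per_clock: int, clock_hz: float) -> float:
    """Theoretical peak intersection tests per second, assuming all units are busy every clock."""
    return units * tests_per_clock * clock_hz


triangle_peak = peak_tests_per_second(RAY_ACCELERATORS, TRIANGLE_TESTS_PER_CLOCK, GAME_CLOCK_HZ)
box_peak = peak_tests_per_second(RAY_ACCELERATORS, BOX_TESTS_PER_CLOCK, GAME_CLOCK_HZ)

print(f"Peak ray/triangle tests: {triangle_peak / 1e9:.2f} billion per second")
print(f"Peak ray/box tests:      {box_peak / 1e9:.2f} billion per second")
```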
Overall, the Navi 22 silicon has essentially the same component hierarchy as Navi 10. The Infinity Fabric interconnect is the link that binds all the components together. At the outermost level, you have the chip's 192-bit GDDR6 memory controllers, a PCI-Express 4.0 x16 host interface, and the multimedia and display engines which have been substantially updated from RDNA. A notch inside is the chip's 96-megabyte Infinity Cache, which we detail below. This cache is the town square for the GPU's high-speed 4 MB L2 caches and the graphics command processor, which dispatches the workload among two shader engines. Each of these shader engines packs 10 RDNA 2 Dual Compute Units (or 20 CUs) along with the updated render backends and L1 cache. Combined, the silicon has 2,560 stream processors across 40 CUs, 40 Ray Accelerators (1 per CU), 160 TMUs, and 64 ROPs. In every sense except the memory, the Navi 22 is half a Navi 21.
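The unit counts above can be derived from the hierarchy itself. The following sketch models it with a small dataclass; the class and field names are hypothetical, and the figure of four TMUs per CU is an assumption inferred from the 160 TMUs across 40 CUs quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Navi22Layout:
    """Component hierarchy of Navi 22 as described in the review (names are illustrative)."""
    shader_engines: int = 2
    dual_cus_per_engine: int = 10
    cus_per_dual_cu: int = 2
    stream_processors_per_cu: int = 64
    ray_accelerators_per_cu: int = 1
    tmus_per_cu: int = 4  # assumption: inferred from 160 TMUs / 40 CUs

    @property
    def compute_units(self) -> int:
        return self.shader_engines * self.dual_cus_per_engine * self.cus_per_dual_cu

    @property
    def stream_processors(self) -> int:
        return self.compute_units * self.stream_processors_per_cu

    @property
    def ray_accelerators(self) -> int:
        return self.compute_units * self.ray_accelerators_per_cu

    @property
    def tmus(self) -> int:
        return self.compute_units * self.tmus_per_cu


chip = Navi22Layout()
print(chip.compute_units, chip.stream_processors, chip.ray_accelerators, chip.tmus)
# prints "40 2560 40 160"
```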
The Radeon RX 6700 XT maxes out the Navi 22 silicon by enabling all 40 RDNA 2 compute units. The card comes with 12 GB of GDDR6 memory running at 16 Gbps (GDDR6-effective) across the chip's 192-bit wide memory interface, which works out to 384 GB/s of memory bandwidth. The Infinity Cache runs at the highest possible 1.5 TB/s data-rate, while AMD claims that the engine clock can spike well above 2.50 GHz, with a 2.42 GHz "game clock."
Infinity Cache, or How AMD is Blunting NVIDIA's G6X Advantage
Despite its lofty design goals and a generational 50% increase in memory size to 12 GB, the RX 6700 XT has a rather unimpressive memory setup compared to NVIDIA's RTX 3070, or even AMD's own previous-generation RX 5700 XT. That is, at least on paper, with just a 192-bit bus width and JEDEC-standard 16 Gbps GDDR6, which works out to 384 GB/s raw bandwidth. Competing NVIDIA cards use 14 Gbps memory, but over a wider 256-bit memory interface. Memory compression secret sauce can at best increase effective bandwidth by a high single-digit percent.
AMD took a frugal approach to this problem, not wanting to invest in expensive HBM+interposer based solutions, which would have thrown overall production costs way off balance. AMD looked at how their Zen processor team leveraged large last-level caches on EPYC processors to significantly improve performance and carried the idea over to the GPU. A large chunk of the Navi 22 silicon die area now holds what AMD calls the "Infinity Cache," which is really just a new L3 cache that is 96 MB in size and talks to the GPU's two shader engines over a 1024-bit interface. This cache has an impressive bandwidth of 1.5 TB/s and can be used as a victim cache by the 4 MB L2 caches of the two shader engines.
The physical media of Infinity Cache is the same class of SRAM as for the L3 cache on Zen processors. It offers four times the density of 4 MB L2 caches, lower bandwidth in comparison, but four times the bandwidth over GDDR6. It also significantly reduces energy consumption, by a sixth for the GPU to fetch a byte of data compared to doing so from GDDR6 memory. I'm sure the questions on your mind are what difference 96 MB makes and why no one has done this earlier.
To answer the first question, even with just 96 MB spread across two slabs of 48 MB each, Infinity Cache takes up a large amount of the die area of the Navi 22 silicon, and AMD's data has shown that many of the small workloads involved in raytracing and raster operations are bandwidth-intensive rather than memory-size-intensive. Having a 96 MB fast victim cache running at extremely low latencies compared to DRAM helps. As for why AMD didn't do this earlier, it's only now that there's an alignment of circumstances where the company can afford to go with a fast 96 MB victim cache as opposed to just cramming in more CUs, getting comparable levels of performance for less power consumption. Because the cache is a storage device rather than a logic device, spending die area on Infinity Cache instead of more CUs does result in power savings.
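To get a feel for why a comparatively small cache can offset a narrow memory bus, here is a toy model of effective bandwidth as a function of cache hit rate. It is only a sketch under stated assumptions: "1.5 TB/s" is read as 1,500 GB/s, every access is served either by Infinity Cache or by GDDR6, and the hit rates in the loop are hypothetical values, not AMD data.

```python
INFINITY_CACHE_GBS = 1500.0  # assumption: 1.5 TB/s read as 1,500 GB/s
GDDR6_GBS = 384.0            # raw GDDR6 bandwidth of the RX 6700 XT


def effective_bandwidth_gbs(hit_rate: float,
                            cache_gbs: float = INFINITY_CACHE_GBS,
                            dram_gbs: float = GDDR6_GBS) -> float:
    """Time-weighted average bandwidth when a fraction hit_rate of bytes comes from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Average time to move one GB: hits at cache speed, misses at DRAM speed.
    seconds_per_gb = hit_rate / cache_gbs + (1.0 - hit_rate) / dram_gbs
    return 1.0 / seconds_per_gb


print(f"Cache vs. GDDR6 bandwidth: {INFINITY_CACHE_GBS / GDDR6_GBS:.1f}x")
for hit_rate in (0.0, 0.25, 0.5, 0.75):  # hypothetical hit rates
    print(f"hit rate {hit_rate:.0%}: {effective_bandwidth_gbs(hit_rate):.0f} GB/s")
```

In this simplified model, the effective figure climbs past the 448 GB/s of the RX 5700 XT once a modest share of accesses is served from the cache, which is the effect the on-paper comparison misses.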
The Card
For their newest card, AMD made sure to keep the design language the same, so that the Radeon RX 6700 XT can clearly be identified as a Navi 2x RDNA 2 graphics card. The cooler is all metal, using various shades of gray to convey the Radeon branding. On the back, you'll find a high-quality metal backplate.
Dimensions of the card are 27 x 11 cm, and it weighs 882 g.
Installation requires two slots in your system.
Display connectivity includes three standard DisplayPort 1.4 ports and one HDMI 2.1 port.
The card has one 8-pin and one 6-pin power input. This configuration is rated for up to 300 W of power draw.
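The 300 W figure is the sum of what the slot and the two connectors are each rated to deliver. A minimal sketch of that arithmetic, using the standard PCI-Express ratings of 75 W for the x16 slot, 75 W per 6-pin, and 150 W per 8-pin connector (the function name is illustrative):

```python
PCIE_SLOT_W = 75   # PCI-Express x16 slot
SIX_PIN_W = 75     # per 6-pin PCIe power connector
EIGHT_PIN_W = 150  # per 8-pin PCIe power connector


def rated_power_w(six_pin: int, eight_pin: int, include_slot: bool = True) -> int:
    """Total rated power delivery in watts for a given connector configuration."""
    total = six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W
    if include_slot:
        total += PCIE_SLOT_W
    return total


print(rated_power_w(six_pin=1, eight_pin=1))  # prints 300
```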
The AMD Radeon RX 6000 series doesn't support multi-GPU.