The Gigabyte Radeon RX 6700 XT Gaming OC is the company's factory-overclocked, close-to-reference custom rendition of the RX 6700 XT for those who can do without the bells and whistles of the AORUS Gaming brand. The Radeon RX 6700 XT is AMD's fourth RX 6000 series RDNA 2 graphics card, and arguably its most important so far as it targets a sub-$500 (MSRP) price point, bringing the architecture to a wider audience. A successor to the RX 5700 XT, it brings full DirectX 12 Ultimate readiness, including real-time raytracing, and is suited for maxed-out gaming at 1440p. AMD claims that the card is competitive not only against NVIDIA's RTX 3060 Ti, but also the pricier RTX 3070 in certain games.
The RDNA 2 graphics architecture powering the RX 6700 XT is spread far and wide in the current generation, spanning not just the Radeon RX 6000 series, but also the latest game consoles. This makes it easier for game developers to optimize for the architecture on the PC. AMD's approach to real-time raytracing involves using special hardware called Ray Accelerators to handle the most compute-intensive task in raytracing, calculating ray intersections, while compute shaders are used for almost everything else, including de-noising. A side effect of this approach is that AMD has had to bolster its SIMD muscle significantly over the previous generation, which can work wonders for conventional raster 3D games.
The Radeon RX 6700 XT is based on and maxes out the new 7 nm Navi 22 silicon, which physically features 40 RDNA 2 compute units. This works out to 2,560 stream processors, 40 Ray Accelerators, 160 TMUs, and 64 ROPs. The stream processor count is exactly the same as with the RX 5700 XT, but besides having a higher IPC, they run at much higher engine clocks in excess of 2.42 GHz, compared to the 1.77 GHz game clock of the RX 5700 XT.
AMD has increased the standard memory amount to 12 GB, but over a narrower 192-bit GDDR6 memory bus. AMD attempts to overcome the bandwidth deficit compared to the 256-bit GDDR6 interface of the RX 5700 series by increasing the memory speed to 16 Gbps and deploying its new Infinity Cache technology, a fast 96 MB on-die level 3 cache that accelerates the memory subsystem.
The Gigabyte RX 6700 XT Gaming OC comes with the company's latest-generation WindForce 3X cooling solution found on several other current-generation products. Three aluminium fin stacks are skewered by five 6 mm-thick copper heat-pipes that make direct contact with the GPU at the base. This heatsink is ventilated by three fans. The cooler is longer than the PCB, so some of the airflow from the third fan flows through a hole in the backplate. The card comes with a factory-overclocked game clock of 2514 MHz, as opposed to the 2424 MHz reference. In this review, we take the card for a spin.
Radeon RX 6700 XT Market Segment Analysis
| Card | Price | Shader Units | ROPs | Core Clock | Boost Clock | Memory Clock | GPU | Transistors | Memory |
|---|---|---|---|---|---|---|---|---|---|
| RX Vega 64 | $400 | 4096 | 64 | 1247 MHz | 1546 MHz | 953 MHz | Vega 10 | 12500M | 8 GB, HBM2, 2048-bit |
| RX 5700 XT | $370 | 2560 | 64 | 1605 MHz | 1755 MHz | 1750 MHz | Navi 10 | 10300M | 8 GB, GDDR6, 256-bit |
| RTX 2070 | $340 | 2304 | 64 | 1410 MHz | 1620 MHz | 1750 MHz | TU106 | 10800M | 8 GB, GDDR6, 256-bit |
| RTX 3060 | $600 | 3584 | 48 | 1320 MHz | 1777 MHz | 1875 MHz | GA106 | 13250M | 12 GB, GDDR6, 192-bit |
| RTX 2070 Super | $450 | 2560 | 64 | 1605 MHz | 1770 MHz | 1750 MHz | TU104 | 13600M | 8 GB, GDDR6, 256-bit |
| Radeon VII | $680 | 3840 | 64 | 1400 MHz | 1800 MHz | 1000 MHz | Vega 20 | 13230M | 16 GB, HBM2, 4096-bit |
| RTX 2080 | $600 | 2944 | 64 | 1515 MHz | 1710 MHz | 1750 MHz | TU104 | 13600M | 8 GB, GDDR6, 256-bit |
| RTX 2080 Super | $690 | 3072 | 64 | 1650 MHz | 1815 MHz | 1940 MHz | TU104 | 13600M | 8 GB, GDDR6, 256-bit |
| RTX 3060 Ti | $700 | 4864 | 80 | 1410 MHz | 1665 MHz | 1750 MHz | GA104 | 17400M | 8 GB, GDDR6, 256-bit |
| RX 6700 XT | $800 (MSRP: $480) | 2560 | 64 | 2424 MHz | 2581 MHz | 2000 MHz | Navi 22 | 17200M | 12 GB, GDDR6, 192-bit |
| Gigabyte RX 6700 XT Gaming OC | $820 | 2560 | 64 | 2424 MHz | 2614 MHz | 2000 MHz | Navi 22 | 17200M | 12 GB, GDDR6, 192-bit |
| RTX 2080 Ti | $1000 | 4352 | 88 | 1350 MHz | 1545 MHz | 1750 MHz | TU102 | 18600M | 11 GB, GDDR6, 352-bit |
| RTX 3070 | $800 | 5888 | 96 | 1500 MHz | 1725 MHz | 1750 MHz | GA104 | 17400M | 8 GB, GDDR6, 256-bit |
| RX 6800 | $1000 | 3840 | 96 | 1815 MHz | 2105 MHz | 2000 MHz | Navi 21 | 26800M | 16 GB, GDDR6, 256-bit |
| RX 6800 XT | $1300 | 4608 | 128 | 2015 MHz | 2250 MHz | 2000 MHz | Navi 21 | 26800M | 16 GB, GDDR6, 256-bit |
| RTX 3080 | $1300 | 8704 | 96 | 1440 MHz | 1710 MHz | 1188 MHz | GA102 | 28000M | 10 GB, GDDR6X, 320-bit |
| RX 6900 XT | $1500 | 5120 | 128 | 2015 MHz | 2250 MHz | 2000 MHz | Navi 21 | 26800M | 16 GB, GDDR6, 256-bit |
| RTX 3090 | $2000 | 10496 | 112 | 1395 MHz | 1695 MHz | 1219 MHz | GA102 | 28000M | 24 GB, GDDR6X, 384-bit |
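One rough way to read the shader and boost clock figures together is to estimate peak FP32 throughput, assuming two floating-point operations (one fused multiply-add) per shader per clock. The sketch below applies that assumption to three of the listed cards; the function name is ours, and the result is a theoretical ceiling rather than a measured figure, so it should not be used to compare different architectures.

```python
def peak_fp32_tflops(shader_units: int, boost_clock_mhz: float) -> float:
    """Theoretical peak FP32 rate in TFLOPS.

    Assumption: each shader retires one fused multiply-add (2 FLOPs) per clock.
    """
    flops = 2 * shader_units * boost_clock_mhz * 1e6
    return flops / 1e12


# (shader units, boost clock in MHz), taken from the spec listing above
cards = {
    "RX 5700 XT": (2560, 1755),
    "RX 6700 XT": (2560, 2581),
    "Gigabyte RX 6700 XT Gaming OC": (2560, 2614),
}

for name, (shaders, boost_mhz) in cards.items():
    print(f"{name}: {peak_fp32_tflops(shaders, boost_mhz):.2f} TFLOPS")
```

With identical shader counts, the entire difference between these three cards in this estimate comes from clock speed.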
RDNA 2 Architecture
For AMD, a lot is riding on the success of the new RDNA 2 graphics architecture as it powers not just the Radeon RX 6000 series graphics cards, but also the GPU inside next-generation game consoles designed for 4K Ultra HD gaming with raytracing, which is a really tall engineering goal. AMD was first to market with a 7 nm GPU more than 15 months ago with the original RDNA architecture and Navi. The company hasn't changed its process node, but implemented a host of new technologies after having acquired experience with the node. The Radeon RX 6700 XT is powered by AMD's new Navi 22 silicon, built on the same TSMC 7 nm silicon fabrication node as Big Navi. The chip measures 336 mm² and crams in 17.2 billion transistors, putting it in the same league as NVIDIA's 8 nm GA104 silicon that powers the RTX 3070. The die talks to the outside world with a 192-bit wide GDDR6 memory interface, a PCI-Express 4.0 x16 host interface, and display I/O that's good for multiple 4K or 8K displays thanks to DSC.
New design methodologies, component-level optimization throughout the silicon, and new power-management features allowed for two breakthroughs that let RDNA 2 scale to double the compute unit count of the previous generation on the larger Navi 21 while staying within a reasonable power envelope: AMD managed to halve the power draw per CU and to raise engine clocks by 30%, both of which can be redeemed for performance gain per CU.
The RDNA 2 compute unit is where the bulk of the magic happens. Arranged in groups of two called Dual Compute Units, which share instruction and data caches, the RDNA 2 compute unit still packs 64 stream processors (128 per Dual CU) and has been optimized for increased frequencies, new kinds of math precision, new hardware that enables the Sampler Feedback feature, and the all-important Ray Accelerator, a fixed-function hardware component that calculates up to one triangle or four box ray intersections per clock cycle. AMD claims the Ray Accelerator makes intersection performance up to ten times faster than if it were executed with compute shaders. AMD also redesigned the render backends of the GPU from the ground up to enable features such as Variable Rate Shading (both tier 1 and tier 2). At 64, the ROP count remains the same as for the previous-generation Navi 10.
Overall, the Navi 22 silicon essentially has the same component hierarchy as Navi 10. The Infinity Fabric interconnect is the link that binds all the components together. At the outermost level, you have the chip's 192-bit GDDR6 memory controllers, a PCI-Express 4.0 x16 host interface, and the multimedia and display engines which have been updated substantially from RDNA. A notch inside is the chip's 96-megabyte Infinity Cache, which we detail below. This cache is the town square for the GPU's high-speed 4 MB L2 caches and the graphics command processor, which dispatches the workload among two shader engines. Each of these shader engines packs 10 RDNA 2 Dual Compute Units (or 20 CUs) along with the updated render backends and L1 cache. Combined, the silicon has 2,560 stream processors across 40 CUs, 40 Ray Accelerators (1 per CU), 160 TMUs, and 64 ROPs. In every sense except the memory, the Navi 22 is half a Navi 21.
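The unit counts quoted above follow directly from the hierarchy of shader engines, Dual Compute Units, and CUs. The short sketch below reproduces that arithmetic; the constant and function names are ours, and the figure of four TMUs per CU is an assumption inferred from the stated totals (160 TMUs across 40 CUs).

```python
# Navi 22 hierarchy as described in the text
SHADER_ENGINES = 2
DUAL_CUS_PER_ENGINE = 10
CUS_PER_DUAL_CU = 2
STREAM_PROCESSORS_PER_CU = 64
RAY_ACCELERATORS_PER_CU = 1
TMUS_PER_CU = 4  # assumption: inferred from 160 TMUs / 40 CUs


def navi22_totals() -> dict:
    """Multiply out the per-level counts to get chip-wide totals."""
    compute_units = SHADER_ENGINES * DUAL_CUS_PER_ENGINE * CUS_PER_DUAL_CU
    return {
        "compute_units": compute_units,
        "stream_processors": compute_units * STREAM_PROCESSORS_PER_CU,
        "ray_accelerators": compute_units * RAY_ACCELERATORS_PER_CU,
        "tmus": compute_units * TMUS_PER_CU,
    }


print(navi22_totals())
```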
The Radeon RX 6700 XT maxes out the Navi 22 silicon by enabling all 40 RDNA 2 compute units. The card comes with 12 GB of GDDR6 memory running at 16 Gbps (GDDR6-effective) across the chip's 192-bit wide memory interface, which works out to 384 GB/s of memory bandwidth. The Infinity Cache runs at the highest possible 1.5 TB/s data-rate, while AMD claims that the engine clock can spike well above 2.50 GHz, with a 2.42 GHz "game clock."
Infinity Cache, or How AMD is Blunting NVIDIA's G6X Advantage
Despite its lofty design goals and a generational doubling in memory size to 12 GB, the RX 6700 XT has a rather unimpressive memory setup compared to NVIDIA's RTX 3070 or even AMD's own previous-generation RX 5700 XT. That is, at least on paper, with just a 192-bit bus width and JEDEC-standard 16 Gbps GDDR6, which works out to 384 GB/s raw bandwidth. Competing NVIDIA cards use 14 Gbps memory, but over a wider 256-bit memory interface. Memory compression secret sauce can at best increase effective bandwidth by a high single-digit percent.
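The raw bandwidth figures here come from multiplying the per-pin data rate by the bus width and dividing by eight to convert bits to bytes. A minimal sketch of that calculation, with a function name of our own choosing, compares the RX 6700 XT with a 256-bit, 14 Gbps configuration like the one described for competing cards:

```python
def memory_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Raw memory bandwidth in GB/s: per-pin rate (Gbps) x bus width (bits) / 8."""
    return data_rate_gbps * bus_width_bits / 8


rx_6700_xt = memory_bandwidth_gbs(16, 192)      # 16 Gbps over 192-bit
wide_bus_14gbps = memory_bandwidth_gbs(14, 256)  # 14 Gbps over 256-bit

deficit = 1 - rx_6700_xt / wide_bus_14gbps
print(f"RX 6700 XT:       {rx_6700_xt:.0f} GB/s")
print(f"256-bit, 14 Gbps: {wide_bus_14gbps:.0f} GB/s")
print(f"Raw deficit:      {deficit:.1%}")
```

On raw numbers alone, the narrower bus leaves the RX 6700 XT about 14% short, which is the gap Infinity Cache is meant to cover.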
AMD took a frugal approach to this problem, not wanting to invest in expensive HBM and interposer-based solutions, which would have thrown overall production costs way off balance. AMD looked at how their Zen processor team leveraged large last-level caches on EPYC processors to significantly improve performance and carried the idea over to the GPU. A large chunk of the Navi 22 silicon die area now holds what AMD calls the "Infinity Cache," which is really just a new L3 cache that is 96 MB in size and talks to the GPU's two shader engines over a 1024-bit interface. This cache has an impressive bandwidth of 1.5 TB/s and can be used as a victim cache by the 4 MB L2 caches of the two shader engines.
The physical medium of Infinity Cache is the same class of SRAM as for the L3 cache on Zen processors. It offers four times the density of the 4 MB L2 caches at lower bandwidth, but four times the bandwidth of GDDR6. It also significantly reduces the energy the GPU spends fetching a byte of data, by a sixth compared to fetching it from GDDR6 memory. I'm sure the questions on your mind are what difference 96 MB makes and why no one has done this earlier.
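To get a feel for how a cache of this kind changes the picture, the sketch below blends the 1.5 TB/s cache bandwidth and the 384 GB/s GDDR6 bandwidth using a simple average-time-per-byte model. This is our own simplification, not AMD's method: the hit rates are hypothetical, 1.5 TB/s is taken as 1500 GB/s, and real behavior depends on the workload and on how much cache and DRAM traffic overlaps.

```python
def effective_bandwidth_gbs(hit_rate: float,
                            cache_gbs: float = 1500.0,
                            dram_gbs: float = 384.0) -> float:
    """Blended bandwidth when a fraction `hit_rate` of traffic hits the cache.

    Simplified model (assumption): transfers are serialized, so the average
    time per gigabyte is the hit-rate-weighted mix of both paths.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    time_per_gb = hit_rate / cache_gbs + (1.0 - hit_rate) / dram_gbs
    return 1.0 / time_per_gb


# Ratio behind the "four times the bandwidth" comparison with GDDR6
print(f"Cache vs. GDDR6: {1500.0 / 384.0:.2f}x")

# Hypothetical hit rates, for illustration only
for hit_rate in (0.0, 0.25, 0.5, 0.75):
    print(f"hit rate {hit_rate:.0%}: {effective_bandwidth_gbs(hit_rate):.0f} GB/s")
```

Under this model, even a moderate hit rate lifts effective bandwidth well past what the 192-bit bus provides on its own.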
To answer the first question, even with just 96 MB spread across two slabs of 48 MB each, Infinity Cache takes up a large amount of the die area of the Navi 22 silicon, and AMD's data has shown that many of the small workloads involved in raytracing and raster operations are bandwidth-intensive rather than memory-size-intensive. Having a fast 96 MB victim cache running at extremely low latencies compared to DRAM helps. As for why AMD didn't do this earlier, it's only now that there's an alignment of circumstances in which the company can afford to go with a fast 96 MB victim cache instead of just cramming in more CUs, getting comparable levels of performance for less power consumption. Since the cache is a storage rather than a logic device, spending die area on Infinity Cache instead of more CUs does result in power savings.
Packaging
The Card
Gigabyte's card uses a mix of black and gray highlights paired with a blocky industrial design. On the back, you'll find a high-quality metal backplate.
Dimensions of the card are 28.5 x 11.5 cm, and it weighs 890 g.
Installation requires three slots in your system.
Display connectivity includes two standard DisplayPort 1.4 ports and two HDMI 2.1 ports.
The card has one 8-pin and one 6-pin power input. This configuration is rated for up to 300 W of power draw.
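The 300 W rating can be reconstructed by adding up the limit of each power source. The sketch below does so using the commonly cited PCI-Express limits of 150 W per 8-pin connector, 75 W per 6-pin connector, and 75 W from the x16 slot; these per-source values come from the PCI-Express specification rather than from this review, and the names are ours.

```python
# Per-source power limits in watts (PCI-Express specification values,
# not figures measured in this review)
POWER_LIMITS_W = {
    "8-pin": 150,
    "6-pin": 75,
    "PCIe x16 slot": 75,
}


def board_power_budget_w(connectors: list[str]) -> int:
    """Sum the slot limit and the limit of each auxiliary power connector."""
    total = POWER_LIMITS_W["PCIe x16 slot"]
    for connector in connectors:
        total += POWER_LIMITS_W[connector]
    return total


print(board_power_budget_w(["8-pin", "6-pin"]), "W")
```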
The AMD Radeon RX 6000 series doesn't support multi-GPU.