MSI today debuted its Radeon RX 5700 XT Evoke, the first card in a brand-new series of graphics cards from the company, launching with the Navi-based RX 5700 series. It is one of several custom-design RX 5700 XT graphics cards from multiple vendors launching this week, a little over a month after these GPUs debuted on the 7th of July. Until this week, the cards had only been available in AMD's reference design, sold by AMD's add-in board partners.
The Radeon RX 5700 XT is AMD's first true performance-segment graphics card since the RX Vega series released over two years ago. It's based on the brand-new "Navi" architecture, which leverages the 7 nm silicon fabrication process and new number-crunching machinery AMD calls RDNA compute units. These constitute the biggest update to AMD's GPU design since the very first Graphics Core Next (GCN) architecture circa 2012. RDNA is designed to bring about massive IPC improvements over GCN, which combine with higher clock speeds to lift performance. The silicon also has a number of other architectural changes. An interesting series of price adjustments and product launches ensures that even at its starting price of $399, the card offers a bit more price-performance than NVIDIA's competing offerings.
AMD had originally planned to launch the Radeon RX 5700 XT at $449 and the RX 5700 at $379, with the two cards beating the $499 NVIDIA RTX 2070 and $349 RTX 2060, respectively. This forced NVIDIA to refresh its lineup with the new RTX 2070 Super at $499 and the RTX 2060 Super at $399. The RTX 2060 Super in particular was carefully crafted not to cannibalize the RTX 2070. AMD responded by repricing its cards, slotting the RX 5700 XT at $399 and the RX 5700 at $349, prices at which they outclass the RTX 2060 Super and the original RTX 2060, respectively. NVIDIA didn't adjust the prices of its RTX 2060 Super or RTX 2070 Super any further, which leaves a fair bit of headroom between the $399 RTX 2060 Super and the $499 RTX 2070 Super, in which AMD's board partners can launch custom-design RX 5700 XT cards with factory-overclocked speeds and other goodies, such as quieter coolers.
At the heart of the Radeon RX 5700 XT is the 7 nm "Navi 10" silicon with an impressive 10.3 billion transistors crammed into a 251 mm² die. Unlike "Vega 20", Navi is a more traditional GPU in that the package holds only the GPU die, with the memory chips placed around it on the board. AMD opted for cost-effective 256-bit GDDR6 memory over exotic design choices such as HBM2. At a data rate of 14 Gbps per pin, Navi enjoys a healthy memory bandwidth of 448 GB/s. It also features the latest-generation PCI-Express gen 4.0 x16 host interface with full backwards compatibility for older generations of PCIe, so you can pair it with AMD's new Ryzen 3000 processors on an X570 chipset motherboard. The buzzwords "7 nm" and "PCIe gen 4.0" are used extensively in AMD's marketing, as if to suggest that Navi is a generation ahead of NVIDIA's Turing, which is built on 12 nm and has PCIe gen 3.0.
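The 448 GB/s figure follows directly from the bus width and the per-pin data rate. A minimal sketch of that arithmetic (the function name is ours, chosen for illustration):

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s.

    Each data pin of the bus moves `data_rate_gbps` gigabits per second;
    dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * data_rate_gbps / 8


# Navi 10 as described above: 256-bit GDDR6 at 14 Gbps per pin.
navi10_bandwidth = memory_bandwidth_gbs(256, 14.0)
print(f"{navi10_bandwidth:.0f} GB/s")  # 448 GB/s
```

This is a theoretical peak; sustained bandwidth in real workloads will be lower.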
The MSI Radeon RX 5700 XT Evoke features an all-new design language not found on other MSI cards. A champagne-gold metallic, cuboidal triple-slot cooler defines the look and gives the card a new-age industrial feel. On top sit two 90 mm fans that turn off when the card is idling. Underneath are five nickel-plated heat pipes and an aluminium fin-stack heatsink. The design is neatly finished off by a backplate that fuses seamlessly with the cooler shroud. The circuit board closely resembles AMD's reference design, with its 7+2+1 phase VRM, but there are clear signs of customization, including the VRM components and controller and better-overclocking Micron memory. The card runs at a base clock of 1690 MHz, a game clock of 1835 MHz, and a maximum boost of 1945 MHz. It is priced at US$430, a $30 premium over the $400 reference design.
Radeon RX 5700 XT Market Segment Analysis

| | Price | Shader Units | ROPs | Core Clock | Boost Clock | Memory Clock | GPU | Transistors | Memory |
|---|---|---|---|---|---|---|---|---|---|
| GTX 1070 Ti | $450 | 2432 | 64 | 1607 MHz | 1683 MHz | 2000 MHz | GP104 | 7200M | 8 GB, GDDR5, 256-bit |
| RTX 2060 | $290 | 1920 | 48 | 1365 MHz | 1680 MHz | 1750 MHz | TU106 | 10800M | 6 GB, GDDR6, 192-bit |
| RX 5700 | $350 | 2304 | 64 | 1465 MHz | 1625 MHz | 1750 MHz | Navi 10 | 10300M | 8 GB, GDDR6, 256-bit |
| GTX 1080 | $500 | 2560 | 64 | 1607 MHz | 1733 MHz | 1251 MHz | GP104 | 7200M | 8 GB, GDDR5X, 256-bit |
| RTX 2060 Super | $400 | 2176 | 64 | 1470 MHz | 1650 MHz | 1750 MHz | TU106 | 10800M | 8 GB, GDDR6, 256-bit |
| RX Vega 64 | $400 | 4096 | 64 | 1247 MHz | 1546 MHz | 953 MHz | Vega 10 | 12500M | 8 GB, HBM2, 2048-bit |
| GTX 1080 Ti | $700 | 3584 | 88 | 1481 MHz | 1582 MHz | 1376 MHz | GP102 | 12000M | 11 GB, GDDR5X, 352-bit |
| RX 5700 XT | $400 | 2560 | 64 | 1605 MHz | 1755 MHz | 1750 MHz | Navi 10 | 10300M | 8 GB, GDDR6, 256-bit |
| MSI RX 5700 XT Evoke | $430 | 2560 | 64 | 1690 MHz | 1835 MHz | 1750 MHz | Navi 10 | 10300M | 8 GB, GDDR6, 256-bit |
| RTX 2070 | $440 | 2304 | 64 | 1410 MHz | 1620 MHz | 1750 MHz | TU106 | 10800M | 8 GB, GDDR6, 256-bit |
| RTX 2070 Super | $500 | 2560 | 64 | 1605 MHz | 1770 MHz | 1750 MHz | TU104 | 13600M | 8 GB, GDDR6, 256-bit |
| Radeon VII | $680 | 3840 | 64 | 1802 MHz | N/A | 1000 MHz | Vega 20 | 13230M | 16 GB, HBM2, 4096-bit |
| RTX 2080 | $630 | 2944 | 64 | 1515 MHz | 1710 MHz | 1750 MHz | TU104 | 13600M | 8 GB, GDDR6, 256-bit |
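
To put the Evoke's factory overclock in relation to its price premium, the rated figures from the table above can be compared directly. The sketch below uses only the table's price, core clock, and boost clock entries for the reference RX 5700 XT and the MSI Evoke; the helper name is ours, and rated clocks are of course not measured performance.

```python
# Figures taken from the table above (price, core clock and boost clock columns).
reference = {"price_usd": 400, "core_mhz": 1605, "boost_mhz": 1755}
evoke = {"price_usd": 430, "core_mhz": 1690, "boost_mhz": 1835}


def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100


deltas = {key: percent_increase(evoke[key], reference[key]) for key in reference}
for key, value in deltas.items():
    print(f"{key}: +{value:.1f}%")
```

On paper, the rated clocks rise by roughly 5 percent for a 7.5 percent higher price.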
Architecture: Navi and RDNA
We've been hearing the moniker "Navi" for years now, and AMD threw another one at us this Computex, "RDNA", so let us demystify the two first. "Navi" is the codename for the family of silicon the GPU is based on. RDNA is a new architecture introduced by AMD to succeed Graphics Core Next (GCN). It prescribes the GPU's component hierarchy and, more importantly, its main number-crunching machinery, the compute units.
Another example of this distinction would be "Vega". Vega 10, Vega 20, and Vega 12 are pieces of silicon from the same family, all of which follow the 5th generation Graphics Core Next architecture, which also governs their compute units. Over many years, AMD made incremental updates to GCN, but this time, it claims that RDNA is sufficiently different from GCN to be considered not a new version, but rather a new architecture that brings with it massive IPC gains over the previous generation.
The Radeon RX 5700 series is built around "Navi 10," an elegant little piece of silicon engineered on the 7 nm process at TSMC with 10.3 billion transistors crammed into a die measuring just 251 mm². The chip features a PCI-Express 4.0 x16 bus interface and a 256-bit wide GDDR6 memory interface. Infinity Fabric, which debuted on AMD's Ryzen CPUs, is extensively used as an on-die interconnect linking the various major components.
The bulk of AMD's engineering effort with RDNA has gone into increasing the number of dedicated resources, so that fewer components are starved while waiting for access to a shared resource. The "Navi 10" silicon has two Shader Engines sharing a centralized Command Processor that distributes workloads, a Geometry Processor, and ACEs (asynchronous compute engines).
Each Shader Engine is further divided into two Graphics Engines. A graphics engine shares render backends, a Rasterizer, and a Prim Unit among five Workgroup Processors. This is where the core of RDNA begins. AMD figured it could merge two compute units (CUs) to share schedulers, scalar units, a data-share, instruction and data caches, and TMUs. The Workgroup Processor, or "dual-compute unit" as shown in the architecture block diagram, is for all intents and purposes indivisible, in that individual CUs cannot be disabled.
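The hierarchy described above can be multiplied out to arrive at the chip's shader count. This is a small sketch using the counts given in this article, including the 64 stream processors per RDNA compute unit; the constant names are ours.

```python
SHADER_ENGINES = 2               # per Navi 10 die
GRAPHICS_ENGINES_PER_SE = 2      # per Shader Engine
WGPS_PER_GRAPHICS_ENGINE = 5     # Workgroup Processors per Graphics Engine
CUS_PER_WGP = 2                  # a WGP is a "dual-compute unit"
STREAM_PROCESSORS_PER_CU = 64    # per RDNA compute unit

workgroup_processors = SHADER_ENGINES * GRAPHICS_ENGINES_PER_SE * WGPS_PER_GRAPHICS_ENGINE
compute_units = workgroup_processors * CUS_PER_WGP
stream_processors = compute_units * STREAM_PROCESSORS_PER_CU

print(workgroup_processors, compute_units, stream_processors)  # 20 40 2560

# Because individual CUs cannot be disabled, a cut-down part loses whole WGPs.
# The RX 5700's 2304 shader units therefore correspond to:
rx5700_wgps = 2304 // (CUS_PER_WGP * STREAM_PROCESSORS_PER_CU)
print(rx5700_wgps)  # 18
```

The result of 2560 matches the shader unit count listed for the RX 5700 XT.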
An RDNA compute unit packs 64 stream processors for vector operations and two scalar units for localized serial processing, double the number found in a GCN compute unit. The stream processors in a CU are split into two groups of 32, each equipped with its own scalar unit. According to AMD, this greatly reduces latency and improves the overall IPC of the compute unit. It also utilizes local caches more efficiently.
The vector execution units, or stream processors, are where much of the GPU's parallel processing happens. In the redesigned compute unit, two scalar processors feed two SIMD32 vector units of 32 stream processors each, instead of a single scalar processor feeding four SIMD16 vector units. Why is this important? On GCN, the way SIMD units are laid out, all items in a Wave64 operation get to do work once every four clocks due to hardware interleaving. With RDNA, Wave32 work items can do work every clock cycle. In all, RDNA minimizes wasted clock cycles by utilizing the hardware resources more efficiently and uniformly.
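The four-clock versus one-clock difference falls out of the ratio between wave size and SIMD width. The following is a deliberately simplified toy model, not a description of AMD's actual scheduler; it ignores latency hiding, dependencies, and everything else a real GPU does, and the function name is ours.

```python
import math


def cycles_per_wave_instruction(wave_size: int, simd_width: int) -> int:
    """Clock cycles a SIMD unit needs to push one instruction through a whole wave.

    Simplified model: a SIMD with `simd_width` lanes processes that many
    work items per clock, so a wave of `wave_size` items takes
    ceil(wave_size / simd_width) passes.
    """
    return math.ceil(wave_size / simd_width)


gcn_cycles = cycles_per_wave_instruction(wave_size=64, simd_width=16)   # Wave64 on SIMD16
rdna_cycles = cycles_per_wave_instruction(wave_size=32, simd_width=32)  # Wave32 on SIMD32

print(gcn_cycles, rdna_cycles)  # 4 1
```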
AMD examined previous generations of its graphics architecture to locate bottlenecks in the graphics pipeline. Besides increasing the number of dedicated resources, the company reworked the chip's cache hierarchy by cushioning data transfers at various stages. Each workgroup processor has dedicated 32 KB instruction and 16 KB data caches, which write back to a 128 KB L1 cache dedicated to each Graphics Engine.
These L1 caches talk to 4 MB of L2 cache. The introduction of the L1 cache and doubling in bandwidth between the various caches contributes greatly to IPC as it minimizes memory accesses, which are much slower than cache accesses. AMD is also using faster (lower latency) SRAM that reduces cache latencies by around 20 percent on die and by 8 percent at the memory level. AMD also introduced new features to the ACEs that include async-compute tunneling.
AMD summarizes the benefits of RDNA as a 25 percent IPC gain over the latest version of GCN, and an effective 50 percent performance gain for the GPU once IPC, the 7 nm process, and gains from frequency and power management (the ability to better sustain boost frequencies) are taken into account.
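If one assumes that the individual gains multiply, which is an assumption of this sketch and not something AMD states here, the two headline figures imply how much is left over for the process, frequency, and power-management improvements combined:

```python
ipc_gain = 1.25        # 25 percent IPC gain over GCN (AMD's figure)
overall_gain = 1.50    # 50 percent effective performance gain (AMD's figure)

# Assumption: overall_gain = ipc_gain * everything_else
everything_else = overall_gain / ipc_gain
remaining_percent = (everything_else - 1) * 100

print(f"about {remaining_percent:.0f}% from process, frequency and power management")
```

Under that assumption, roughly a fifth of additional performance would come from everything other than IPC.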
Elsewhere on the silicon, AMD updated the Display Engine and Multimedia Engine to keep up with the latest display and video standards. The Display Engine now supports DSC 1.2a (display stream compression) along with the HDMI 2.0 and DisplayPort 1.4 HDR output standards, enabling display formats as bandwidth-intensive as 4K 240 Hz or 8K 60 Hz over a single cable, as well as 30 bits per pixel color depth. The Multimedia Engine supports VP9 and H.265 decoding at up to 8K 24 Hz or 4K 90 Hz, and hardware-accelerated H.265 encoding at up to 4K 60 Hz.
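A rough calculation shows why compression is needed for these modes. The sketch below counts active pixels only (no blanking intervals) at 30 bits per pixel and compares the result with the DisplayPort 1.4 payload rate; the link figures come from the DisplayPort specification rather than from this review, and the function name is ours.

```python
def uncompressed_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int) -> float:
    """Raw pixel data rate in Gbit/s, counting active pixels only (no blanking)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


# DisplayPort 1.4 (HBR3, four lanes): 32.4 Gbit/s raw with 8b/10b encoding,
# which leaves 25.92 Gbit/s for payload.
DP14_PAYLOAD_GBPS = 32.4 * 8 / 10

modes = {
    "4K 240 Hz": (3840, 2160, 240),
    "8K 60 Hz": (7680, 4320, 60),
}

required = {
    name: uncompressed_gbps(w, h, hz, bits_per_pixel=30)
    for name, (w, h, hz) in modes.items()
}

for name, need in required.items():
    ratio = need / DP14_PAYLOAD_GBPS
    print(f"{name}: {need:.1f} Gbit/s uncompressed, {ratio:.1f}x the DP 1.4 payload")
```

Both modes carry the same number of pixels per second, and both need well over twice what the link can deliver uncompressed, which is the gap DSC is there to close.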
Features: FidelityFX and Anti-Lag
With each new graphics architecture, gamers expect new image quality enhancement features. NVIDIA introduced DLSS, and AMD's response to that is FidelityFX, a combination of content-specific and image-specific quality enhancements. The first part of this is contrast-adaptive sharpening, which brings out details in a scene by enhancing their contrast. To work best, it requires game developers to declare which parts of the image are to be sharpened (like the HUD and on-screen texts). Details such as wear-lines on the slick tires of a race-car or hexagonal patterns on a wall come to life. We will test this feature later in a separate article.
AMD wants to improve its adoption among professional e-Sports gamers by addressing a key bottleneck with modern high-end graphics: mouse lag. This is the amount of time it takes for a click to register and a response to be rendered by the GPU. Radeon Anti-Lag is a CTR (click-to-response) enhancement that reduces mouse lag by roughly a third across various popular e-Sports titles. The setting is effectively identical to the "pre-rendered frames" setting on NVIDIA. Modern GPUs calculate one or two frames ahead so they can better time sending them to the monitor to avoid stuttering. Of course, this results in input lag because any input that comes in only makes it to the screen one or two frames later.
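How much latency one or two frames of look-ahead adds depends on the frame rate. The following is a simplified model for illustration, not a description of how Radeon Anti-Lag is implemented: it assumes every queued frame delays the effect of an input by exactly one frame time. The function name and the example frame rates are ours.

```python
def queue_latency_ms(frames_queued: int, fps: float) -> float:
    """Extra input latency, in milliseconds, caused by frames rendered ahead.

    Simplified model: every queued frame delays the effect of an input
    by one frame time (1000 / fps milliseconds).
    """
    return frames_queued * 1000.0 / fps


for fps in (60, 144):
    for queued in (1, 2):
        extra = queue_latency_ms(queued, fps)
        print(f"{fps} FPS, {queued} frame(s) ahead: +{extra:.1f} ms")
```

In this model, the penalty shrinks as the frame rate rises, since each queued frame represents less time.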
Packaging and Contents
You will receive:
Graphics card
Documentation
The Card
MSI's Evoke looks great because of its champagne-gold metallic color theme and straight lines. This gives it a clean look that somewhat resembles that of NVIDIA's Titan RTX. On the back, you'll find a high-quality metal backplate that is integrated nicely with the front cooler design. Dimensions of the card are 23.5 x 13.0 cm.
Installation requires three slots in your system.
Display connectivity options include three standard DisplayPort 1.4a ports and one HDMI 2.0b port.
AMD took the opportunity to update the display controllers handling these outputs by leveraging DSC 1.2a (display stream compression), which unlocks very high resolution and refresh-rate combinations over a single cable. Among the single-cable display modes supported are 8K 60 Hz (which took two DP 1.3 cables until now), 4K 240 Hz, and 1080p at up to 360 Hz. On top of these, the outputs support HDR and 30 bpp (10 bits per channel) color depth for better color accuracy in creative applications.
The board uses a 6-pin and an 8-pin power connector. This input configuration is specified for up to 300 watts of power draw.
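The 300 W figure is the sum of what the slot and the two auxiliary connectors are each specified to deliver. The per-connector limits in this sketch come from the PCI-Express specifications, not from this review.

```python
# Power limits per the PCI-Express specifications (not stated in the review itself).
PCIE_SLOT_W = 75    # x16 slot
SIX_PIN_W = 75      # 6-pin auxiliary connector
EIGHT_PIN_W = 150   # 8-pin auxiliary connector

board_power_limit_w = PCIE_SLOT_W + SIX_PIN_W + EIGHT_PIN_W
print(board_power_limit_w)  # 300
```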
Plugging the power cables in and out is just as easy as on any other graphics card; the angle of the shot is slightly misleading, and there is enough space.
AMD's Navi generation of GPUs no longer supports CrossFire. DirectX 12 does include its own set of multi-GPU capabilities, but implementation requires game developers to put serious development time into a feature only a tiny fraction of their customers might ever use.