The Arc B580 debuted Intel's second discrete gaming GPU architecture, codenamed Xe2 "Battlemage," in December 2024, and the new Arc B570 is the second discrete GPU based on it. A variant of Xe2 is used in the integrated graphics solution of Intel's Core Ultra 200V "Lunar Lake" mobile processors; the one we have today is its fully-fledged discrete gaming GPU version, with all hardware components enabled. The Arc B570 is a mid-range model based on the BMG-G21 silicon. It is further cut down from the B580, which itself doesn't appear to max out the silicon. The BMG-G21 is built on the TSMC N5 (5 nm EUV) foundry node, and packs 19.6 billion transistors across a 272 mm² monolithic die. The 5 nm node is contemporary, given that both NVIDIA "Ada" and AMD RDNA 3 gaming GPUs use it.
The BMG-G21 GPU features a PCI-Express 4.0 x8 host interface on the Arc B580 and B570. It is configured with a 192-bit GDDR6 memory bus on the B580, and a 160-bit bus on the B570. The GPU is organized in a very similar manner to modern GPUs from NVIDIA and AMD: a Global Dispatch processor distributes work among the five Render Slices, which talk to each other over the GPU's fabric and memory sub-system. The GPU's internal last-level cache is 18 MB in size. Besides the five Render Slices, there is the Media Engine, consisting of two MFX multi-format x-coders, which gives the GPU two sets of hardware encoders and decoders. Then there's the GDDR6 memory controller and the Display Engine, with four display interfaces.
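To put the two bus widths in perspective, here is a rough back-of-the-envelope sketch of how bus width maps to the number of memory chips and to peak bandwidth. It assumes one GDDR6 package per 32 bits of bus (no clamshell layout), and the per-pin data rate is an assumed input for illustration, not a figure stated on this page.

```python
GDDR6_CHIP_BUS_BITS = 32  # assumption: one GDDR6 package per 32 bits of bus width


def memory_chips(bus_width_bits: int) -> int:
    """Number of GDDR6 packages needed to populate the whole memory bus."""
    if bus_width_bits % GDDR6_CHIP_BUS_BITS:
        raise ValueError("bus width must be a multiple of 32 bits")
    return bus_width_bits // GDDR6_CHIP_BUS_BITS


def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth: bytes moved per transfer times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


# Assumed per-pin data rate, used only to make the comparison concrete.
ASSUMED_DATA_RATE_GBPS = 19.0

for name, bus_bits in (("Arc B580", 192), ("Arc B570", 160)):
    chips = memory_chips(bus_bits)
    bandwidth = peak_bandwidth_gb_s(bus_bits, ASSUMED_DATA_RATE_GBPS)
    print(f"{name}: {bus_bits}-bit bus, {chips} chips, {bandwidth:.0f} GB/s peak")
```

Under these assumptions, going from six memory chips to five costs the B570 one sixth of the B580's peak memory bandwidth.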
Intel is claiming a 70% generational increase in performance per Xe Core, the indivisible number-crunching subunit of the GPU; and a 50% generational performance-per-watt increase. The above graphs illustrate the contribution of the individual sub-systems of the Xe2 Battlemage architecture toward these improvements; and how this plays out in a frametime analysis example of a real-world use case.
The Render Slice diagram (above) highlights where Intel found the biggest chunk of the generational performance increase. It comes from increased IPC from the Xe Core, a more specialized and capable Ray Tracing Unit, a 300% faster Geometry engine, a faster Sampler, a 50% increase in HiZ, Z, and stencil caches, and increases in performance of the pixel backends. Intel's engineering goal has been to reduce latency wherever it can, and reduce software (CPU) overhead as much as it can. The new second-generation Xe Core features eight 512-bit vector engines, with SIMD16-native ALUs, and support for many more data formats. Rather than two sets of FP and INT units per vector engine, there is just one set of each per vector engine in Xe2, with larger numbers of ALUs.
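The relationship between the 512-bit vector width and the SIMD16 designation is simple arithmetic, sketched below. The 32-bit element width is an assumption made for illustration (it is the width at which a 512-bit vector holds exactly 16 elements), and the function names are hypothetical.

```python
def lanes_per_vector_engine(vector_width_bits: int = 512, element_bits: int = 32) -> int:
    """How many elements of the given width fit into one vector register."""
    return vector_width_bits // element_bits


def lanes_per_xe_core(vector_engines: int = 8, vector_width_bits: int = 512,
                      element_bits: int = 32) -> int:
    """Total lanes across all vector engines of one Xe Core."""
    return vector_engines * lanes_per_vector_engine(vector_width_bits, element_bits)


print(lanes_per_vector_engine())  # 512 / 32 = 16 lanes, i.e. SIMD16
print(lanes_per_xe_core())        # 8 engines x 16 lanes = 128 lanes per Xe Core
```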
Intel introduced its second-generation Ray Tracing Unit, with massive generational improvements in performance and capability. It adds a third Traversal Pipeline, which yields a 50% increase in box intersection performance. A second triangle intersection unit has been added to double the performance of triangle intersections. The BVH cache has doubled in size to 16 KB.
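For readers unfamiliar with what a box intersection is, the sketch below shows the kind of test a BVH traversal step performs: checking whether a ray passes through an axis-aligned bounding box. It uses the common "slab" method in software purely for illustration; it is not a description of how Intel's hardware implements the operation, and the function name is hypothetical.

```python
import math


def ray_hits_box(origin, direction, box_min, box_max) -> bool:
    """Slab test: does a ray starting at origin hit the axis-aligned box?"""
    t_near, t_far = 0.0, math.inf  # the ray only exists for t >= 0
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval in which the ray is inside every slab so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True


unit_box = ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
print(ray_hits_box((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), *unit_box))   # ray aimed at the box
print(ray_hits_box((3.0, 0.0, -5.0), (0.0, 0.0, 1.0), *unit_box))   # ray passing beside it
```

A ray typically has to be tested against many such boxes before it reaches the triangles inside them, which is why extra traversal throughput matters.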
XeSS 2, Frame Generation, and Low Latency
Intel has renamed the original XeSS to XeSS Super Resolution (XeSS-SR), as that's what it originally was: a performance enhancement that relies on super-resolution technology. The XeSS-SR SDK gets a new compute dispatcher backend for popular APIs, namely DirectX 11, DirectX 12, and Vulkan. There are two XeSS-SR models: the regular one, and a XeSS-SR Lite model for GPUs that lack XMX matrix acceleration capability.
XeSS 2 isn't a single technology, or an improvement over XeSS-SR, but a collection of three technologies: the existing XeSS-SR, which deals with performance; the new XeSS Frame Generation (XeSS-FG) technology, which nearly doubles frame rates by inserting interpolated frames; and the new Xe Low Latency (XeLL) technology, which works to reduce the latency cost of SR and FG, but can also be used as a standalone latency-reduction technology.
XeSS-FG can either be implemented at native resolution, or in conjunction with XeSS-SR, where it is located right after the XeSS-SR step in the rendering queue. It relies on motion vectors, depth data, temporal frame data, and optical flow reprojection, to create interpolated frames that are then interleaved with the output frames, to effectively double the framerate. The interpolated image is then passed along to the next stage, where the HUD/UI is added at native resolution, and pushed to the frame buffer for output.
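The interleaving step can be sketched in a few lines. This is only a toy illustration: plain averaging of two frames stands in for the motion-vector, depth, and optical-flow reprojection described above, and the frame representation and function names are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]  # hypothetical stand-in: a frame as a flat list of pixel values


def interpolate(prev: Frame, nxt: Frame) -> Frame:
    """Placeholder for the real interpolation: a plain average of two frames."""
    return [(a + b) / 2 for a, b in zip(prev, nxt)]


def interleave_generated(rendered: Sequence[Frame]) -> List[Frame]:
    """Insert one generated frame between each pair of consecutive rendered frames."""
    if not rendered:
        return []
    output = [rendered[0]]
    for prev, nxt in zip(rendered, rendered[1:]):
        output.append(interpolate(prev, nxt))  # generated frame
        output.append(nxt)                     # rendered frame
    return output


rendered = [[0.0], [2.0], [4.0]]
print(interleave_generated(rendered))
```

N rendered frames become 2N - 1 output frames in this sketch, which is why the frame rate approaches, but does not quite reach, double. It also shows why latency goes up: a generated frame cannot be produced until the rendered frame that follows it is available.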
The SR + FG passes contribute to frame latency, and so, just as NVIDIA uses Reflex to counteract this latency, Intel developed XeLL. The technology intelligently compacts the rendering queue to reduce the time it takes for an input to register as motion on-screen. XeLL remains enabled in all workloads that use XeSS-FG, but it can be used as a standalone feature, too. There's also an implicit driver-based low-latency mode that does this without a game having an explicit XeSS 2 or XeLL implementation.
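A toy model helps show why a shorter rendering queue reduces latency. It assumes every frame takes the same time and that a newly submitted frame waits behind everything already queued; real pipelines are more complicated, and the numbers below are made up for illustration rather than measured.

```python
def input_to_display_ms(frame_time_ms: float, queued_frames: int) -> float:
    """Toy model: a frame waits behind the queued frames, then takes one frame time itself."""
    if queued_frames < 0:
        raise ValueError("queued_frames cannot be negative")
    return (queued_frames + 1) * frame_time_ms


# Hypothetical example: 10 ms per frame (100 FPS).
print(input_to_display_ms(10.0, queued_frames=2))  # two frames already waiting in the queue
print(input_to_display_ms(10.0, queued_frames=0))  # queue compacted to nothing
```

In this simplified picture the frame rate is identical in both cases; only the delay between an input and its appearance on screen changes.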
Intel has updated its software package significantly. The new "Intel Graphics Software" replaces the "Arc Control" utility, and gives you a cleaner user interface. There are many new settings related to the display, including display scaling mode/method/quantization range; 3D graphics settings, including a driver-based FPS limiter and the driver-based low-latency mode; and the exhaustive new Performance and Overclocking controls, which include the ability to set frequency offsets and tinker with the V/F curve, power limits, and GPU and memory clocks. It also integrates Intel's PresentMon metrics.