News Posts matching #AVX-512

AMD Formally Launches Ryzen Threadripper PRO 9000 and Radeon AI PRO 9000 Series

Jun 17th, 2025 09:43

AMD today formally launched a slew of new workstation hardware. This includes the new Ryzen Threadripper PRO 9000 "Shimada Peak" line of high core-count processors, and the Radeon AI PRO 9000 line of graphics cards, which covers use-cases spanning from edge AI acceleration to professional visualization. The Threadripper 9000 series is based on the "Zen 5" microarchitecture, and "Shimada Peak" is a variant of the "Turin" MCM powering 5th Gen EPYC processors, adapted with workstation-relevant I/O. Meanwhile, the Radeon AI PRO 9000 series is based on the same RDNA 4 graphics architecture powering the Radeon RX 9000 series gaming graphics cards.

The Ryzen Threadripper 9000 series comes with CPU core counts of up to 96-core/192-thread, with an IPC uplift from the "Zen 5" microarchitecture over the previous Threadripper 7000 series "Storm Peak" processors powered by "Zen 4." More than IPC, workstation users should benefit greatly from the architecture's full 512-bit FPU data-path, offering significant increases in performance of applications that leverage the AVX-512 instruction set. AMD also fine-tuned the IOD (I/O die) to support increased memory speeds of DDR5-6400 (native), AMD EXPO profiles, and CKD. With these changes, and minor increases in clock speeds for certain SKUs, AMD is promising a 16% uplift in performance over the Threadripper 7000 series predecessors in workstation benchmarks, and a significant 25% increase in SPEC Workstation AI and ML benchmarks (comparing identical core-counts and frequency).

Hygon Prepares 128-Core, 512-Threaded x86 CPU with Four-Way SMT and AVX-512 Support

May 8th, 2025 09:00

Chinese server CPU maker Hygon, which holds Zen core IP licensed from AMD, has published a roadmap for the C86-5G, its most powerful server processor to date, featuring up to 128 cores and an astonishing 512 threads. Thanks to a complete microarchitectural redesign, the new chip delivers more than 17 percent higher instructions per cycle (IPC) than its predecessor. It also supports the AVX-512 vector instruction set and four-way simultaneous multithreading, making it a strong contender for highly parallel workloads. Sixteen channels of DDR5-5600 memory feed data-intensive tasks, while CXL 2.0 interconnect support enables seamless scaling across multiple sockets. Built on an undisclosed semiconductor node, the C86-5G includes advanced power management and a hardened security engine. With 128 lanes of PCIe 5.0, it offers ample bandwidth for accelerators, NVMe storage, and high-speed networking. Hygon positions this flagship CPU as ideal for artificial intelligence training clusters, large-scale analytics platforms, and virtualized enterprise environments.

The C86-5G is the culmination of five years of steady development. The journey began with the C86-1G, an AMD-licensed design that served as a testbed for domestic engineers. It offered up to 32 cores, 64 threads, eight channels of DDR4-2666 memory, and 128 lanes of PCIe 3.0. Its goal was to absorb proven technology and build local know-how. Next came the C86-2G, which kept the same core count but introduced a revamped floating-point unit, 21 custom security instructions, and hardware-accelerated features for memory encryption, virtualization, and trusted computing. This model marked Hygon's first real step into independent research and development. With the C86-3G, Hygon rolled out a fully homegrown CPU core and system-on-chip framework. Memory support increased to DDR4-3200, I/O doubled to PCIe 4.0, and on-die networking included four 10 GbE and eight 1 GbE ports. The C86-4G raised the bar further by doubling compute density to 64 cores and 128 threads, boosting IPC by around 15 percent and adding 12-channel DDR5-4800 memory plus 128 lanes of PCIe 5.0. Socket options expanded to dual and quad configurations. Now, with the C86-5G, Hygon has shown it can compete head-to-head with global server CPU leaders, putting more faith in China's growing capabilities in high-performance computing.
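The thread count and memory figures quoted above follow from simple arithmetic. Here is a minimal sketch, assuming each DDR5 channel moves 8 bytes (64 data bits) per transfer, which the roadmap coverage does not state. The function names are illustrative, and the results are theoretical peaks, not measured bandwidth.

```python
def hardware_threads(cores: int, smt_ways: int) -> int:
    """Logical threads exposed by a CPU with the given SMT width."""
    return cores * smt_ways


def peak_memory_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                              bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s.

    bytes_per_transfer=8 is an assumption (64-bit data path per channel).
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# C86-5G: 128 cores with four-way SMT, 16 channels of DDR5-5600
c86_5g_threads = hardware_threads(128, 4)           # 512
c86_5g_bw = peak_memory_bandwidth_gbs(16, 5600)     # 716.8 GB/s

# C86-4G: 12 channels of DDR5-4800
c86_4g_bw = peak_memory_bandwidth_gbs(12, 4800)     # 460.8 GB/s

if __name__ == "__main__":
    print(f"C86-5G threads: {c86_5g_threads}")
    print(f"C86-5G peak bandwidth: {c86_5g_bw:.1f} GB/s")
    print(f"C86-4G peak bandwidth: {c86_4g_bw:.1f} GB/s")
```

Under these assumptions, the move from 12-channel DDR5-4800 to 16-channel DDR5-5600 raises theoretical peak bandwidth by roughly 56 percent.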

Framework Dives Deep into Desktop Model's Deployment of Ryzen AI Max

Press Release

Mar 13th, 2025 13:20

We dedicated a lot of our launch presentation of Framework Desktop to the Ryzen AI Max processor it uses, and for a good reason. These truly unique, ultra-high-performance parts are the culmination of decades of technology and architecture investments that AMD has made, going all the way back to their acquisition of ATI in 2006. For our first technical deep dive on Framework Desktop, we're going to go even deeper into Ryzen AI Max and what makes it a killer processor for gaming, workstation, and AI workloads.

What makes Ryzen AI Max special is a combination of three elements: full desktop-class Zen 5 CPU cores, a massive 40-CU Radeon RDNA 3.5 GPU, and a giant 256-bit LPDDR5x memory bus to feed the two, supporting up to 128 GB of memory. Chips and Cheese did an excellent technical overview of the processor with AMD that goes even deeper on this, and we'll pull out some of the highlights along with our own insights. We'll start with the CPUs. Ryzen AI Max supports up to 16 CPU cores split across two 4 nm FinFET dies that AMD calls CCDs. These dies are connected together using an extremely wide, low power, low latency bus across the package substrate. The CPUs are full Zen 5 cores with 512-bit FPUs and support for AVX-512, a vector processing instruction set otherwise only available on Intel's top end server CPUs. We're excited for you to see the multicore performance numbers these CPUs can do in our upcoming press review cycle!

What the Intel-AMD x86 Ecosystem Advisory Group is, and What it's Not

Editorial

Oct 16th, 2024 00:36

AVX-512 was proposed by Intel more than a decade ago, in 2013 to be precise. A decade later, implementation of this instruction set on CPU cores remains wildly spotty. Intel implemented it first on an HPC accelerator, then on its Xeon server processors, then on its client processors, before realizing that hardware had not caught up with the technology enough to execute AVX-512 instructions in an energy-efficient manner, and deprecating it on the client. AMD implemented it just a couple of years ago with "Zen 4," using a dual-pumped 256-bit FPU on 5 nm, before finally implementing a true 512-bit FPU on 4 nm. AVX-512 is a microcosm of what's wrong with the x86 ecosystem.
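Because of this uneven install base, software generally has to detect AVX-512 at run time and fall back to narrower code paths. The sketch below illustrates that idea on Linux by parsing the `flags` line of `/proc/cpuinfo`. The helper names and the three-tier kernel choice are hypothetical, and real applications more commonly query CPUID through their compiler or runtime library.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def select_kernel(flags: set) -> str:
    """Pick the widest code path the CPU reports (hypothetical tiers)."""
    if "avx512f" in flags:      # AVX-512 Foundation
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"


def read_host_flags(path: str = "/proc/cpuinfo") -> set:
    """Read flags from the running Linux host; empty set if unavailable."""
    try:
        with open(path) as handle:
            return parse_cpu_flags(handle.read())
    except OSError:
        return set()


if __name__ == "__main__":
    print("selected kernel:", select_kernel(read_host_flags()))
```

On a host without `/proc/cpuinfo`, `read_host_flags()` returns an empty set and the sketch falls back to the scalar path.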

There are only two x86 CPU core vendors, the IP owner Intel, and its only surviving licensee capable of contemporary CPU cores, AMD. Any new additions to the ISA introduced by either of the two have to go through the grind of their duopolistic competition before software vendors could assume that there's a uniform install base to implement something new. x86 is a net-loser of this, and Arm is a net-winner. Arm Holdings makes no hardware of its own, except continuously developing the Arm machine architecture, and a first-party set of reference-design CPU cores that any licensee can implement. Arm's great march began with tiny embedded devices, before its explosion into client computing with smartphone SoCs. There are now Arm-based server processors, and the architecture is making inroads to the last market that x86 holds sway over—the PC. Apple's M-series processors compete with all segments of PC processors—right from the 7 W class, to the HEDT/workstation class. Qualcomm entered this space with its Snapdragon Elite family, and now Dell believes NVIDIA will take a swing at client processors in 2025. Then there's RISC-V. Intel finally did something it should have done two decades ago—set up a multi-brand Ecosystem Advisory Group. Here's what it is, and more importantly, what it's not.

AMD Launches 5th Gen AMD EPYC CPUs, Maintaining Leadership Performance and Features for the Modern Data Center

Press Release

Oct 10th, 2024 13:19

AMD (NASDAQ: AMD) today announced the availability of the 5th Gen AMD EPYC processors, formerly codenamed "Turin," the world's best server CPU for enterprise, AI and cloud. Using the "Zen 5" core architecture, compatible with the broadly deployed SP5 platform and offering a broad range of core counts spanning from 8 to 192, the AMD EPYC 9005 Series processors extend the record-breaking performance and energy efficiency of the previous generations with the top of stack 192 core CPU delivering up to 2.7X the performance compared to the competition.

New to the AMD EPYC 9005 Series CPUs is the 64 core AMD EPYC 9575F, tailor made for GPU powered AI solutions that need the ultimate in host CPU capabilities. Boosting up to 5 GHz, compared to the 3.8 GHz processor of the competition, it provides up to 28% faster processing needed to keep GPUs fed with data for demanding AI workloads.

AMD MI300X Accelerators are Competitive with NVIDIA H100, Crunch MLPerf Inference v4.1

Aug 29th, 2024 02:24

The MLCommons consortium on Wednesday posted MLPerf Inference v4.1 benchmark results for popular AI inferencing accelerators available in the market, across brands that include NVIDIA, AMD, and Intel. AMD's Instinct MI300X accelerators proved competitive with NVIDIA's "Hopper" H100 series AI GPUs. AMD also used the opportunity to showcase the kind of AI inferencing performance uplifts customers can expect from its next-generation EPYC "Turin" server processors powering these MI300X machines. "Turin" features "Zen 5" CPU cores, sporting a 512-bit FPU datapath, and improved performance in AI-relevant 512-bit SIMD instruction sets, such as AVX-512 and VNNI. The MI300X, on the other hand, banks on the strengths of its memory sub-system, FP8 data format support, and efficient KV cache management.

The MLPerf Inference v4.1 benchmark focused on the 70 billion-parameter LLaMA2-70B model. AMD's submissions included machines featuring the Instinct MI300X, powered by the current EPYC "Genoa" (Zen 4), and next-gen EPYC "Turin" (Zen 5). The GPUs are backed by AMD's ROCm open-source software stack. The benchmark evaluated inference performance using 24,576 Q&A samples from the OpenORCA dataset, with each sample containing up to 1024 input and output tokens. Two scenarios were assessed: the offline scenario, focusing on batch processing to maximize throughput in tokens per second, and the server scenario, which simulates real-time queries with strict latency limits (TTFT ≤ 2 seconds, TPOT ≤ 200 ms). This lets you see the chip's mettle in both high-throughput and low-latency queries.
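To make the server-scenario limits concrete, here is a minimal per-query sketch. It assumes TPOT (time per output token) is the time after the first token divided by the remaining output tokens, which is a common definition but is not spelled out above. The class and function names are hypothetical, and the actual benchmark applies its limits across many queries, not one at a time.

```python
from dataclasses import dataclass

TTFT_LIMIT_S = 2.0     # time to first token limit quoted above
TPOT_LIMIT_S = 0.200   # time per output token limit quoted above


@dataclass
class QueryTiming:
    ttft_s: float        # seconds until the first output token
    total_s: float       # seconds until the last output token
    output_tokens: int   # number of tokens generated


def tpot_s(query: QueryTiming) -> float:
    """Average seconds per output token after the first (assumed definition)."""
    if query.output_tokens <= 1:
        return 0.0
    return (query.total_s - query.ttft_s) / (query.output_tokens - 1)


def meets_server_limits(query: QueryTiming) -> bool:
    """True if a single query satisfies both latency limits."""
    return query.ttft_s <= TTFT_LIMIT_S and tpot_s(query) <= TPOT_LIMIT_S


if __name__ == "__main__":
    fast = QueryTiming(ttft_s=1.5, total_s=11.5, output_tokens=101)
    slow = QueryTiming(ttft_s=1.5, total_s=31.5, output_tokens=101)
    print(meets_server_limits(fast), meets_server_limits(slow))
```

The first illustrative query averages 100 ms per token and passes; the second averages 300 ms per token and fails the TPOT limit even though its TTFT is fine.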

Intel Launches Xeon W-3500 and W-2500 Series Workstation Processors

Aug 28th, 2024 11:45

Intel today launched its Xeon W-3500 series and Xeon W-2500 series workstation processors. These chips are based on the "Sapphire Rapids" microarchitecture featuring the enterprise version of "Golden Cove" P-cores. These are a refresh over the Xeon W-3400 series and W-2400 series, as they feature higher CPU core counts, L3 cache, and clock speeds, at given price-points. Intel has also slightly de-cluttered its lineup with this series. The key difference between the W-3500 series and the W-2500 series, is that the former comes with 8-channel DDR5 memory interface and 112 PCI-Express Gen 5 lanes; while the latter offers a 4-channel DDR5 memory interface, along with 64 PCI-Express Gen 5 lanes. The W-2500 series also comes with lower CPU core counts compared to the W-3500, which is somewhat made up for with higher CPU clock speeds. Perhaps the highlight of this refresh is that now Intel sells CPU core counts of up to 60-core/120-thread in the workstation segment. The W-3400 series had topped off at 36-core/72-thread.

The series is led by the Xeon W9-3595X. This beast maxes out the "Sapphire Rapids" chip, with a 60-core/120-thread configuration, with each of the 60 cores featuring 2 MB of dedicated L2 cache, and sharing 112.5 MB of L3 cache. The chip comes with a base frequency of 2.00 GHz, and a maximum boost frequency of 4.80 GHz. The next highest SKU sees a rather steep drop in core-counts, with the Xeon W9-3575X coming in with a 44-core/88-thread configuration, along with 97.5 MB of shared L3 cache, besides the 2 MB of dedicated L2 cache per core. This chip ticks at 2.20 GHz base, along with 4.80 GHz maximum boost. There's yet another steep drop in core-counts with the Xeon W7-3545, featuring a 24-core/48-thread configuration, 67.5 MB of shared L3 cache, 2.70 GHz base frequency, and 4.80 GHz maximum boost.

FinalWire Releases AIDA64 v7.35 with New CheckMate 64-bit Benchmark

Press Release

Jul 24th, 2024 02:02

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 7.35 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 7.35 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 7.35 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 7.35 software, a dedicated network audit toolset to collect and manage corporate network inventories. The new AIDA64 update introduces a new 64-bit CheckMate benchmark, AVX-512 accelerated benchmarks for AMD Ryzen AI APU, and supports the latest graphics and GPGPU computing technologies by AMD, Intel and NVIDIA.

Intel Readies Arrow Lake-H Laptop CPU SKU with 24 Cores Based on Desktop Arrow Lake-S

Jun 25th, 2024 02:34

As Intel gears up for the launch of Lunar Lake and Arrow Lake processors, the company appears to be preparing a new line of high-performance processors for gaming laptops. Recent developments suggest that the company is adapting its desktop-grade Arrow Lake-S chips for use in ultra-high-performance notebooks. The buzz began when X user @InstLatX64 spotted Intel testing a peculiar motherboard labeled "Arrow Lake Client Platform/ARL-S BGA SODIMM 2DPC." This discovery hints at the possibility of Intel packing up to 24 cores into laptop processors, eight more than the 16 cores expected in standard Arrow Lake-H mobile chips. By utilizing the full potential of Arrow Lake-S silicon in a mobile form factor, Intel aims to deliver desktop-class performance to high-end notebooks in a BGA laptop CPU.

The leaked chip would likely feature eight high-performance Lion Cove P-cores and 16 energy-efficient Skymont E-cores, along with an integrated Xe2 GPU. This configuration could provide the raw power needed for demanding games and professional applications in a portable package. However, implementing such powerful hardware in laptops presents challenges. The processors are expected to have a TDP of 45 W or 55 W, with actual power consumption potentially exceeding these figures to maintain high clock speeds. Success will depend not only on Intel's chip design but also on the cooling solutions and power delivery systems developed by laptop manufacturers. As of now, specific details about clock speeds and performance metrics remain under wraps. The test chip that surfaced showed a base frequency of 3.0 GHz, notably without AVX-512 support.

AMD Zen 5 Powered Ryzen AI 300 Series Mobile Processors Supercharge Next Gen Copilot+ AI PCs

Computex

Jun 2nd, 2024 21:36

AMD today launched its Ryzen AI 300 series mobile processors, codenamed "Strix Point." These chips implement a combination of the AMD "Zen 5" microarchitecture for the CPU cores, the XDNA 2 architecture for its powerful new NPU, and the RDNA 3+ graphics architecture for its 33% faster iGPU. The new "Zen 5" microarchitecture provides a 16% generational IPC uplift over "Zen 4" on the back of several front-end enhancements, wider execution pipelines, more intra-core bandwidth, and a revamped FPU that doubles performance of AI and AVX-512 workloads. AMD didn't go in-depth with the microarchitecture, but the broad points of "Zen 5" are detailed in our article for the Ryzen 9000 "Granite Ridge" desktop processors. Not only is AMD using these faster "Zen 5" CPU cores, but it has also increased the CPU core count by 50%, for a maximum of 12-core/24-thread.

The "Strix Point" monolithic silicon is built on the 4 nm foundry node, and packs a CPU core complex (CCX) with 12 CPU cores. Four of these are "Zen 5" cores, which can achieve the highest possible boost frequencies; the other eight are "Zen 5c" cores that feature an identical IPC and the full ISA, including support for SMT, but don't boost as high as the "Zen 5" cores. AMD is claiming a productivity performance increase ranging between 4% and 73% for the top model in the series, when compared to Intel's Core Ultra 9 185H "Meteor Lake" processor. The iGPU sees its compute unit (CU) count go all the way up to 16 from 12 in the previous generation, and this yields a claimed 33% increase in iGPU gaming performance compared to the integrated Arc graphics of the Core Ultra 9 185H. Lastly, the XDNA 2 NPU more than triples AI inference performance to 50 AI TOPS, compared to the 16 TOPS of the Ryzen 8040 "Hawk Point" processor, and 12 TOPS of Core Ultra "Meteor Lake." This makes the processor meet Microsoft's Copilot+ AI PC requirements.

FinalWire Releases AIDA64 v7.30

Press Release

May 28th, 2024 00:13

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 7.30 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 7.30 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 7.30 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 7.30 software, a dedicated network audit toolset to collect and manage corporate network inventories.

The new AIDA64 update brings numerous improvements and optimizations for dark mode and high contrast mode, enhances speed, and supports the latest graphics and GPGPU computing technologies by AMD, Intel, and NVIDIA.

MSI First with Motherboard BIOS that Supports Ryzen 9000 "Zen 5" Processors

Apr 15th, 2024 00:58

In yet another clear sign that we could see some action from AMD on the next-gen desktop processor front this Computex, motherboard maker MSI released its first beta UEFI firmware update that packs an AGESA microcode that reportedly supports the upcoming AMD Ryzen 9000 "Granite Ridge" processors. The "7D74v1D2 beta" firmware update for the MSI MPG B650 Carbon Wi-Fi motherboard encapsulates AGESA ComboPI 1.1.7.0 patch-A, with the description that it supports a "next-gen CPU," a reference to the Ryzen 9000 "Granite Ridge."

A successor to the Ryzen 7000 Raphael, the Ryzen 9000 Granite Ridge introduces the new "Zen 5" microarchitecture to the desktop platform, with CPU core counts remaining up to 16-core/32-thread. The new microarchitecture is expected to introduce generational increase in IPC, as well as improve performance of certain exotic workloads such as AVX-512. The processors are said to be launching alongside the new AMD 800-series motherboard chipset. If AMD is using Computex as a platform to showcase these processors, it's likely we might see the first of these motherboards as well.

AMD Ryzen 9000 "Granite Ridge" Zen 5 Processor Pictured

Apr 8th, 2024 01:17

An alleged picture of an unreleased AMD Ryzen 9000 series "Granite Ridge" desktop processor just hit the wires. "Granite Ridge" is the codename for the desktop implementation of the "Zen 5" microarchitecture; it succeeds the current Ryzen 7000 "Raphael" that's powered by "Zen 4." From what we're hearing, the CPU core counts of "Granite Ridge" continue to top out at 16. These chips will be built in the existing AMD Socket AM5 package, and will be compatible with existing AMD 600-series chipset motherboards, although the company is working on a new motherboard chipset to go with the new chips.

The alleged AMD engineering sample pictured below has an OPN 100-000001290-11, which is unreleased. This OPN also showed up on an Einstein@Home online database, where the distributed computing platform read it as having 16 threads, making this possibly an 8-core/16-thread SKU. The "Zen 5" microarchitecture is expected to provide a generational IPC increase over "Zen 4," but more importantly, offer a significant performance increase for AVX-512 workloads due to an updated FPU. AMD is expected to unveil its Ryzen 9000 series "Zen 5" processors at the 2024 Computex.

AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU

Apr 4th, 2024 21:11 (Updated: Apr 5th, 2024 02:04)

The AMD "Zen 5" CPU microarchitecture will introduce a significant performance increase for AVX-512 workloads, with some sources reporting performance increases as high as 40% over "Zen 4" in benchmarks that use AVX-512. A Moore's Law is Dead report detailing the execution engine of "Zen 5" holds the answer to how the company managed this: a true 512-bit FPU. Currently, AMD uses a dual-pumped 256-bit FPU to execute AVX-512 workloads on "Zen 4." The updated FPU should significantly improve the core's performance in workloads that take advantage of 512-bit AVX or VNNI instructions, such as AI.
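As a rough illustration of "dual-pumped" versus "true 512-bit," the sketch below counts how many passes a datapath of a given width needs to execute one 512-bit instruction. This is a deliberately simplified model with hypothetical function names. It ignores clock speeds, the number of execution ports, and memory bandwidth, which is one reason real-world gains (such as the reported 40%) do not simply equal 2x.

```python
def fp32_lanes(vector_bits: int) -> int:
    """Number of 32-bit floating-point elements in a vector of this width."""
    return vector_bits // 32


def passes_per_instruction(instruction_bits: int, datapath_bits: int) -> int:
    """Passes through the datapath needed for one instruction (ceiling division)."""
    return -(-instruction_bits // datapath_bits)


# "Zen 4": AVX-512 instructions on a 256-bit datapath (dual-pumped)
zen4_passes = passes_per_instruction(512, 256)   # 2
# "Zen 5": AVX-512 instructions on a 512-bit datapath
zen5_passes = passes_per_instruction(512, 512)   # 1

if __name__ == "__main__":
    print(f"FP32 lanes in a 512-bit vector: {fp32_lanes(512)}")
    print(f'"Zen 4" passes per 512-bit op: {zen4_passes}')
    print(f'"Zen 5" passes per 512-bit op: {zen5_passes}')
```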

Giving "Zen 5" a 512-bit FPU meant that AMD also had to scale up the ancillaries—all the components that keep the FPU fed with data and instructions. The company therefore increased the capacity of the L1 DTLB. The load-store queues have been widened to meet the needs of the new FPU. The L1 Data cache has been doubled in bandwidth, and increased in size by 50%. The L1D is now 48 KB in size, up from 32 KB in "Zen 4." FPU MADD latency has been reduced by 1 cycle. Besides the FPU, AMD also increased the number of Integer execution pipes to 10, from 8 on "Zen 4." The exclusive L2 cache per core remains 1 MB in size.

Update 07:02 UTC: Moore's Law is Dead reached out to us and said that the slide previously posted by them, which we had used in an earlier version of this article, is fake, but said that the information contained in that slide is correct, and that they stand by the information.

Qubic Cryptocurrency Mining Craze Causes AMD Ryzen 9 7950X Stocks to Evaporate

Mar 13th, 2024 00:47

It looks like cryptocurrency mining is back in fashion, as miners are firing up their old mining hardware from 2022 to cash in. Bitcoin is now north of $72,000, and is dragging up the value of several other cryptocurrencies, one such being Qubic (QBIC). Profitability calculators put 24 hours of Qubic mining on an AMD Ryzen 9 7950X 16-core processor at around $3, after subtracting the energy costs involved in running the chip at its default 170 W TDP. "Zen 4" processors such as the 7950X tend to retain much of their performance with slight underclocking and reduced power limits, which is bound to hold or increase profitability, while also prolonging the life of the hardware.

And thus, the inevitable has happened—stocks of the AMD Ryzen 9 7950X have disappeared overnight across online retail. With the market presence of the 7950X3D and the Intel Core i9-14900K, the 7950X was typically found between $550-600, which would have added great value considering its low input costs. CPU-based cryptocurrency miners, including the QBIC miner, appear to be taking advantage of the AVX-512 instruction set. AMD "Zen 4" microarchitecture supports AVX-512 through its dual-pumped 256-bit FPU, and the upcoming "Zen 5" microarchitecture is rumored to double AVX-512 performance over "Zen 4." Meanwhile, Intel has deprecated what few client-relevant AVX-512 instructions its Core processors had since 12th Gen "Alder Lake," as it reportedly affected sales of Xeon processors. What about the 7950X3D? It's pricier, but mining doesn't benefit from the 3D V-cache, and the chip doesn't sustain the kind of CPU clocks the 7950X manages to do across all its 16 cores. It's only a matter of time before the 7950X3D disappears, too; followed by 12-core models such as the 65 W 7900, the 170 W 7900X, and the 7900X3D.

Google: CPUs are Leading AI Inference Workloads, Not GPUs

Mar 4th, 2024 00:42

Today's AI infrastructure expansion is mostly fueled by GPU-accelerated servers. Google, one of the world's largest hyperscalers, has noted that CPUs are still a leading compute platform for AI/ML workloads, according to internal analysis of its Google Cloud services. During the TechFieldDay event, a talk by Brandon Royal, product manager at Google Cloud, explained the position of CPUs in today's AI game. The AI lifecycle is divided into two parts: training and inference. During training, massive compute capacity is needed, along with enormous memory capacity, to fit ever-expanding AI models into memory. The latest models, like GPT-4 and Gemini, contain billions of parameters and require thousands of GPUs or other accelerators working in parallel to train efficiently.

On the other hand, inference requires less compute intensity but still benefits from acceleration. The pre-trained model is optimized and deployed during inference to make predictions on new data. While less compute is needed than in training, latency and throughput are essential for real-time inference. Google found that, while GPUs are ideal for the training phase, models are often optimized for, and run inference on, CPUs. This means that there are customers who choose CPUs as their medium of AI inference for a wide variety of reasons.

AMD Zen 5 Details Emerge with GCC "Znver5" Patch: New AVX Instructions, Larger Pipelines

Feb 12th, 2024 01:00

AMD's upcoming Ryzen 9000 series of processors on the AM5 platform will carry a new microarchitecture under the hood: Zen 5. The latest revision of AMD's x86-64 microarchitecture will feature a few interesting improvements over the Zen 4 it is replacing, targeting the rumored 10-15% IPC improvement. The latest set of patches for the GNU Compiler Collection (GCC) proposes the changes that come with "znver5" enablement. One of the most interesting additions to Zen 5 over the previous Zen 4 is the expansion of the instruction set, including new AVX and AVX-512 instructions. The patch set enables AVX-VNNI, MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI.

AVX-VNNI is a 256-bit vector version of the AVX-512 VNNI instruction set that accelerates neural network inferencing workloads. AVX-VNNI delivers the same VNNI instructions to CPUs that support 256-bit vectors but lack full 512-bit AVX-512 capabilities, effectively extending useful VNNI instructions for AI acceleration down to 256-bit vectors. While narrower in scope (it lacks the opmasking and extra vector registers of AVX-512 VNNI), AVX-VNNI is crucial in spreading VNNI inferencing speedups to real-world CPUs and applications. The AVX-512 VP2INTERSECT instruction is also making it into Zen 5, as noted above. It has so far been present only in Intel's "Tiger Lake" processor generation, and is now considered deprecated for Intel SKUs. We don't know the rationale behind this inclusion, but AMD presumably has a use case for it.
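For readers unfamiliar with what a VNNI instruction actually computes, the sketch below emulates the core operation in plain Python: each 32-bit lane multiplies four unsigned 8-bit values by four signed 8-bit values, sums the products, and adds the result to a 32-bit accumulator. It models the non-saturating form (as in `VPDPBUSD`). The function names are our own, and this is an illustration of the arithmetic, not a description of how hardware implements it.

```python
LANE_BYTES = 4  # each 32-bit lane consumes four 8-bit pairs


def wrap_int32(value: int) -> int:
    """Wrap an integer to signed 32-bit range (non-saturating behaviour)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def dot_accumulate_lane(acc: int, a_u8: list, b_s8: list) -> int:
    """One lane: acc + sum(unsigned byte * signed byte) over four pairs."""
    if len(a_u8) != LANE_BYTES or len(b_s8) != LANE_BYTES:
        raise ValueError("each lane takes exactly four byte pairs")
    if any(not 0 <= a <= 255 for a in a_u8):
        raise ValueError("first operand must hold unsigned 8-bit values")
    if any(not -128 <= b <= 127 for b in b_s8):
        raise ValueError("second operand must hold signed 8-bit values")
    return wrap_int32(acc + sum(a * b for a, b in zip(a_u8, b_s8)))


def dot_accumulate(acc_lanes: list, a_bytes: list, b_bytes: list) -> list:
    """Apply the lane operation across a whole vector of 32-bit lanes."""
    lanes = len(acc_lanes)
    if len(a_bytes) != lanes * LANE_BYTES or len(b_bytes) != lanes * LANE_BYTES:
        raise ValueError("operands must supply four bytes per lane")
    return [
        dot_accumulate_lane(
            acc_lanes[i],
            a_bytes[LANE_BYTES * i:LANE_BYTES * (i + 1)],
            b_bytes[LANE_BYTES * i:LANE_BYTES * (i + 1)],
        )
        for i in range(lanes)
    ]


def lanes_per_vector(vector_bits: int) -> int:
    """32-bit lanes per vector: 8 for 256-bit AVX-VNNI, 16 for 512-bit."""
    return vector_bits // 32


if __name__ == "__main__":
    # 1*-1 + 2*2 + 3*-3 + 4*4 = 10, added to an accumulator of 10
    print(dot_accumulate_lane(10, [1, 2, 3, 4], [-1, 2, -3, 4]))  # 20
```

The only difference between the 256-bit and 512-bit forms in this model is the number of lanes processed per instruction: 8 versus 16.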

AMD Ryzen 7 8840U APU Benched in GPD Win Max 2 Handheld

Jan 22nd, 2024 08:36

GPD has disclosed to ITHome that a specification refresh of its Win Max 2 handheld/mini-laptop gaming PC is incoming; this model debuted last year with Ryzen 7040 "Phoenix" APUs sitting in the driver's seat. A company representative provided a sneak peek of an upgraded device that sports a Team Red Ryzen 8040 series "Hawk Point" mobile processor, and a larger pool of system memory (32 GB versus the 2023 model's 16 GB). The refreshed GPD Win Max 2's Ryzen 7 8840U APU was compared to the predecessor's Ryzen 7 7840U in CPU-Z benchmarks (standard and AVX-512), and the results demonstrate a very slight difference in performance between generations.

The 8040 and 7040 APUs share the same "Phoenix" basic CPU design (8 cores, 16 threads) based on the prevalent "Zen 4" microarchitecture, plus an integrated AMD Radeon 780M GPU. The former's main upgrade lies in its AI-crunching capabilities, a deployment of Team Red's XDNA AI engine. On the Ryzen 8040: "NPU performance has been increased to 16 TOPS, compared to 10 TOPS of the NPU on the 'Phoenix' silicon. AMD is taking a whole-of-silicon approach to AI acceleration, which includes not just the NPU, but also the 'Zen 4' CPU cores that support the AVX-512 VNNI instruction set that's relevant to AI; and the iGPU based on the RDNA 3 graphics architecture, with each of its compute units featuring two AI accelerators, components that make the SIMD cores crunch matrix math. The whole-of-silicon performance figure for 'Phoenix' is 33 TOPS, while 'Hawk Point' boasts 39 TOPS. In benchmarks by AMD, 'Hawk Point' is shown delivering a 40% improvement in vision models, and Llama 2, over the Ryzen 7040 'Phoenix' series."

AVX-512 Doubles Intel 5th Gen "Emerald Rapids" Xeon Processor Performance, Up to 10x Improvement in AI Workloads

Jan 8th, 2024 03:15

According to the latest round of tests by Phoronix, Intel's 5th Gen Xeon "Emerald Rapids" server CPUs deliver substantial performance gains when employing AVX-512 vector instructions. Enabling AVX-512 doubled throughput on average across a range of workloads, with specific AI tasks accelerating over 10x versus having it disabled. Running on the top-end 64-core Platinum 8592+ SKU, benchmarks saw minimal frequency differences between AVX-512 on and off states. However, the specialized 512-bit vector processing unlocked dramatic speedups, exemplified in the OpenVINO AI framework. Specifically, weld porosity detection, which has real-world applications, showed the biggest speedups. Power draw also increased moderately, the expected tradeoff for such a performance upside.

With robust optimizations, the vector engine potential has now been fully demonstrated. Workloads spanning AI, visualization, simulation, and analytics could multiply speed by upgrading to Emerald Rapids. Of course, developer implementation work remains non-trivial. But for the data center applications that can take advantage, AVX-512 enables Intel to partially close raw throughput gaps versus AMD's core count leadership. Whether those targeted acceleration gains offset EPYC's wider general-purpose value depends on customer workloads. But with tests proving dramatic upside, Intel is betting big on vector acceleration as its ace card. AMD also supports the AVX-512 instruction set. Below, you can find the geometric mean of all test results, and check the review with benchmarks here.

AMD Ryzen 8040 Series "Hawk Point" Mobile Processors Announced with a Faster NPU

Dec 6th, 2023 14:00

AMD today announced the new Ryzen 8040 mobile processor series codenamed "Hawk Point." These chips are shipping to notebook manufacturers now, and the first notebooks powered by these should be available to consumers in Q1-2024. At the heart of this processor is a significantly faster neural processing unit (NPU), designed to accelerate AI applications that will become relevant next year, as Microsoft prepares to launch Windows 12, and software vendors make greater use of generative AI in consumer applications.

The Ryzen 8040 "Hawk Point" processor is almost identical in design and features to the Ryzen 7040 "Phoenix," except for a faster Ryzen AI NPU. While this is based on the same first-generation XDNA architecture, its NPU performance has been increased to 16 TOPS, compared to 10 TOPS for the NPU on the "Phoenix" silicon. AMD is taking a whole-of-silicon approach to AI acceleration, which includes not just the NPU, but also the "Zen 4" CPU cores that support the AVX-512 VNNI instruction set that's relevant to AI; and the iGPU based on the RDNA 3 graphics architecture, with each of its compute units featuring two AI accelerators, components that make the SIMD cores crunch matrix math. The whole-of-silicon performance figure for "Phoenix" is 33 TOPS, while "Hawk Point" boasts 39 TOPS. In benchmarks by AMD, "Hawk Point" is shown delivering a 40% improvement in vision models and Llama 2 over the Ryzen 7040 "Phoenix" series.
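The TOPS figures above can be cross-checked with a little arithmetic. The sketch below assumes the whole-of-silicon figure is a simple sum of the NPU and the remaining CPU and iGPU contribution; the article does not give that breakdown, so the "non-NPU" value is an inference, and the helper names are hypothetical.

```python
# TOPS figures quoted in the article.
phoenix = {"npu": 10, "total": 33}
hawk_point = {"npu": 16, "total": 39}

def non_npu_tops(chip):
    """TOPS left for CPU cores plus iGPU, assuming total = NPU + everything else."""
    return chip["total"] - chip["npu"]

def percent_gain(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0

print(non_npu_tops(phoenix), non_npu_tops(hawk_point))                # 23 23
print(round(percent_gain(phoenix["npu"], hawk_point["npu"]), 1))      # 60.0
print(round(percent_gain(phoenix["total"], hawk_point["total"]), 1))  # 18.2
```

Under that assumption, the entire 6 TOPS uplift comes from the NPU, which is consistent with the two chips being otherwise almost identical.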

Read full story

FinalWire Releases AIDA64 v7.00 with Revamped Design and AMD Threadripper 7000 Optimizations

Press Release by

Dec 5th, 2023 04:10 Discuss (6 Comments)

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 7.00 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 7.00 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 7.00 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 7.00 software, a dedicated network audit toolset to collect and manage corporate network inventories.

The new AIDA64 update introduces a revamped user interface with a configurable toolbar, as well as AVX-512 accelerated benchmarks for AMD Threadripper 7000 processors, AVX2 optimized benchmarks for Intel Meteor Lake processors, and supports the latest AMD and Intel CPU platforms as well as the new graphics and GPGPU computing technologies by AMD, Intel and NVIDIA.

DOWNLOAD: FinalWire AIDA64 Extreme v7.0

Read full story

FinalWire AIDA64 v6.92 Released

Press Release by

Sep 12th, 2023 01:29 Discuss (1 Comment)

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 6.92 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 6.92 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 6.92 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 6.92 software, a dedicated network audit toolset to collect and manage corporate network inventories.

The latest AIDA64 update introduces AVX-512 optimized benchmarks for Intel Sapphire Rapids processors, and supports the latest AMD and Intel CPU platforms as well as the new graphics and GPGPU computing technologies by AMD, Intel and NVIDIA.

DOWNLOAD: FinalWire AIDA64 Extreme v6.92

Read full story

"Downfall" Intel CPU Vulnerability Can Impact Performance By 50%

by

Aug 11th, 2023 09:06 Discuss (162 Comments)

Intel has recently revealed a security vulnerability named Downfall (CVE-2022-40982) that impacts multiple generations of Intel processors. The vulnerability is linked to Intel's memory optimization features and exploits the Gather instruction, which accelerates fetching data from scattered memory locations. The instruction inadvertently exposes the contents of internal hardware registers, allowing malicious software to access data held by other programs. The flaw affects Intel mainstream and server processors ranging from the Skylake to the Rocket Lake microarchitectures. The entire list of affected CPUs is here. Intel has responded by releasing updated microcode to fix the flaw. However, there's concern over the performance impact of the fix, which could slow AVX2 and AVX-512 workloads involving the Gather instruction by up to 50%.
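As a rough illustration of what a gather does architecturally, the scalar model below loads one element per vector lane from an arbitrary index. It is a sketch of the instruction's visible semantics only, not of the vulnerability or of any Intel implementation detail; the function name and the `fallback` handling of masked-off lanes are simplifying assumptions (real gathers leave masked lanes of the destination unchanged).

```python
def gather(memory, indices, mask=None, fallback=0):
    """Scalar model of a vector gather: lane i loads memory[indices[i]].

    Lanes whose mask bit is False are not loaded and receive `fallback`.
    """
    if mask is None:
        mask = [True] * len(indices)
    if len(mask) != len(indices):
        raise ValueError("mask and indices must have the same length")
    return [memory[i] if m else fallback for i, m in zip(indices, mask)]

table = [10, 20, 30, 40, 50, 60, 70, 80]
print(gather(table, [7, 0, 3, 3]))                                   # [80, 10, 40, 40]
print(gather(table, [7, 0, 3, 3], mask=[True, False, True, False]))  # [80, 0, 40, 0]
```

Code that performs many such scattered loads per iteration is the kind of workload where the mitigation's cost would be expected to show up.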

Phoronix tested the Downfall mitigations and reported varying performance decreases on different processors. For instance, two Xeon Platinum 8380 processors were around 6% slower in certain tests, while the Core i7-1165G7 faced performance degradation ranging from 11% to 39% in specific benchmarks. While these reductions were less than Intel's forecasted 50% overhead, they remain significant, especially in High-Performance Computing (HPC) workloads. The ramifications of Downfall are not restricted to specialized tasks like AI or HPC but may extend to more common applications such as video encoding. Though the microcode update is not mandatory and Intel provides an opt-out mechanism, users are left with a challenging decision between security and performance. Executing a Downfall attack may be complex, but the choice between applying the mitigation and retaining performance will ultimately depend on individual needs and risk assessments.

Intel Previews AVX10 ISA, Next-Gen E-Cores to get AVX-512 Capabilities

by

Jul 25th, 2023 10:11 Discuss (17 Comments)

Intel has published a preview article covering its new AVX10 ISA (Instruction Set Architecture)—the announcement reveals that both P-Cores & E-Cores (on next-gen processors) will be getting support for AVX-512. Team Blue stated: "Intel AVX10 represents a major shift to supporting a high-performance vector ISA across future Intel processors. It allows the developer to maintain a single code-path that achieves high performance across all Intel platforms with the minimum of overhead checking for feature support. Future development of the Intel AVX10 ISA will continue to provide a rich, flexible, and consistent environment that optimally supports both Server and Client products."

Due to technical issues (E-core related), Intel decided to disable AVX-512 for Alder Lake and Raptor Lake client-oriented CPU lineups. AMD has recently adopted the fairly new instruction set for its Ryzen 7040 mobile series, so it is no wonder that Team Blue is attempting to reintroduce it in the near future—AVX-512 was last seen working properly on Rocket and Tiger Lake chips. AVX10 implementation is expected to debut with Granite Rapids (according to Longhorn), and VideoCardz reckons that Intel will get advanced instructions for Efficiency cores working with its Clearwater Forest CPU architecture.
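To show the kind of feature checking that Intel's statement refers to, here is a minimal sketch of how software on Linux might pick a vector code path today by testing several separate AVX-512 subset flags. The flag names `avx2`, `avx512f`, `avx512dq`, `avx512bw`, and `avx512vl` are the ones Linux reports in `/proc/cpuinfo`; the function names, the returned path labels, and the sample text are hypothetical, and no AVX10 flag is modeled here.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

# Subsets a typical AVX-512 code path might require (an assumption for this sketch).
AVX512_BASELINE = {"avx512f", "avx512dq", "avx512bw", "avx512vl"}

def pick_code_path(flags):
    """Choose the widest vector path the reported flags allow."""
    if AVX512_BASELINE <= flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "scalar"

# Hypothetical sample; on a real Linux system, read open("/proc/cpuinfo").read().
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 avx512f avx512dq avx512bw avx512vl\n"
print(pick_code_path(parse_flags(sample)))  # avx512
```

A single versioned ISA, as AVX10 is described in the quote, would in principle reduce the multi-flag test to one check.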

AMD Ryzen 7040 Series Phoenix APUs Surprisingly Performant with AVX-512 Workloads

by

Jul 17th, 2023 12:25 Discuss (10 Comments)

Intel decided to drop the relatively new AVX-512 instruction set for laptop/mobile platforms when it was discovered that it would not work in conjunction with its E-core designs. Alder Lake was the last generation to (semi) support the instruction set, with the P-cores able to run it only when the efficiency cores were disabled (via BIOS settings). Intel chose to fuse off AVX-512 support in production circa early 2022, with AMD picking up the slack soon after by integrating AVX-512 into its Zen 4 CPU architecture. The Ryzen 7040 series is the only current-generation mobile platform that offers AVX-512 support. Phoronix decided to benchmark a Ryzen 7 7840U against the older Intel i7-1165G7 (Tiger Lake) and i7-1065G7 (Ice Lake) SoCs in AVX-512-based workloads.

Team Red's debut foray into AVX-512 was surprisingly performant according to Phoronix's test results—the Ryzen 7 7840U did very well for itself. It outperformed the 1165G7 by 46%, and the older 1065G7 by an impressive 63%. The Ryzen 7 APU was found to attain the highest performance gain with AVX-512 enabled—a 54% performance margin over operating with AVX-512 disabled. In comparison Phoronix found that: "the i7-1165G7 Tiger Lake impact came in at 34% with these AVX-512-heavy benchmarks or 35% with the i7-1065G7 Ice Lake SoC for that generation where AVX-512 on Intel laptops became common."

Read full story
