News Posts matching #AVX-512


What the Intel-AMD x86 Ecosystem Advisory Group is, and What it's Not

AVX-512 was proposed by Intel more than a decade ago, in 2013 to be precise. A decade later, the implementation of this instruction set on CPU cores remains wildly spotty. Intel implemented it first on an HPC accelerator, then on its Xeon server processors, then on its client processors, before realizing that the hardware of the time could not execute AVX-512 instructions in an energy-efficient manner, and deprecating it on the client. AMD implemented it just a couple of years ago with Zen 4, using a dual-pumped 256-bit FPU on 5 nm, before finally implementing a true 512-bit FPU on 4 nm. AVX-512 is a microcosm of what's wrong with the x86 ecosystem.

There are only two x86 CPU core vendors: the IP owner Intel, and its only surviving licensee capable of contemporary CPU cores, AMD. Any new addition to the ISA introduced by either of the two has to go through the grind of their duopolistic competition before software vendors can assume that there's a uniform install base to implement something new. x86 is a net-loser of this, and Arm is a net-winner. Arm Holdings makes no hardware of its own; it continuously develops the Arm machine architecture, and a first-party set of reference-design CPU cores that any licensee can implement. Arm's great march began with tiny embedded devices, before its explosion into client computing with smartphone SoCs. There are now Arm-based server processors, and the architecture is making inroads into the last market that x86 holds sway over, the PC. Apple's M-series processors compete with all segments of PC processors, right from the 7 W class to the HEDT/workstation class. Qualcomm entered this space with its Snapdragon Elite family, and now Dell believes NVIDIA will take a swing at client processors in 2025. Then there's RISC-V. Intel finally did something it should have done two decades ago: set up a multi-brand Ecosystem Advisory Group. Here's what it is, and more importantly, what it's not.

AMD Launches 5th Gen AMD EPYC CPUs, Maintaining Leadership Performance and Features for the Modern Data Center

AMD (NASDAQ: AMD) today announced the availability of the 5th Gen AMD EPYC processors, formerly codenamed "Turin," the world's best server CPU for enterprise, AI and cloud. Using the "Zen 5" core architecture, compatible with the broadly deployed SP5 platform and offering a broad range of core counts spanning from 8 to 192, the AMD EPYC 9005 Series processors extend the record-breaking performance and energy efficiency of the previous generations with the top of stack 192 core CPU delivering up to 2.7X the performance compared to the competition.

New to the AMD EPYC 9005 Series CPUs is the 64 core AMD EPYC 9575F, tailor made for GPU powered AI solutions that need the ultimate in host CPU capabilities. Boosting up to 5 GHz, compared to the 3.8 GHz processor of the competition, it provides up to 28% faster processing needed to keep GPUs fed with data for demanding AI workloads.

AMD MI300X Accelerators are Competitive with NVIDIA H100, Crunch MLPerf Inference v4.1

The MLCommons consortium on Wednesday posted MLPerf Inference v4.1 benchmark results for popular AI inferencing accelerators available in the market, across brands that include NVIDIA, AMD, and Intel. AMD's Instinct MI300X accelerators emerged competitive with NVIDIA's "Hopper" H100 series AI GPUs. AMD also used the opportunity to showcase the kind of AI inferencing performance uplifts customers can expect from its next-generation EPYC "Turin" server processors powering these MI300X machines. "Turin" features "Zen 5" CPU cores, sporting a 512-bit FPU datapath, and improved performance in AI-relevant 512-bit SIMD instruction sets, such as AVX-512 and VNNI. The MI300X, on the other hand, banks on the strengths of its memory sub-system, FP8 data format support, and efficient KV cache management.

The MLPerf Inference v4.1 benchmark focused on the 70 billion-parameter LLaMA2-70B model. AMD's submissions included machines featuring the Instinct MI300X, powered by the current EPYC "Genoa" (Zen 4), and next-gen EPYC "Turin" (Zen 5). The GPUs are backed by AMD's ROCm open-source software stack. The benchmark evaluated inference performance using 24,576 Q&A samples from the OpenORCA dataset, with each sample containing up to 1024 input and output tokens. Two scenarios were assessed: the offline scenario, focusing on batch processing to maximize throughput in tokens per second, and the server scenario, which simulates real-time queries with strict latency limits (TTFT ≤ 2 seconds, TPOT ≤ 200 ms). This lets you see the chip's mettle in both high-throughput and low-latency queries.
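The two scenarios reduce to simple checks, and a minimal Python sketch may help make them concrete. The two latency limits are the ones quoted above; the function names and the sample measurements are hypothetical, and the sketch checks a single query rather than reproducing how the benchmark aggregates results across its full sample set.

```python
# Latency limits quoted in the article for the MLPerf server scenario.
TTFT_LIMIT_S = 2.0    # time to first token, in seconds
TPOT_LIMIT_S = 0.200  # time per output token, in seconds


def offline_throughput(total_tokens, wall_seconds):
    """Offline scenario metric: tokens generated per second."""
    return total_tokens / wall_seconds


def meets_server_limits(ttft_s, tpot_s):
    """Server scenario: a query passes only if both latency limits hold."""
    return ttft_s <= TTFT_LIMIT_S and tpot_s <= TPOT_LIMIT_S


# Hypothetical measurements, not taken from any MLPerf submission.
print(offline_throughput(total_tokens=1_000_000, wall_seconds=50.0))
print(meets_server_limits(ttft_s=1.4, tpot_s=0.150))
print(meets_server_limits(ttft_s=1.4, tpot_s=0.250))
```

A system can score well on the first metric and still fail the second check, which is why the two scenarios are reported separately.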

Intel Launches Xeon W-3500 and W-2500 Series Workstation Processors

Intel today launched its Xeon W-3500 series and Xeon W-2500 series workstation processors. These chips are based on the "Sapphire Rapids" microarchitecture featuring the enterprise version of "Golden Cove" P-cores. These are a refresh over the Xeon W-3400 series and W-2400 series, as they feature higher CPU core counts, L3 cache, and clock speeds, at given price-points. Intel has also slightly de-cluttered its lineup with this series. The key difference between the W-3500 series and the W-2500 series, is that the former comes with 8-channel DDR5 memory interface and 112 PCI-Express Gen 5 lanes; while the latter offers a 4-channel DDR5 memory interface, along with 64 PCI-Express Gen 5 lanes. The W-2500 series also comes with lower CPU core counts compared to the W-3500, which is somewhat made up for with higher CPU clock speeds. Perhaps the highlight of this refresh is that now Intel sells CPU core counts of up to 60-core/120-thread in the workstation segment. The W-3400 series had topped off at 36-core/72-thread.

The series is led by the Xeon W9-3595X. This beast maxes out the "Sapphire Rapids" chip, with a 60-core/120-thread configuration, with each of the 60 cores featuring 2 MB of dedicated L2 cache, and sharing 112.5 MB of L3 cache. The chip comes with a base frequency of 2.00 GHz, and a maximum boost frequency of 4.80 GHz. The next highest SKU sees a rather steep drop in core-counts, with the Xeon W9-3575X coming in with a 44-core/88-thread configuration, along with 97.5 MB of shared L3 cache, besides the 2 MB of dedicated L2 cache per core. This chip ticks at 2.20 GHz base, along with 4.80 GHz maximum boost. There's yet another steep drop in core-counts with the Xeon W7-3545, featuring a 24-core/48-thread configuration, 67.5 MB of shared L3 cache, 2.70 GHz base frequency, and 4.80 GHz maximum boost.

FinalWire Releases AIDA64 v7.35 with New CheckMate 64-bit Benchmark

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 7.35 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 7.35 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 7.35 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 7.35 software, a dedicated network audit toolset to collect and manage corporate network inventories. The new AIDA64 update introduces a new 64-bit CheckMate benchmark, AVX-512 accelerated benchmarks for AMD Ryzen AI APU, and supports the latest graphics and GPGPU computing technologies by AMD, Intel and NVIDIA.

DOWNLOAD: FinalWire AIDA64 v7.35 Extreme

Intel Readies Arrow Lake-H Laptop CPU SKU with 24 Cores Based on Desktop Arrow Lake-S

As Intel gears up for the launch of Lunar Lake and Arrow Lake processors, the company appears to be preparing a new line of high-performance processors for gaming laptops. Recent developments suggest that the company is adapting its desktop-grade Arrow Lake-S chips for use in ultra-high-performance notebooks. The buzz began when X user @InstLatX64 spotted Intel testing a peculiar motherboard labeled "Arrow Lake Client Platform/ARL-S BGA SODIMM 2DPC." This discovery hints at the possibility of Intel packing up to 24 cores into laptop processors, eight more cores compared to the 16 cores expected in standard Arrow Lake-H mobile chips. By utilizing the full potential of Arrow Lake-S silicon in a mobile form factor, Intel aims to deliver desktop-class performance to high-end notebooks in a BGA laptop CPU.

The leaked chip would likely feature eight high-performance Lion Cove P-cores and 16 energy-efficient Skymont E-cores, along with an integrated Xe2 GPU. This configuration could provide the raw power needed for demanding games and professional applications in a portable package. However, implementing such powerful hardware in laptops presents challenges. The processors are expected to have a TDP of 45 W or 55 W, with actual power consumption potentially exceeding these figures to maintain high clock speeds. Success will depend not only on Intel's chip design but also on the cooling solutions and power delivery systems developed by laptop manufacturers. As of now, specific details about clock speeds and performance metrics remain under wraps. The test chip that surfaced showed a base frequency of 3.0 GHz, notably without AVX-512 support.

AMD Zen 5 Powered Ryzen AI 300 Series Mobile Processors Supercharge Next Gen Copilot+ AI PCs

AMD today launched its Ryzen AI 300 series mobile processors, codenamed "Strix Point." These chips implement a combination of the AMD "Zen 5" microarchitecture for the CPU cores, the XDNA 2 architecture for its powerful new NPU, and the RDNA 3+ graphics architecture for its 33% faster iGPU. The new "Zen 5" microarchitecture provides a 16% generational IPC uplift over "Zen 4" on the backs of several front-end enhancements, wider execution pipelines, more intra core bandwidth, and a revamped FPU that doubles performance of AI and AVX-512 workloads. AMD didn't go in-depth with the microarchitecture, but the broad points of "Zen 5" are detailed in our article for the Ryzen 9000 "Granite Ridge" desktop processors. Not only is AMD using these faster "Zen 5" CPU cores, but also increased the CPU core count by 50%, for a maximum of 12-core/24-thread.

The "Strix Point" monolithic silicon is built on the 4 nm foundry node, and packs a CPU core complex (CCX) with 12 CPU cores. Four of these are "Zen 5" cores, which can achieve the highest possible boost frequencies; the other eight are "Zen 5c" cores that feature an identical IPC and the full ISA, including support for SMT, but don't boost as high as the "Zen 5" cores. AMD is claiming a productivity performance increase ranging between 4% and 73% for the top model in the series, when compared to Intel's Core Ultra 9 185H "Meteor Lake" processor. The iGPU sees its compute unit (CU) count go all the way up to 16 from 12 in the previous generation, and this yields a claimed 33% increase in iGPU gaming performance compared to the integrated Arc graphics of the Core Ultra 9 185H. Lastly, the XDNA 2 NPU more than triples AI inference performance to 50 AI TOPS, compared to the 16 TOPS of the Ryzen 8040 "Hawk Point" processor, and 12 TOPS of Core Ultra "Meteor Lake." This makes the processor meet Microsoft's Copilot+ AI PC requirements.
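The generational ratios quoted for "Strix Point" can be checked with a few lines of arithmetic. The Python sketch below uses only figures from the text; the helper name is made up for illustration, and the previous-generation core count of 8 is inferred from the stated 50% increase to 12 cores.

```python
def pct_increase(old, new):
    """Percentage increase when going from old to new."""
    return (new - old) * 100.0 / old


# NPU throughput in TOPS, as quoted in the article.
npu_vs_hawk_point = 50 / 16    # 3.125x, hence "more than triple"
npu_vs_meteor_lake = 50 / 12   # a little over 4x

# Compute unit count: 12 in the previous generation, 16 now.
cu_uplift = pct_increase(12, 16)

# CPU core count: 8 (inferred) in the previous generation, 12 now.
core_uplift = pct_increase(8, 12)

print(f"NPU vs Hawk Point:  {npu_vs_hawk_point:.3f}x")
print(f"NPU vs Meteor Lake: {npu_vs_meteor_lake:.2f}x")
print(f"CU count uplift:    {cu_uplift:.1f}%")
print(f"Core count uplift:  {core_uplift:.1f}%")
```

Note that the CU count grows by a third on its own; the 33% gaming figure in the text is a separate claim measured against Intel's integrated Arc graphics.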

FinalWire Releases AIDA64 v7.30

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 7.30 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 7.30 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 7.30 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 7.30 software, a dedicated network audit toolset to collect and manage corporate network inventories.

AIDA64 update brings numerous improvements and optimizations for dark mode and high contrast mode, enhances speed, and supports the latest graphics and GPGPU computing technologies by AMD, Intel, and NVIDIA.

DOWNLOAD: FinalWire AIDA64 Extreme 7.30

MSI First with Motherboard BIOS that Supports Ryzen 9000 "Zen 5" Processors

In yet another clear sign that we could see some action from AMD on the next-gen desktop processor front this Computex, motherboard maker MSI released its first beta UEFI firmware update that packs an AGESA microcode that reportedly supports the upcoming AMD Ryzen 9000 "Granite Ridge" processors. The "7D74v1D2 beta" firmware update for the MSI MPG B650 Carbon Wi-Fi motherboard encapsulates AGESA ComboPI 1.1.7.0 patch-A, with the description that it supports a "next-gen CPU," a reference to the Ryzen 9000 "Granite Ridge."

A successor to the Ryzen 7000 Raphael, the Ryzen 9000 Granite Ridge introduces the new "Zen 5" microarchitecture to the desktop platform, with CPU core counts remaining up to 16-core/32-thread. The new microarchitecture is expected to introduce generational increase in IPC, as well as improve performance of certain exotic workloads such as AVX-512. The processors are said to be launching alongside the new AMD 800-series motherboard chipset. If AMD is using Computex as a platform to showcase these processors, it's likely we might see the first of these motherboards as well.

AMD Ryzen 9000 "Granite Ridge" Zen 5 Processor Pictured

An alleged picture of an unreleased AMD Ryzen 9000 series "Granite Ridge" desktop processor just hit the wires. "Granite Ridge" is the codename for the desktop implementation of the "Zen 5" microarchitecture; it succeeds the current Ryzen 7000 "Raphael" that's powered by "Zen 4." From what we're hearing, the CPU core counts of "Granite Ridge" continue to top out at 16. These chips will be built in the existing AMD Socket AM5 package, and will be compatible with existing AMD 600-series chipset motherboards, although the company is working on a new motherboard chipset to go with the new chips.

The alleged AMD engineering sample pictured below has an OPN 100-000001290-11, which is unreleased. This OPN also showed up on an Einstein@Home online database, where the distributed computing platform read it as having 16 threads, making this possibly an 8-core/16-thread SKU. The "Zen 5" microarchitecture is expected to provide a generational IPC increase over "Zen 4," but more importantly, offer a significant performance increase for AVX-512 workloads due to an updated FPU. AMD is expected to unveil its Ryzen 9000 series "Zen 5" processors at the 2024 Computex.

AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU

AMD "Zen 5" CPU microarchitecture will introduce a significant performance increase for AVX-512 workloads, with some sources reporting performance increases as high as 40% over "Zen 4" in benchmarks that use AVX-512. A Moore's Law is Dead report detailing the execution engine of "Zen 5" holds the answer to how the company managed this: a true 512-bit FPU. Currently, AMD uses a dual-pumped 256-bit FPU to execute AVX-512 workloads on "Zen 4." The updated FPU should significantly improve the core's performance in workloads that take advantage of 512-bit AVX or VNNI instructions, such as AI.

Giving "Zen 5" a 512-bit FPU meant that AMD also had to scale up the ancillaries—all the components that keep the FPU fed with data and instructions. The company therefore increased the capacity of the L1 DTLB. The load-store queues have been widened to meet the needs of the new FPU. The L1 Data cache has been doubled in bandwidth, and increased in size by 50%. The L1D is now 48 KB in size, up from 32 KB in "Zen 4." FPU MADD latency has been reduced by 1 cycle. Besides the FPU, AMD also increased the number of Integer execution pipes to 10, from 8 on "Zen 4." The exclusive L2 cache per core remains 1 MB in size.
Update 07:02 UTC: Moore's Law is Dead reached out to us and said that the slide previously posted by them, which we had used in an earlier version of this article, is fake, but said that the information contained in that slide is correct, and that they stand by the information.
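The difference between a dual-pumped 256-bit FPU and a true 512-bit FPU can be illustrated with a deliberately simple model: count how many passes through the datapath one 512-bit operation needs. The Python sketch below is only that, a counting model with a hypothetical helper name. It ignores clock speeds, port counts, and scheduling, so it is not a performance prediction.

```python
def passes_per_vector_op(vector_bits, datapath_bits):
    """Datapath passes needed for one vector operation (ceiling division)."""
    return -(-vector_bits // datapath_bits)


zen4_passes = passes_per_vector_op(512, 256)  # dual-pumped 256-bit FPU
zen5_passes = passes_per_vector_op(512, 512)  # true 512-bit FPU

# Supporting changes quoted in the article.
l1d_growth_pct = (48 - 32) * 100.0 / 32    # L1D: 32 KB to 48 KB
int_pipe_growth_pct = (10 - 8) * 100.0 / 8  # integer pipes: 8 to 10

print(zen4_passes, zen5_passes)
print(l1d_growth_pct, int_pipe_growth_pct)
```

Halving the pass count only pays off if loads and stores keep up, which is consistent with the wider load-store queues and doubled L1D bandwidth described above.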

Qubic Cryptocurrency Mining Craze Causes AMD Ryzen 9 7950X Stocks to Evaporate

It looks like cryptocurrency mining is back in craze, as miners are firing up their old mining hardware from 2022 to cash in. Bitcoin is now north of $72,000, and is dragging up the value of several other cryptocurrencies, one such being Qubic (QBIC). Profitability calculators put 24 hours of Qubic mining on an AMD Ryzen 9 7950X 16-core processor at around $3, after subtracting energy costs involved in running the chip at its default 170 W TDP. "Zen 4" processors such as the 7950X tend to retain much of their performance with slight underclocking, and reducing their power limits; which is bound to hold or increase profitability, while also prolonging the life of the hardware.

And thus, the inevitable has happened—stocks of the AMD Ryzen 9 7950X have disappeared overnight across online retail. With the market presence of the 7950X3D and the Intel Core i9-14900K, the 7950X was typically found between $550-600, which would have added great value considering its low input costs. CPU-based cryptocurrency miners, including the QBIC miner, appear to be taking advantage of the AVX-512 instruction set. AMD "Zen 4" microarchitecture supports AVX-512 through its dual-pumped 256-bit FPU, and the upcoming "Zen 5" microarchitecture is rumored to double AVX-512 performance over "Zen 4." Meanwhile, Intel has deprecated what few client-relevant AVX-512 instructions its Core processors had since 12th Gen "Alder Lake," as it reportedly affected sales of Xeon processors. What about the 7950X3D? It's pricier, but mining doesn't benefit from the 3D V-cache, and the chip doesn't sustain the kind of CPU clocks the 7950X manages to do across all its 16 cores. It's only a matter of time before the 7950X3D disappears, too; followed by 12-core models such as the 65 W 7900, the 170 W 7900X, and the 7900X3D.

Google: CPUs are Leading AI Inference Workloads, Not GPUs

Today's AI infrastructure build-out relies mostly on GPU-accelerated servers. Google, one of the world's largest hyperscalers, has noted that CPUs are still a leading compute platform for AI/ML workloads, according to an internal analysis of its Google Cloud services. During the TechFieldDay event, a speech by Brandon Royal, product manager at Google Cloud, explained the position of CPUs in today's AI game. The AI lifecycle is divided into two parts: training and inference. During training, massive compute capacity is needed, along with enormous memory capacity, to fit ever-expanding AI models into memory. The latest models, like GPT-4 and Gemini, contain billions of parameters and require thousands of GPUs or other accelerators working in parallel to train efficiently.

On the other hand, inference requires less compute intensity but still benefits from acceleration. The pre-trained model is optimized and deployed during inference to make predictions on new data. While less compute is needed than training, latency and throughput are essential for real-time inference. Google found that, while GPUs are ideal for the training phase, models are often optimized for, and run inference on, CPUs. This means that there are customers who choose CPUs as their medium of AI inference for a wide variety of reasons.

AMD Zen 5 Details Emerge with GCC "Znver5" Patch: New AVX Instructions, Larger Pipelines

AMD's upcoming Ryzen 9000 series processors on the AM5 platform will carry new silicon under the hood: Zen 5. The latest revision of AMD's x86-64 microarchitecture will feature a few interesting improvements over the Zen 4 it replaces, targeting the rumored 10-15% IPC improvement. Thanks to the latest set of patches for the GNU Compiler Collection (GCC), we can see the changes proposed with "znver5" enablement. One of the most interesting additions to Zen 5 over the previous Zen 4 is the expansion of the instruction set, mainly with new AVX and AVX-512 instructions. The extensions listed are AVX-VNNI, MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI.

AVX-VNNI is a 256-bit vector version of the AVX-512 VNNI instruction set that accelerates neural network inferencing workloads. AVX-VNNI delivers the same VNNI instruction set for CPUs that support 256-bit vectors but lack full 512-bit AVX-512 capabilities. AVX-VNNI effectively extends useful VNNI instructions for AI acceleration down to 256-bit vectors, making the technology available on more hardware. While narrower in scope (it lacks the opmasking and the extra vector registers of AVX-512 VNNI), AVX-VNNI is crucial in spreading VNNI inferencing speedups to real-world CPUs and applications. The new AVX-512 VP2INTERSECT instruction is also making it into Zen 5, as noted above; it has so far been present only in Intel's Tiger Lake processor generation, and is now considered deprecated for Intel SKUs. We don't know the rationale behind this inclusion, but AMD presumably has a use case for it.
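To make the VNNI idea concrete, here is a rough Python emulation of one representative VNNI operation, the non-saturating byte dot-product commonly known by the mnemonic VPDPBUSD (the mnemonic is not named in the article). Each 32-bit lane multiplies four unsigned bytes by four signed bytes, sums the products, and adds them to a 32-bit accumulator. The function names are hypothetical, and this sketch models the arithmetic only, not masking or encoding.

```python
def to_int32(x):
    """Wrap a Python int to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def vpdpbusd_lane(acc, a_u8, b_s8):
    """One 32-bit lane: acc plus four (unsigned byte * signed byte) products."""
    assert len(a_u8) == 4 and len(b_s8) == 4
    assert all(0 <= a <= 255 for a in a_u8)
    assert all(-128 <= b <= 127 for b in b_s8)
    return to_int32(acc + sum(a * b for a, b in zip(a_u8, b_s8)))


def vpdpbusd(acc, a_u8, b_s8):
    """Apply the lane operation across a vector of 32-bit accumulators.

    8 lanes correspond to a 256-bit vector, 16 lanes to a 512-bit vector.
    """
    return [
        vpdpbusd_lane(acc[i], a_u8[4 * i:4 * i + 4], b_s8[4 * i:4 * i + 4])
        for i in range(len(acc))
    ]


# 1*5 + 2*(-6) + 3*7 + 4*(-8) = -18, plus the accumulator 10 gives -8.
print(vpdpbusd_lane(10, [1, 2, 3, 4], [5, -6, 7, -8]))  # -8
```

The same lane logic applies at either width; the 256-bit form simply processes half as many lanes per instruction.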

AMD Ryzen 7 8840U APU Benched in GPD Win Max 2 Handheld

GPD has disclosed to ITHome that a specification refresh of its Win Max 2 handheld/mini-laptop gaming PC is incoming. This model debuted last year with Ryzen 7040 "Phoenix" APUs sitting in the driver's seat. A company representative provided a sneak peek of an upgraded device that sports a Team Red Ryzen 8040 series "Hawk Point" mobile processor, and a larger pool of system memory (32 GB versus the 2023 model's 16 GB). The refreshed GPD Win Max 2's Ryzen 7 8840U APU was compared to the predecessor's Ryzen 7 7840U in CPU-Z benchmarks (standard and AVX-512), and the results demonstrate a very slight difference in performance between generations.

The 8040 and 7040 APUs share the same "Phoenix" basic CPU design (8-core/16-thread) based on the prevalent "Zen 4" microarchitecture, plus an integration of AMD's Radeon 780M GPU. The former's main upgrade lies in its AI-crunching capabilities: a deployment of Team Red's XDNA AI engine. Ryzen 8040's "NPU performance has been increased to 16 TOPS, compared to 10 TOPS of the NPU on the 'Phoenix' silicon. AMD is taking a whole-of-silicon approach to AI acceleration, which includes not just the NPU, but also the 'Zen 4' CPU cores that support the AVX-512 VNNI instruction set that's relevant to AI; and the iGPU based on the RDNA 3 graphics architecture, with each of its compute unit featuring two AI accelerators, components that make the SIMD cores crunch matrix math. The whole-of-silicon performance figures for 'Phoenix' is 33 TOPS; while 'Hawk Point' boasts of 39 TOPS. In benchmarks by AMD, 'Hawk Point' is shown delivering a 40% improvement in vision models, and Llama 2, over the Ryzen 7040 'Phoenix' series."

AVX-512 Doubles Intel 5th Gen "Emerald Rapids" Xeon Processor Performance, Up to 10x Improvement in AI Workloads

According to the latest round of tests by Phoronix, we are seeing proof of substantial performance gains Intel's 5th Gen Xeon Emerald Rapids server CPUs deliver when employing AVX-512 vector instructions. Enabling AVX-512 doubled throughput on average across a range of workloads, with specific AI tasks accelerating over 10x versus having it disabled. Running on the top-end 64-core Platinum 8592+ SKU, benchmarks saw minimal frequency differences between AVX-512 on and off states. However, the specialized 512-bit vector processing unlocked dramatic speedups, exemplified in the OpenVINO AI framework. Specifically, weld porosity detection, which has real-world applications, showed the biggest speedups. Power draw also increased moderately - the expected tradeoff for such an unconstrained performance upside.

With robust optimizations, the vector engine potential has now been fully demonstrated. Workloads spanning AI, visualization, simulation, and analytics could multiply speed by upgrading to Emerald Rapids. Of course, developer implementation work remains non-trivial. But for the data center applications that can take advantage, AVX-512 enables Intel to partially close raw throughput gaps versus AMD's core count leadership. Whether those targeted acceleration gains offset EPYC's wider general-purpose value depends on customer workloads. But with tests proving dramatic upside, Intel is betting big on vector acceleration as its ace card. AMD also supports the AVX-512 instruction set. Below, you can find the geometric mean of all test results, and check the review with benchmarks here.
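A geometric mean is the usual way to summarize speedup ratios across many tests, because a single 10x outlier pulls it up far less than it would an arithmetic mean. The Python sketch below shows the calculation; the speedup values are hypothetical and are not Phoronix's results.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical AVX-512 on/off speedup ratios for four workloads.
speedups = [1.2, 2.0, 10.0, 1.5]

arithmetic = sum(speedups) / len(speedups)
geometric = geometric_mean(speedups)

print(f"arithmetic mean: {arithmetic:.3f}")
print(f"geometric mean:  {geometric:.3f}")
```

With these made-up inputs the geometric mean lands well below the arithmetic mean, which shows how an average doubling can coexist with individual tasks accelerating by 10x.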

AMD Ryzen 8040 Series "Hawk Point" Mobile Processors Announced with a Faster NPU

AMD today announced the new Ryzen 8040 mobile processor series codenamed "Hawk Point." These chips are shipping to notebook manufacturers now, and the first notebooks powered by these should be available to consumers in Q1-2024. At the heart of this processor is a significantly faster neural processing unit (NPU), designed to accelerate AI applications that will become relevant next year, as Microsoft prepares to launch Windows 12, and software vendors make greater use of generative AI in consumer applications.

The Ryzen 8040 "Hawk Point" processor is almost identical in design and features to the Ryzen 7040 "Phoenix," except for a faster Ryzen AI NPU. While this is based on the same first-generation XDNA architecture, its NPU performance has been increased to 16 TOPS, compared to 10 TOPS of the NPU on the "Phoenix" silicon. AMD is taking a whole-of-silicon approach to AI acceleration, which includes not just the NPU, but also the "Zen 4" CPU cores that support the AVX-512 VNNI instruction set that's relevant to AI; and the iGPU based on the RDNA 3 graphics architecture, with each of its compute units featuring two AI accelerators, components that make the SIMD cores crunch matrix math. The whole-of-silicon performance figure for "Phoenix" is 33 TOPS, while "Hawk Point" boasts 39 TOPS. In benchmarks by AMD, "Hawk Point" is shown delivering a 40% improvement in vision models, and Llama 2, over the Ryzen 7040 "Phoenix" series.
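The TOPS figures above can be broken down to see where the uplift comes from. This short Python sketch subtracts the NPU figure from the whole-of-silicon figure for each chip; the helper name is hypothetical, and it assumes the quoted totals are simple sums of the NPU, CPU, and iGPU contributions.

```python
def non_npu_tops(total_tops, npu_tops):
    """TOPS attributed to the CPU cores and iGPU, assuming the total is a sum."""
    return total_tops - npu_tops


phoenix_rest = non_npu_tops(total_tops=33, npu_tops=10)
hawk_point_rest = non_npu_tops(total_tops=39, npu_tops=16)

# NPU uplift from 10 TOPS to 16 TOPS, in percent.
npu_uplift_pct = (16 - 10) * 100.0 / 10

print(phoenix_rest, hawk_point_rest)  # 23 23
print(npu_uplift_pct)                 # 60.0
```

Under that assumption the CPU and iGPU share is unchanged between the two chips, so the entire 6 TOPS gain comes from the NPU, in line with the two processors being almost identical otherwise.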

FinalWire Releases AIDA64 v7.00 with Revamped Design and AMD Threadripper 7000 Optimizations

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 7.00 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 7.00 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 7.00 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 7.00 software, a dedicated network audit toolset to collect and manage corporate network inventories.

The new AIDA64 update introduces a revamped user interface with a configurable toolbar, as well as AVX-512 accelerated benchmarks for AMD Threadripper 7000 processors, AVX2 optimized benchmarks for Intel Meteor Lake processors, and supports the latest AMD and Intel CPU platforms as well as the new graphics and GPGPU computing technologies by AMD, Intel and NVIDIA.

DOWNLOAD: FinalWire AIDA64 Extreme v7.0

FinalWire AIDA64 v6.92 Released

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 6.92 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 6.92 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 6.92 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 6.92 software, a dedicated network audit toolset to collect and manage corporate network inventories.

The latest AIDA64 update introduces AVX-512 optimized benchmarks for Intel Sapphire Rapids processors, and supports the latest AMD and Intel CPU platforms as well as the new graphics and GPGPU computing technologies by AMD, Intel and NVIDIA.

DOWNLOAD: FinalWire AIDA64 Extreme v6.92

"Downfall" Intel CPU Vulnerability Can Impact Performance By 50%

Intel has recently revealed a security vulnerability named Downfall (CVE-2022-40982) that impacts multiple generations of Intel processors. The vulnerability is linked to Intel's memory optimization feature, exploiting the Gather instruction, a function that accelerates data fetching from scattered memory locations. It inadvertently exposes internal hardware registers, allowing malicious software access to data held by other programs. The flaw affects Intel mainstream and server processors ranging from the Skylake to Rocket Lake microarchitecture. The entire list of affected CPUs is here. Intel has responded by releasing updated software-level microcode to fix the flaw. However, there's concern over the performance impact of the fix, potentially affecting AVX2 and AVX-512 workloads involving the Gather instruction by up to 50%.

Phoronix tested the Downfall mitigations and reported varying performance decreases on different processors. For instance, two Xeon Platinum 8380 processors were around 6% slower in certain tests, while the Core i7-1165G7 faced performance degradation ranging from 11% to 39% in specific benchmarks. While these reductions were less than Intel's forecasted 50% overhead, they remain significant, especially in High-Performance Computing (HPC) workloads. The ramifications of Downfall are not restricted to specialized tasks like AI or HPC but may extend to more common applications such as video encoding. Though the microcode update is not mandatory and Intel provides an opt-out mechanism, users are left with a challenging decision between security and performance. Executing a Downfall attack might seem complex, but the final choice between implementing the mitigation or retaining performance will likely vary depending on individual needs and risk assessments.
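For readers weighing that choice on a Linux machine, the kernel's own report of the mitigation state is a reasonable starting point. The Python sketch below assumes a recent kernel that exposes Downfall under the name "gather data sampling" at the sysfs path shown; that path is an assumption about the running system, and the function falls back to "unknown" when the file is absent.

```python
from pathlib import Path

# Assumed sysfs location on recent Linux kernels; absent on older kernels
# and on other operating systems.
GDS_SYSFS = "/sys/devices/system/cpu/vulnerabilities/gather_data_sampling"


def downfall_status(path=GDS_SYSFS):
    """Return the kernel's reported status line, or 'unknown' if unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return "unknown"


print(downfall_status())
```

The returned string is whatever the kernel reports, so it should be read as-is rather than parsed against a fixed list of values.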

Intel Previews AVX10 ISA, Next-Gen E-Cores to get AVX-512 Capabilities

Intel has published a preview article covering its new AVX10 ISA (Instruction Set Architecture)—the announcement reveals that both P-Cores & E-Cores (on next-gen processors) will be getting support for AVX-512. Team Blue stated: "Intel AVX10 represents a major shift to supporting a high-performance vector ISA across future Intel processors. It allows the developer to maintain a single code-path that achieves high performance across all Intel platforms with the minimum of overhead checking for feature support. Future development of the Intel AVX10 ISA will continue to provide a rich, flexible, and consistent environment that optimally supports both Server and Client products."

Due to technical issues (E-core related), Intel decided to disable AVX-512 for Alder Lake and Raptor Lake client-oriented CPU lineups. AMD has recently adopted the fairly new instruction set for its Ryzen 7040 mobile series, so it is no wonder that Team Blue is attempting to reintroduce it in the near future—AVX-512 was last seen working properly on Rocket and Tiger Lake chips. AVX10 implementation is expected to debut with Granite Rapids (according to Longhorn), and VideoCardz reckons that Intel will get advanced instructions for Efficiency cores working with its Clearwater Forest CPU architecture.
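The "overhead checking for feature support" that Intel mentions is what software has to do today: detect at run time which vector extensions a CPU offers, then pick a code path. The Python sketch below illustrates the pattern with Linux-style /proc/cpuinfo text. The function names and the sample text are made up for illustration, and the sketch does not cover AVX10 detection.

```python
def parse_flags(cpuinfo_text):
    """Collect the feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_code_path(flags):
    """Choose the widest vector code path that the flags allow."""
    if "avx512f" in flags:
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return "baseline"


# Hypothetical cpuinfo excerpts for two different CPUs.
with_avx512 = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 avx512f avx512_vnni\n"
without_avx512 = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2\n"

print(pick_code_path(parse_flags(with_avx512)))     # avx512
print(pick_code_path(parse_flags(without_avx512)))  # avx2
```

Every additional extension adds another branch and another code path to maintain and test, which is the burden a single converged vector ISA is meant to reduce.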

AMD Ryzen 7040 Series Phoenix APUs Surprisingly Performant with AVX-512 Workloads

Intel decided to drop the relatively new AVX-512 instruction set for laptop/mobile platforms when it was discovered that it would not work in conjunction with their E-core designs. Alder Lake was the last generation to (semi) support these sets thanks to P-cores agreeing to play nice, albeit with the efficiency side of proceedings disabled (via BIOS settings). Intel chose to fuse off AVX-512 support in production circa early 2022, with AMD picking up the slack soon after and working on the integration of AVX-512 into Zen 4 CPU architecture. The Ryzen 7040 series is the only current generation mobile platform that offers AVX-512 support. Phoronix decided to benchmark a Ryzen 7 7840U against older Intel i7-1165G7 (Tiger Lake) and i7-1065G7 (Ice Lake) SoCs in AVX-512-based workloads.

Team Red's debut foray into AVX-512 was surprisingly performant according to Phoronix's test results—the Ryzen 7 7840U did very well for itself. It outperformed the 1165G7 by 46%, and the older 1065G7 by an impressive 63%. The Ryzen 7 APU was found to attain the highest performance gain with AVX-512 enabled—a 54% performance margin over operating with AVX-512 disabled. In comparison Phoronix found that: "the i7-1165G7 Tiger Lake impact came in at 34% with these AVX-512-heavy benchmarks or 35% with the i7-1065G7 Ice Lake SoC for that generation where AVX-512 on Intel laptops became common."
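As a rough illustration of how uplift percentages like the ones above can be derived, the sketch below computes a percentage gain from paired benchmark scores and summarizes several tests with a geometric mean. The scores are made-up placeholders rather than Phoronix data, and the use of a geometric mean is an assumption about how such results are typically aggregated.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def percent_gain(new_score, old_score):
    """Percentage by which new_score exceeds old_score (higher is better)."""
    return (new_score / old_score - 1.0) * 100.0


# Hypothetical scores for three tests (NOT real benchmark results).
scores_avx512_on = [154.0, 120.0, 200.0]
scores_avx512_off = [100.0, 100.0, 100.0]

# Per-test speedup ratios, then one summary figure across all tests.
ratios = [on / off for on, off in zip(scores_avx512_on, scores_avx512_off)]
overall_gain = (geometric_mean(ratios) - 1.0) * 100.0

if __name__ == "__main__":
    for on, off in zip(scores_avx512_on, scores_avx512_off):
        print(f"per-test gain: {percent_gain(on, off):.1f}%")
    print(f"overall gain (geometric mean): {overall_gain:.1f}%")
```

A geometric mean keeps a single outlier test from dominating the summary, which is why it is a common choice for this kind of comparison.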

New Intel oneAPI 2023 Tools Maximize Value of Upcoming Intel Hardware

Today, Intel announced the 2023 release of the Intel oneAPI tools - available in the Intel Developer Cloud and rolling out through regular distribution channels. The new oneAPI 2023 tools support the upcoming 4th Gen Intel Xeon Scalable processors, Intel Xeon CPU Max Series and Intel Data Center GPUs, including Flex Series and the new Max Series. The tools deliver performance and productivity enhancements, and also add support for new Codeplay plug-ins that make it easier than ever for developers to write SYCL code for non-Intel GPU architectures. These standards-based tools deliver choice in hardware and ease in developing high-performance applications that run on multiarchitecture systems.

"We're seeing encouraging early application performance results on our development systems using Intel Max Series GPU accelerators - applications built with Intel's oneAPI compilers and libraries. For leadership-class computational science, we value the benefits of code portability from multivendor, multiarchitecture programming standards such as SYCL and Python AI frameworks such as PyTorch, accelerated by Intel libraries. We look forward to the first exascale scientific discoveries from these technologies on the Aurora system next year."
-Timothy Williams, deputy director, Argonne Computational Science Division

FinalWire AIDA64 v6.85 Released with NVIDIA Ada and AMD RDNA3 Support

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 6.85 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 6.85 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 6.85 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 6.85 software, a dedicated network audit toolset to collect and manage corporate network inventories.

The new AIDA64 update introduces AVX-512 optimized stress testing for AMD Ryzen 7000 Series processors, and supports the latest AMD and Intel CPU platforms as well as the new graphics and GPGPU computing technologies by both AMD and NVIDIA.

DOWNLOAD: FinalWire AIDA64 Extreme v6.85

GIGABYTE Delivers a Comprehensive Portfolio of Enterprise Solutions with AMD EPYC 9004 Series Processors

GIGABYTE Technology, an industry leader in high-performance servers and workstations, today announced its portfolio of products ready to support the new AMD EPYC 9004 Series Processors in the first wave of GIGABYTE solutions that will target a wide range of demanding workloads that include GPU-centric, high-density, edge, and general computing. A new x86 platform, a new socket, and a wealth of highly performant technologies provided new opportunities for GIGABYTE to tailor products for leading data centers. So far, GIGABYTE has released twenty-two new servers and motherboards to support the new AMD "Zen 4" architecture. Both single-socket and dual-socket options are available to handle big data and digital transformation. The ongoing collaboration between GIGABYTE and AMD has allowed for a comprehensive portfolio of computing solutions that are ready for the market.

The new 4th Gen AMD EPYC processors feature substantial compute performance and scalability by combining high core counts with impressive PCIe and memory throughput. In terms of out-of-the-box performance, AMD estimates found that 4th Gen AMD EPYC CPUs are the highest performing server processors in the world. With the advancement to 5 nm technology and other performance innovations, the new AMD EPYC 9004 series processors move to a new SP5 socket. The new architecture leads the way to faster data insights with high performance and built-in security features, and this platform targets HPC, AI, cloud, big data, and general enterprise IT.
Dec 21st, 2024 20:24 EST
