News Posts matching #AVX-512


MSI First with Motherboard BIOS that Supports Ryzen 9000 "Zen 5" Processors

In yet another clear sign that we could see some action from AMD on the next-gen desktop processor front this Computex, motherboard maker MSI released its first beta UEFI firmware update packing AGESA microcode that reportedly supports the upcoming AMD Ryzen 9000 "Granite Ridge" processors. The "7D74v1D2 beta" firmware update for the MSI MPG B650 Carbon Wi-Fi motherboard encapsulates AGESA ComboPI 1.1.7.0 patch-A, with a description stating that it supports a "next-gen CPU," a reference to the Ryzen 9000 "Granite Ridge."

A successor to the Ryzen 7000 "Raphael," the Ryzen 9000 "Granite Ridge" introduces the new "Zen 5" microarchitecture to the desktop platform, with CPU core counts still topping out at 16 cores/32 threads. The new microarchitecture is expected to deliver a generational increase in IPC, as well as improved performance in certain specialized workloads, such as those using AVX-512. The processors are said to be launching alongside the new AMD 800-series motherboard chipset. If AMD is using Computex as a platform to showcase these processors, it's likely we will see the first of these motherboards as well.

AMD Ryzen 9000 "Granite Ridge" Zen 5 Processor Pictured

An alleged picture of an unreleased AMD Ryzen 9000 series "Granite Ridge" desktop processor just hit the wires. "Granite Ridge" is the codename for the desktop implementation of the "Zen 5" microarchitecture; it succeeds the current Ryzen 7000 "Raphael," which is powered by "Zen 4." From what we're hearing, the CPU core counts of "Granite Ridge" continue to top out at 16. These chips are built on the existing AMD Socket AM5 package, and will be compatible with existing AMD 600-series chipset motherboards, although the company is working on a new motherboard chipset to go with the new chips.

The alleged AMD engineering sample pictured below has an OPN 100-000001290-11, which is unreleased. This OPN also showed up on an Einstein@Home online database, where the distributed computing platform read it as having 16 threads, making this possibly an 8-core/16-thread SKU. The "Zen 5" microarchitecture is expected to provide a generational IPC increase over "Zen 4," but more importantly, offer a significant performance increase for AVX-512 workloads due to an updated FPU. AMD is expected to unveil its Ryzen 9000 series "Zen 5" processors at the 2024 Computex.

AMD Zen 5 Execution Engine Leaked, Features True 512-bit FPU

AMD "Zen 5" CPU microarchitecture will introduce a significant performance increase for AVX-512 workloads, with some sources reported as high as 40% performance increases over "Zen 4" in benchmarks that use AVX-512. A Moore's Law is Dead report detailing the execution engine of "Zen 5" holds the answer to how the company managed this—using a true 512-bit FPU. Currently, AMD uses a dual-pumped 256-bit FPU to execute AVX-512 workloads on "Zen 4." The updated FPU should significantly improve the core's performance in workloads that take advantage of 512-bit AVX or VNNI instructions, such as AI.

Giving "Zen 5" a 512-bit FPU meant that AMD also had to scale up the ancillaries—all the components that keep the FPU fed with data and instructions. The company therefore increased the capacity of the L1 DTLB. The load-store queues have been widened to meet the needs of the new FPU. The L1 Data cache has been doubled in bandwidth, and increased in size by 50%. The L1D is now 48 KB in size, up from 32 KB in "Zen 4." FPU MADD latency has been reduced by 1 cycle. Besides the FPU, AMD also increased the number of Integer execution pipes to 10, from 8 on "Zen 4." The exclusive L2 cache per core remains 1 MB in size.
Update 07:02 UTC: Moore's Law is Dead reached out to us and said that the slide previously posted by them, which we had used in an earlier version of this article, is fake, but said that the information contained in that slide is correct, and that they stand by the information.
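To illustrate the kind of code a wider FPU and faster VNNI execution would speed up, here is a minimal sketch of an int8 dot product using the AVX-512 VNNI intrinsic `_mm512_dpbusd_epi32`. This is a generic example, assuming a CPU and compiler with AVX-512 VNNI support; it is not taken from AMD's or the leaked materials.

```cpp
// Minimal sketch: int8 dot product with AVX-512 VNNI (VPDPBUSD).
// Build (GCC/Clang): g++ -O2 -mavx512f -mavx512vnni vnni_dot.cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Dot product over n bytes (n is assumed to be a multiple of 64 here):
// 'a' holds unsigned 8-bit values, 'b' holds signed 8-bit values.
int32_t dot_u8s8(const uint8_t* a, const int8_t* b, size_t n) {
    __m512i acc = _mm512_setzero_si512();
    for (size_t i = 0; i < n; i += 64) {
        __m512i va = _mm512_loadu_si512(a + i);
        __m512i vb = _mm512_loadu_si512(b + i);
        // Multiplies 64 u8*s8 pairs and accumulates groups of four products
        // into sixteen 32-bit lanes -- one instruction instead of three.
        acc = _mm512_dpbusd_epi32(acc, va, vb);
    }
    alignas(64) int32_t lanes[16];
    _mm512_store_si512(lanes, acc);
    int32_t sum = 0;
    for (int i = 0; i < 16; ++i) sum += lanes[i];  // horizontal reduction
    return sum;
}

int main() {
    uint8_t a[64];
    int8_t  b[64];
    for (int i = 0; i < 64; ++i) { a[i] = 1; b[i] = 2; }
    printf("%d\n", dot_u8s8(a, b, 64));  // expect 128
}
```

On "Zen 4," the 512-bit operations in this loop are cracked into two passes through the dual-pumped 256-bit FPU; a native 512-bit datapath of the kind described for "Zen 5" would execute each one in a single pass, which is where the reported gains would come from.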

Qubic Cryptocurrency Mining Craze Causes AMD Ryzen 9 7950X Stocks to Evaporate

It looks like cryptocurrency mining is back in vogue, as miners are firing up their old mining hardware from 2022 to cash in. Bitcoin is now north of $72,000, and is dragging up the value of several other cryptocurrencies, one of them being Qubic (QBIC). Profitability calculators put 24 hours of Qubic mining on an AMD Ryzen 9 7950X 16-core processor at around $3, after subtracting the energy costs involved in running the chip at its default 170 W TDP. "Zen 4" processors such as the 7950X tend to retain much of their performance with slight underclocking and reduced power limits, which is bound to hold or increase profitability while also prolonging the life of the hardware.
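For context, the energy-cost side of that estimate is simple arithmetic. The sketch below uses an assumed electricity price of $0.12/kWh and an assumed gross revenue figure purely to make the calculation concrete; actual revenue depends on the coin price and network difficulty, which are not modeled here.

```cpp
// Rough, illustrative profitability arithmetic (assumed numbers, not measured).
#include <cstdio>

int main() {
    const double package_power_w   = 170.0;  // Ryzen 9 7950X default TDP
    const double price_per_kwh_usd = 0.12;   // assumed electricity price
    const double gross_revenue_usd = 3.50;   // assumed gross mining revenue per day

    double energy_kwh_per_day = package_power_w * 24.0 / 1000.0;        // ~4.08 kWh
    double energy_cost_usd    = energy_kwh_per_day * price_per_kwh_usd; // ~$0.49
    double net_usd            = gross_revenue_usd - energy_cost_usd;    // ~$3.01

    printf("energy: %.2f kWh/day, cost: $%.2f, net: $%.2f\n",
           energy_kwh_per_day, energy_cost_usd, net_usd);
}
```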

And thus, the inevitable has happened—stocks of the AMD Ryzen 9 7950X have disappeared overnight across online retail. With the 7950X3D and the Intel Core i9-14900K on the market, the 7950X was typically found between $550 and $600, which made it great value given its low upfront cost. CPU-based cryptocurrency miners, including the QBIC miner, appear to be taking advantage of the AVX-512 instruction set. The AMD "Zen 4" microarchitecture supports AVX-512 through its dual-pumped 256-bit FPU, and the upcoming "Zen 5" microarchitecture is rumored to double AVX-512 performance over "Zen 4." Meanwhile, Intel has disabled what few client-relevant AVX-512 instructions its Core processors had since the 12th Gen "Alder Lake," as the capability reportedly affected sales of Xeon processors. What about the 7950X3D? It's pricier, but mining doesn't benefit from the 3D V-cache, and the chip doesn't sustain the kind of CPU clocks the 7950X manages across all 16 of its cores. It's only a matter of time before the 7950X3D disappears, too; followed by 12-core models such as the 65 W 7900, the 170 W 7900X, and the 7900X3D.

Google: CPUs are Leading AI Inference Workloads, Not GPUs

Today's AI infrastructure is mostly fueled by an expansion that relies on GPU-accelerated servers. Google, one of the world's largest hyperscalers, has noted that CPUs still handle a leading share of AI/ML workloads, according to internal analysis of its Google Cloud services. During a Tech Field Day event, Brandon Royal, product manager at Google Cloud, explained the position of CPUs in today's AI landscape. The AI lifecycle is divided into two parts: training and inference. During training, massive compute capacity is needed, along with enormous memory capacity, to fit ever-expanding AI models into memory. The latest models, like GPT-4 and Gemini, contain billions of parameters and require thousands of GPUs or other accelerators working in parallel to train efficiently.

On the other hand, inference requires less compute intensity but still benefits from acceleration. The pre-trained model is optimized and deployed during inference to make predictions on new data. While less compute is needed than for training, latency and throughput are essential for real-time inference. Google found that, while GPUs are ideal for the training phase, models are often optimized and run for inference on CPUs. This means that there are customers who choose CPUs as their medium for AI inference for a wide variety of reasons.

AMD Zen 5 Details Emerge with GCC "Znver5" Patch: New AVX Instructions, Larger Pipelines

AMD's upcoming family of Ryzen 9000 series processors on the AM5 platform will carry new silicon under the hood—"Zen 5." The latest revision of AMD's x86-64 microarchitecture will feature a few interesting improvements over the current Zen 4 it replaces, targeting the rumored 10-15% IPC improvement. Thanks to the latest set of patches for the GNU Compiler Collection (GCC), we now have the patch set that proposes the changes arriving with "znver5" enablement. One of the most interesting additions in Zen 5 over the previous Zen 4 is the expansion of the AVX instruction set, mainly new AVX and AVX-512 instructions: AVX-VNNI, MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, and PREFETCHI.

AVX-VNNI is a 256-bit vector version of the AVX-512 VNNI instruction set that accelerates neural network inferencing workloads. AVX-VNNI delivers the same VNNI instructions to CPUs that support 256-bit vectors but lack full 512-bit AVX-512 capabilities, effectively extending the AI-acceleration speedups of VNNI down to 256-bit vectors and making the technology more widely usable. While narrower in scope (no opmasking, and no access to the extra vector registers available with AVX-512 VNNI), AVX-VNNI is crucial in spreading VNNI inferencing speedups to real-world CPUs and applications. The AVX-512 VP2INTERSECT instruction is also making it into Zen 5, as noted above; it has so far only been present in Intel's "Tiger Lake" processor generation and is now considered deprecated for Intel SKUs. We don't know the rationale behind this inclusion, but AMD surely has a use case for it.
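For developers, the practical upshot of such enablement is usually twofold: a new `-march=znver5` target for ahead-of-time tuning once the patches land in a GCC release, and runtime dispatch between whichever VNNI flavor a given CPU reports. Below is a minimal, hedged sketch of the latter using GCC/Clang's `__builtin_cpu_supports()`; the feature strings used ("avx512vnni", "avxvnni") are assumptions based on GCC's documented feature names, not something taken from the patch set.

```cpp
// Minimal sketch: pick a VNNI code path at runtime with GCC/Clang builtins.
// The feature strings are assumed; check your compiler's documentation.
#include <cstdio>

int main() {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512vnni"))
        printf("AVX-512 VNNI available: use the 512-bit dot-product kernels\n");
    else if (__builtin_cpu_supports("avxvnni"))
        printf("AVX-VNNI available: use the 256-bit VEX-encoded kernels\n");
    else
        printf("No VNNI support: fall back to AVX2 or scalar code\n");
}
```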

AMD Ryzen 7 8840U APU Benched in GPD Win Max 2 Handheld

GPD has disclosed to ITHome that a specification refresh of its Win Max 2 handheld/mini-laptop gaming PC is incoming—this model debuted last year with Ryzen 7040 "Phoenix" APUs sitting in the driver's seat. A company representative provided a sneak peek of an upgraded device that sports a Team Red Ryzen 8040 series "Hawk Point" mobile processor, and a larger pool of system memory (32 GB versus the 2023 model's 16 GB). The refreshed GPD Win Max 2's Ryzen 7 8840U APU was compared to the predecessor's Ryzen 7 7840U in CPU-Z benchmarks (standard and AVX-512)—the results demonstrate a very slight difference in performance between generations.

The 8040 and 7040 APUs share the same basic "Phoenix" CPU design (8 cores/16 threads) based on the prevalent "Zen 4" microarchitecture, plus an integrated AMD Radeon 780M GPU. The former's main upgrade lies in its AI-crunching capabilities—a deployment of Team Red's XDNA AI engine. On the Ryzen 8040 series: "NPU performance has been increased to 16 TOPS, compared to 10 TOPS of the NPU on the 'Phoenix' silicon. AMD is taking a whole-of-silicon approach to AI acceleration, which includes not just the NPU, but also the 'Zen 4' CPU cores that support the AVX-512 VNNI instruction set that's relevant to AI; and the iGPU based on the RDNA 3 graphics architecture, with each of its compute units featuring two AI accelerators, components that make the SIMD cores crunch matrix math. The whole-of-silicon performance figure for 'Phoenix' is 33 TOPS, while 'Hawk Point' boasts 39 TOPS. In benchmarks by AMD, 'Hawk Point' is shown delivering a 40% improvement in vision models, and in Llama 2, over the Ryzen 7040 'Phoenix' series."

AVX-512 Doubles Intel 5th Gen "Emerald Rapids" Xeon Processor Performance, Up to 10x Improvement in AI Workloads

The latest round of tests by Phoronix provides proof of the substantial performance gains Intel's 5th Gen Xeon "Emerald Rapids" server CPUs deliver when employing AVX-512 vector instructions. Enabling AVX-512 doubled throughput on average across a range of workloads, with specific AI tasks accelerating over 10x versus having it disabled. Running on the top-end 64-core Platinum 8592+ SKU, benchmarks saw minimal frequency differences between AVX-512 on and off states. However, the specialized 512-bit vector processing unlocked dramatic speedups, exemplified in the OpenVINO AI framework. Specifically, weld porosity detection, which has real-world applications, showed the biggest speedups. Power draw also increased moderately, the expected tradeoff for such an outsized performance upside.

With robust optimizations, the potential of the vector engine has now been fully demonstrated. Workloads spanning AI, visualization, simulation, and analytics could see multiplied throughput by upgrading to Emerald Rapids. Of course, developer implementation work remains non-trivial. But for the data center applications that can take advantage, AVX-512 enables Intel to partially close raw throughput gaps versus AMD's core-count leadership. Whether those targeted acceleration gains offset EPYC's wider general-purpose value depends on customer workloads. But with tests proving dramatic upside, Intel is betting big on vector acceleration as its ace card. AMD also supports the AVX-512 instruction set. The geometric mean of all test results, and the full review with benchmarks, can be found at Phoronix.

AMD Ryzen 8040 Series "Hawk Point" Mobile Processors Announced with a Faster NPU

AMD today announced the new Ryzen 8040 mobile processor series codenamed "Hawk Point." These chips are shipping to notebook manufacturers now, and the first notebooks powered by these should be available to consumers in Q1-2024. At the heart of this processor is a significantly faster neural processing unit (NPU), designed to accelerate AI applications that will become relevant next year, as Microsoft prepares to launch Windows 12, and software vendors make greater use of generative AI in consumer applications.

The Ryzen 8040 "Hawk Point" processor is almost identical in design and features to the Ryzen 7040 "Phoenix," except for a faster Ryzen AI NPU. While this is based on the same first-generation XDNA architecture, its NPU performance has been increased to 16 TOPS, compared to 10 TOPS of the NPU on the "Phoenix" silicon. AMD is taking a whole-of-silicon approach to AI acceleration, which includes not just the NPU, but also the "Zen 4" CPU cores that support the AVX-512 VNNI instruction set that's relevant to AI; and the iGPU based on the RDNA 3 graphics architecture, with each of its compute units featuring two AI accelerators, components that make the SIMD cores crunch matrix math. The whole-of-silicon performance figure for "Phoenix" is 33 TOPS, while "Hawk Point" boasts 39 TOPS. In benchmarks by AMD, "Hawk Point" is shown delivering a 40% improvement in vision models, and in Llama 2, over the Ryzen 7040 "Phoenix" series.

FinalWire Releases AIDA64 v7.00 with Revamped Design and AMD Threadripper 7000 Optimizations

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 7.00 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 7.00 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 7.00 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 7.00 software, a dedicated network audit toolset to collect and manage corporate network inventories.

The new AIDA64 update introduces a revamped user interface with a configurable toolbar, as well as AVX-512 accelerated benchmarks for AMD Threadripper 7000 processors, AVX2 optimized benchmarks for Intel Meteor Lake processors, and supports the latest AMD and Intel CPU platforms as well as the new graphics and GPGPU computing technologies by AMD, Intel and NVIDIA.

DOWNLOAD: FinalWire AIDA64 Extreme v7.0

FinalWire AIDA64 v6.92 Released

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 6.92 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 6.92 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 6.92 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 6.92 software, a dedicated network audit toolset to collect and manage corporate network inventories.

The latest AIDA64 update introduces AVX-512 optimized benchmarks for Intel Sapphire Rapids processors, and supports the latest AMD and Intel CPU platforms as well as the new graphics and GPGPU computing technologies by AMD, Intel and NVIDIA.

DOWNLOAD: FinalWire AIDA64 Extreme v6.92

"Downfall" Intel CPU Vulnerability Can Impact Performance By 50%

Intel has recently revealed a security vulnerability named Downfall (CVE-2022-40982) that impacts multiple generations of Intel processors. The vulnerability is linked to Intel's memory optimization feature, exploiting the Gather instruction, a function that accelerates the fetching of data from scattered memory locations. It inadvertently exposes internal hardware registers, allowing malicious software access to data held by other programs. The flaw affects Intel mainstream and server processors ranging from the "Skylake" to "Rocket Lake" microarchitectures. The entire list of affected CPUs is here. Intel has responded by releasing updated microcode to fix the flaw. However, there's concern over the performance impact of the fix, potentially affecting AVX2 and AVX-512 workloads involving the Gather instruction by up to 50%.
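For readers unfamiliar with the instruction at the center of Downfall, a gather loads a vector of elements from non-contiguous addresses in a single operation. The sketch below uses the AVX2 intrinsic `_mm256_i32gather_epi32` purely to illustrate what a gather does; it has nothing to do with exploiting the flaw.

```cpp
// Minimal sketch: what a "gather" does -- load eight 32-bit elements from
// scattered positions of a table in one instruction (AVX2 VPGATHERDD).
// Build: g++ -O2 -mavx2 gather_demo.cpp
#include <immintrin.h>
#include <cstdio>

int main() {
    int table[64];
    for (int i = 0; i < 64; ++i) table[i] = i * 10;

    // Eight arbitrary, non-contiguous indices into the table.
    __m256i idx = _mm256_setr_epi32(3, 17, 42, 5, 60, 11, 28, 0);

    // One instruction fetches table[3], table[17], table[42], ...
    __m256i v = _mm256_i32gather_epi32(table, idx, 4 /* byte scale */);

    int out[8];
    _mm256_storeu_si256((__m256i*)out, v);
    for (int i = 0; i < 8; ++i) printf("%d ", out[i]);  // 30 170 420 50 600 110 280 0
    printf("\n");
}
```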

Phoronix tested the Downfall mitigations and reported varying performance decreases on different processors. For instance, two Xeon Platinum 8380 processors were around 6% slower in certain tests, while the Core i7-1165G7 faced performance degradation ranging from 11% to 39% in specific benchmarks. While these reductions were less than Intel's forecasted 50% overhead, they remain significant, especially in High-Performance Computing (HPC) workloads. The ramifications of Downfall are not restricted to specialized tasks like AI or HPC but may extend to more common applications such as video encoding. Though the microcode update is not mandatory and Intel provides an opt-out mechanism, users are left with a challenging decision between security and performance. Executing a Downfall attack might seem complex, but the final choice between implementing the mitigation or retaining performance will likely vary depending on individual needs and risk assessments.

Intel Previews AVX10 ISA, Next-Gen E-Cores to get AVX-512 Capabilities

Intel has published a preview article covering its new AVX10 ISA (Instruction Set Architecture)—the announcement reveals that both P-Cores & E-Cores (on next-gen processors) will be getting support for AVX-512. Team Blue stated: "Intel AVX10 represents a major shift to supporting a high-performance vector ISA across future Intel processors. It allows the developer to maintain a single code-path that achieves high performance across all Intel platforms with the minimum of overhead checking for feature support. Future development of the Intel AVX10 ISA will continue to provide a rich, flexible, and consistent environment that optimally supports both Server and Client products."

Due to technical issues (E-core related), Intel decided to disable AVX-512 for Alder Lake and Raptor Lake client-oriented CPU lineups. AMD has recently adopted the fairly new instruction set for its Ryzen 7040 mobile series, so it is no wonder that Team Blue is attempting to reintroduce it in the near future—AVX-512 was last seen working properly on Rocket and Tiger Lake chips. AVX10 implementation is expected to debut with Granite Rapids (according to Longhorn), and VideoCardz reckons that Intel will get advanced instructions for Efficiency cores working with its Clearwater Forest CPU architecture.

AMD Ryzen 7040 Series Phoenix APUs Surprisingly Performant with AVX-512 Workloads

Intel decided to drop the relatively new AVX-512 instruction set for laptop/mobile platforms when it was discovered that it would not work in conjunction with their E-core designs. Alder Lake was the last generation to (semi) support these instructions thanks to the P-cores agreeing to play nice, albeit with the efficiency side of proceedings disabled (via BIOS settings). Intel chose to fuse off AVX-512 support in production circa early 2022, with AMD picking up the slack soon after and working on the integration of AVX-512 into its Zen 4 CPU architecture. The Ryzen 7040 series is the only current-generation mobile platform that offers AVX-512 support. Phoronix decided to benchmark a Ryzen 7 7840U against the older Intel Core i7-1165G7 ("Tiger Lake") and i7-1065G7 ("Ice Lake") SoCs in AVX-512-based workloads.

Team Red's debut foray into AVX-512 was surprisingly performant according to Phoronix's test results—the Ryzen 7 7840U did very well for itself. It outperformed the i7-1165G7 by 46%, and the older i7-1065G7 by an impressive 63%. The Ryzen 7 APU was also found to attain the largest performance gain from AVX-512: a 54% margin over operating with AVX-512 disabled. In comparison, Phoronix found that: "the i7-1165G7 Tiger Lake impact came in at 34% with these AVX-512-heavy benchmarks or 35% with the i7-1065G7 Ice Lake SoC for that generation where AVX-512 on Intel laptops became common."

New Intel oneAPI 2023 Tools Maximize Value of Upcoming Intel Hardware

Today, Intel announced the 2023 release of the Intel oneAPI tools - available in the Intel Developer Cloud and rolling out through regular distribution channels. The new oneAPI 2023 tools support the upcoming 4th Gen Intel Xeon Scalable processors, Intel Xeon CPU Max Series and Intel Data Center GPUs, including Flex Series and the new Max Series. The tools deliver performance and productivity enhancements, and also add support for new Codeplay plug-ins that make it easier than ever for developers to write SYCL code for non-Intel GPU architectures. These standards-based tools deliver choice in hardware and ease in developing high-performance applications that run on multiarchitecture systems.

"We're seeing encouraging early application performance results on our development systems using Intel Max Series GPU accelerators - applications built with Intel's oneAPI compilers and libraries. For leadership-class computational science, we value the benefits of code portability from multivendor, multiarchitecture programming standards such as SYCL and Python AI frameworks such as PyTorch, accelerated by Intel libraries. We look forward to the first exascale scientific discoveries from these technologies on the Aurora system next year."
-Timothy Williams, deputy director, Argonne Computational Science Division
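As a concrete illustration of the kind of code these tools target, here is a minimal vector-add kernel in standard SYCL 2020. It is a generic sketch (names and sizes are arbitrary), not an excerpt from Intel's or Codeplay's materials; with the appropriate Codeplay plug-in, the same source can be compiled for Intel, NVIDIA, or AMD GPUs.

```cpp
// Minimal SYCL 2020 sketch: vector addition that can target any supported device.
// Build with a SYCL compiler, e.g.: icpx -fsycl vadd.cpp
#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>

int main() {
    constexpr size_t N = 1 << 20;
    std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

    sycl::queue q;  // picks a default device (GPU if present, else CPU)
    {
        sycl::buffer bufA(a), bufB(b), bufC(c);
        q.submit([&](sycl::handler& h) {
            sycl::accessor A(bufA, h, sycl::read_only);
            sycl::accessor B(bufB, h, sycl::read_only);
            sycl::accessor C(bufC, h, sycl::write_only, sycl::no_init);
            h.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];
            });
        });
    }  // buffer destructors copy results back to the host vectors

    std::cout << "c[0] = " << c[0] << std::endl;  // expect 3
}
```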

FinalWire AIDA64 v6.85 Released with NVIDIA Ada and AMD RDNA3 Support

FinalWire Ltd. today announced the immediate availability of AIDA64 Extreme 6.85 software, a streamlined diagnostic and benchmarking tool for home users; the immediate availability of AIDA64 Engineer 6.85 software, a professional diagnostic and benchmarking solution for corporate IT technicians and engineers; the immediate availability of AIDA64 Business 6.85 software, an essential network management solution for small and medium scale enterprises; and the immediate availability of AIDA64 Network Audit 6.85 software, a dedicated network audit toolset to collect and manage corporate network inventories.

The new AIDA64 update introduces AVX-512 optimized stress testing for AMD Ryzen 7000 Series processors, and supports the latest AMD and Intel CPU platforms as well as the new graphics and GPGPU computing technologies by both AMD and NVIDIA.

DOWNLOAD: FinalWire AIDA64 Extreme v6.85

GIGABYTE Delivers a Comprehensive Portfolio of Enterprise Solutions with AMD EPYC 9004 Series Processors

GIGABYTE Technology, an industry leader in high-performance servers and workstations, today announced its portfolio of products ready to support the new AMD EPYC 9004 Series Processors in the first wave of GIGABYTE solutions that will target a wide range of demanding workloads that include GPU-centric, high-density, edge, and general computing. A new x86 platform, a new socket, and a wealth of highly performant technologies provided new opportunities for GIGABYTE to tailor products for leading data centers. So far, GIGABYTE has released twenty-two new servers and motherboards to support the new AMD "Zen 4" architecture. Both single-socket and dual-socket options are available to handle big data and digital transformation. The ongoing collaboration between GIGABYTE and AMD has allowed for a comprehensive portfolio of computing solutions that are ready for the market.

The new 4th Gen AMD EPYC processors feature substantial compute performance and scalability by combining high core counts with impressive PCIe and memory throughput. In terms of out-of-the-box performance, AMD estimates that 4th Gen AMD EPYC CPUs are the highest-performing server processors in the world. With the advancement to 5 nm technology and other performance innovations, the new AMD EPYC 9004 series processors move to a new SP5 socket. The new architecture leads the way to faster data insights with high performance and built-in security features, and this platform targets HPC, AI, cloud, big data, and general enterprise IT.

Intel Introduces the Max Series Product Family: Ponte Vecchio and Sapphire Rapids

In advance of Supercomputing '22 in Dallas, Intel Corporation has introduced the Intel Max Series product family with two leading-edge products for high performance computing (HPC) and artificial intelligence (AI): Intel Xeon CPU Max Series (code-named Sapphire Rapids HBM) and Intel Data Center GPU Max Series (code-named Ponte Vecchio). The new products will power the upcoming Aurora supercomputer at Argonne National Laboratory, with updates on its deployment shared today.

The Xeon Max CPU is the first and only x86-based processor with high bandwidth memory, accelerating many HPC workloads without the need for code changes. The Max Series GPU is Intel's highest density processor, packing over 100 billion transistors into a 47-tile package with up to 128 gigabytes (GB) of high bandwidth memory. The oneAPI open software ecosystem provides a single programming environment for both new processors. Intel's 2023 oneAPI and AI tools will deliver capabilities to enable the Intel Max Series products' advanced features.

AMD Rolls Out GCC Enablement for "Zen 4" Processors with Zenver4 Target, Enables AVX-512 Instructions

AMD earlier this week released basic enablement for the GNU Compiler Collection (GCC), which extends "Zen 4" microarchitecture awareness. The "basic enablement patch" for the new Zenver4 target is essentially similar to Zenver3, but with added support for the new AVX-512 instructions, namely AVX512F, AVX512DQ, AVX512IFMA, AVX512CD, AVX512BW, AVX512VL, AVX512BF16, AVX512VBMI, AVX512VBMI2, GFNI, AVX512VNNI, AVX512BITALG, and AVX512VPOPCNTDQ. Besides AVX-512, "Zen 4" is architecturally largely identical to its predecessor, so the enablement is rather basic. This should come just in time for software vendors to prepare for the next-generation EPYC "Genoa" server processors, or even small/medium businesses building servers with Ryzen 7000-series processors.
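In practice, such enablement means that code built with the corresponding `-march` target (here `-march=znver4`, on a GCC carrying the patch) gets the matching feature macros defined, so AVX-512 code paths can be selected at compile time. A minimal sketch, assuming GCC's standard predefined macros:

```cpp
// Minimal sketch: compile-time selection of an AVX-512 code path.
// With znver4 enablement in place, building with: g++ -O2 -march=znver4 main.cpp
// defines __AVX512F__, __AVX512VNNI__, __AVX512BF16__, and friends.
#include <cstdio>

int main() {
#if defined(__AVX512F__) && defined(__AVX512VNNI__)
    printf("Built with AVX-512F + VNNI code paths enabled\n");
#elif defined(__AVX2__)
    printf("Built with an AVX2 fallback path\n");
#else
    printf("Built with baseline x86-64 code\n");
#endif
}
```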

Intel Outs First Xeon Scalable "Sapphire Rapids" Benchmarks, On-package Accelerators Help Catch Up with AMD EPYC

On the second day of its Innovation event, Intel turned attention to its next-generation Xeon Scalable "Sapphire Rapids" server processors, and demonstrated on-package accelerators. These are fixed-function hardware components that accelerate specific kinds of popular server workloads (i.e. run them faster than a CPU core can). With these, Intel hopes to close the CPU core-count gap it has with AMD EPYC, with the upcoming "Zen 4" EPYC chips expected to launch with up to 96 cores per socket in their conventional variant, and up to 128 cores per socket in their cloud-optimized variant.

Intel's on-package accelerators include AMX (advanced matrix extensions), which accelerate recommendation-engines, natural language processing (NLP), image-recognition, etc; DLB (dynamic load-balancing), which accelerates security-gateway and load-balancing; DSA (data-streaming accelerator), which speeds up the network stack, guest OS, and migration; IAA (in-memory analysis accelerator), which speeds up big-data (Apache Hadoop), IMDB, and warehousing applications; a feature-rich implementation of the AVX-512 instruction-set for a plethora of content-creation and scientific applications; and lastly, the QAT (QuickAssist Technology), with speed-ups for data compression, OpenSSL, nginx, IPsec, etc. Unlike "Ice Lake-SP," QAT is now implemented on the processor package instead of the PCH.

RPCS3 PlayStation 3 Emulator Updated with AVX-512 Support for AMD Zen 4

The popular PlayStation 3 emulator for PCs, RPCS3, just received a major update that lets it take advantage of the AVX-512 instruction-set on processors based on the AMD Zen 4 microarchitecture (the recently launched Ryzen 7000 series). RPCS3 emulates the PS3's CELL Broadband Engine SoC entirely on the CPU, and does not use your GPU to draw any raster graphics. Emulating both a CPU and a GPU of that era entirely on a modern multi-threaded CPU is no easy task, but it is helped greatly by leveraging the latest instruction-sets. RPCS3 supports an AVX-512 code-path on Intel processors such as the Core i9-11900K "Rocket Lake," but Intel has been fidgeting with AVX-512 support on its client processors since the 12th Gen "Alder Lake." The developer of RPCS3 confirmed in a tweet that they have enabled AVX-512 support for AMD Zen 4 with the latest build.

AMD "Zen 4" Dies, Transistor-Counts, Cache Sizes and Latencies Detailed

As we await technical documents from AMD detailing its new "Zen 4" microarchitecture, particularly the all-important CPU core Front-End and Branch Prediction units that have contributed two-thirds of the 13% IPC gain over the previous-generation "Zen 3" core, the tech enthusiast community is already decoding images from the Ryzen 7000 series launch presentation. "Skyjuice" presented the first annotation of the "Zen 4" core, revealing its large branch-prediction unit, enlarged micro-op cache, TLB, load/store unit, and dual-pumped 256-bit FPU that enables AVX-512 support. A quarter of the core's die-area is also taken up by the 1 MB dedicated L2 cache.

Chiakokhua (aka Retired Engineer) posted a table detailing the various caches and their latencies, comparing them with those of the "Zen 3" core. As AMD's Mark Papermaster revealed at the Ryzen 7000 launch event, the company has enlarged the micro-op cache of the core from 4 K entries to 6.75 K entries. The L1I and L1D caches remain 32 KB in size, each; while the L2 cache has doubled in size. The enlargement of the L2 cache has slightly increased latency, from 12 cycles to 14. Latency of the shared L3 cache is also up, from 46 cycles to 50 cycles. The reorder buffer (ROB) in the dispatch stage has been enlarged from 256 entries to 320 entries. The L1 branch target buffer (BTB) has grown from 1 K entries to 1.5 K entries.
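Cache latencies like the 14-cycle L2 and 50-cycle L3 figures above are typically measured with a dependent pointer chase, where each load's address comes from the previous load, so the latency of a given memory level cannot be hidden. A rough sketch of the idea is below; it is a simplified illustration (a careful measurement would pin the thread, stride by cache lines, and convert nanoseconds to core cycles), not the methodology behind the table referenced here.

```cpp
// Rough sketch of a pointer-chase latency measurement.
// Each load depends on the previous one, so the average time per step
// approximates the latency of whichever cache level 'bytes' fits into.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const size_t bytes = 1 << 20;              // 1 MB: roughly L2-sized on Zen 4
    const size_t n = bytes / sizeof(size_t);
    std::vector<size_t> next(n);

    // Build a random cyclic permutation so the hardware prefetcher can't help.
    std::vector<size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), std::mt19937{42});
    for (size_t i = 0; i < n; ++i)
        next[order[i]] = order[(i + 1) % n];

    volatile size_t idx = 0;
    const size_t steps = 10'000'000;
    size_t p = idx;
    auto t0 = std::chrono::steady_clock::now();
    for (size_t s = 0; s < steps; ++s)
        p = next[p];                           // dependent load chain
    auto t1 = std::chrono::steady_clock::now();
    idx = p;                                   // keep the result live

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / steps;
    printf("~%.2f ns per dependent load\n", ns);  // multiply by core GHz for cycles
}
```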

Latest Y-Cruncher Version Comes with "Zen 4" and AVX-512 Optimization

Y-Cruncher is a multi-threaded Pi calculation benchmark. Its author, Alexander Yee, has access to an AMD Ryzen 9 7950X 16-core/32-thread sample, and has developed the latest version 0.7.10 of the Y-Cruncher binary with optimizations for the "Zen 4" microarchitecture, taking advantage of the AVX-512 instruction-set on these chips. Without disclosing the juicy performance numbers obtained in his testing, Yee posted a screenshot of Y-Cruncher with the 7950X, on a machine with Windows 11 22H2 and 64 GB of memory. You know it's optimized, since the multi-core efficiency is as high as 98% (all threads are being saturated with the Pi calculation workload).

Intel Teams Up with Aible to Fast-Track Enterprise Analytics and AI

Intel's collaboration with Aible enables teams across key industries to leverage artificial intelligence and deliver rapid and measurable business impact. This deep collaboration, which includes engineering optimizations and an innovative benchmarking program, enhances Aible's ability to deliver rapid results to its enterprise customers. When paired with Intel processors, Aible's technology provides a serverless-first approach, allowing developers to build and run applications without having to manage servers, and build modern applications with increased agility and lower total cost of ownership (TCO).

"Today's enterprise IT infrastructure leaders face significant challenges building a foundation that is designed to help business teams drive value from AI initiatives in the data center. We've moved past talking about the potential of AI, as business teams across key industries are experiencing measurable business impact within days, using Intel Xeon Scalable processors with built-in Intel software optimizations with Aible," said Kavitha Prasad, Intel vice president and general manager of Datacenter, AI and Cloud Execution and Strategy.

Intel Xeon W9-3495 Sapphire Rapids HEDT CPU with 56 Cores and 112 Threads Appears

Intel's upcoming Sapphire Rapids processors will not only be present in the server sector, but will also span the high-end desktop (HEDT) platform. Today, thanks to the findings of Twitter user @InstLatX64, Intel's upcoming Sapphire Rapids HEDT SKU has made an appearance in Kernel.org boot logs. Named Intel Xeon W9-3495, this model features 56 cores and 112 threads. While there is no specific information about base and boost frequencies, we know that the SKU supports AVX-512 and AMX instructions. This is a welcome addition, as we have seen Intel disable AVX-512 on consumer chips altogether.

With a high core count and additional instructions for deep learning, this CPU will power workstations sometime in the future. With the late arrival of Sapphire Rapids for servers, an HEDT variant should follow.