News Posts matching #AVX-512

Intel Introduces the Max Series Product Family: Ponte Vecchio and Sapphire Rapids

In advance of Supercomputing '22 in Dallas, Intel Corporation has introduced the Intel Max Series product family with two leading-edge products for high performance computing (HPC) and artificial intelligence (AI): Intel Xeon CPU Max Series (code-named Sapphire Rapids HBM) and Intel Data Center GPU Max Series (code-named Ponte Vecchio). The new products will power the upcoming Aurora supercomputer at Argonne National Laboratory, with updates on its deployment shared today.

The Xeon Max CPU is the first and only x86-based processor with high bandwidth memory, accelerating many HPC workloads without the need for code changes. The Max Series GPU is Intel's highest density processor, packing over 100 billion transistors into a 47-tile package with up to 128 gigabytes (GB) of high bandwidth memory. The oneAPI open software ecosystem provides a single programming environment for both new processors. Intel's 2023 oneAPI and AI tools will deliver capabilities to enable the Intel Max Series products' advanced features.

AMD Rolls Out GCC Enablement for "Zen 4" Processors with Zenver4 Target, Enables AVX-512 Instructions

AMD earlier this week released a basic enablement patch for the GNU Compiler Collection (GCC) that adds "Zen 4" microarchitecture awareness. The "basic enablement patch" for the new Zenver4 target is essentially similar to Zenver3, but with added support for the new AVX-512 instructions, namely AVX512F, AVX512DQ, AVX512IFMA, AVX512CD, AVX512BW, AVX512VL, AVX512BF16, AVX512VBMI, AVX512VBMI2, GFNI, AVX512VNNI, AVX512BITALG, and AVX512VPOPCNTDQ. Besides AVX-512, "Zen 4" is largely identical to its predecessor, architecturally, and so the enablement is rather basic. This should come just in time for software vendors to prepare for next-generation EPYC "Genoa" server processors, or even small/medium businesses building servers with Ryzen 7000-series processors.
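As a rough illustration of what this extension list means for software, the sketch below checks a CPU flag string for each extension named above. It is not part of AMD's patch. The lowercase flag spellings are an assumption based on how Linux reports them in /proc/cpuinfo, and the helper names (`missing_extensions`, `read_cpu_flags`) are hypothetical.

```python
# Extension names as listed in the article, mapped to the lowercase flag
# spellings Linux is assumed to use in /proc/cpuinfo (an assumption, not
# something the article states).
ZENVER4_AVX512_FLAGS = {
    "AVX512F": "avx512f",
    "AVX512DQ": "avx512dq",
    "AVX512IFMA": "avx512ifma",
    "AVX512CD": "avx512cd",
    "AVX512BW": "avx512bw",
    "AVX512VL": "avx512vl",
    "AVX512BF16": "avx512_bf16",
    "AVX512VBMI": "avx512vbmi",
    "AVX512VBMI2": "avx512_vbmi2",
    "GFNI": "gfni",
    "AVX512VNNI": "avx512_vnni",
    "AVX512BITALG": "avx512_bitalg",
    "AVX512VPOPCNTDQ": "avx512_vpopcntdq",
}


def missing_extensions(cpu_flags):
    """Return the listed extensions that are absent from a flag string."""
    present = set(cpu_flags.split())
    return [name for name, flag in ZENVER4_AVX512_FLAGS.items()
            if flag not in present]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the first 'flags' line of /proc/cpuinfo, or '' if unavailable."""
    try:
        with open(path) as handle:
            for line in handle:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass  # Not Linux, or the file is not readable.
    return ""


if __name__ == "__main__":
    missing = missing_extensions(read_cpu_flags())
    print("Missing extensions:", ", ".join(missing) if missing else "none")
```

On a machine where the flag line cannot be read, the script simply reports every extension as missing, so treat its output as a hint rather than a definitive capability check.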

Intel Outs First Xeon Scalable "Sapphire Rapids" Benchmarks, On-package Accelerators Help Catch Up with AMD EPYC

On the second day of its InnovatiON event, Intel turned its attention to its next-generation Xeon Scalable "Sapphire Rapids" server processors, and demonstrated on-package accelerators. These are fixed-function hardware components that accelerate specific kinds of popular server workloads (i.e. run them faster than a CPU core can). With these, Intel hopes to close the CPU core-count gap it has with AMD EPYC, with the upcoming "Zen 4" EPYC chips expected to launch with up to 96 cores per socket in its conventional variant, and up to 128 cores per socket in its cloud-optimized variant.

Intel's on-package accelerators include AMX (advanced matrix extensions), which accelerate recommendation-engines, natural language processing (NLP), image-recognition, etc.; DLB (dynamic load-balancing), which accelerates security-gateway and load-balancing; DSA (data-streaming accelerator), which speeds up the network stack, guest OS, and migration; IAA (in-memory analysis accelerator), which speeds up big-data (Apache Hadoop), IMDB, and warehousing applications; a feature-rich implementation of the AVX-512 instruction-set for a plethora of content-creation and scientific applications; and lastly, QAT (QuickAssist Technology), with speed-ups for data compression, OpenSSL, nginx, IPsec, etc. Unlike on "Ice Lake-SP," QAT is now implemented on the processor package instead of the PCH.

RPCS3 PlayStation 3 Emulator Updated with AVX-512 Support for AMD Zen 4

The popular PlayStation 3 emulator for PCs, RPCS3, just received a major update that lets it take advantage of the AVX-512 instruction-set on processors based on the AMD Zen 4 microarchitecture (the recently launched Ryzen 7000 series). RPCS3 emulates the PS3's CELL Broadband Engine SoC entirely on the CPU, and does not use your GPU to draw any raster graphics. To emulate both a CPU and GPU of that time entirely on a multi-threaded CPU of today is no easy task, but is helped greatly by leveraging the latest instruction-sets. RPCS3 supports an AVX-512 code-path on Intel processors such as the Core i9-11900K "Rocket Lake," but Intel has been fidgeting with AVX-512 support on its client processors since 12th Gen "Alder Lake." The developer of RPCS3 confirmed in a tweet that they have enabled AVX-512 support for AMD Zen 4 with the latest build.

AMD "Zen 4" Dies, Transistor-Counts, Cache Sizes and Latencies Detailed

As we await technical documents from AMD detailing its new "Zen 4" microarchitecture, particularly the all-important CPU core Front-End and Branch Prediction units that have contributed two-thirds of the 13% IPC gain over the previous-generation "Zen 3" core, the tech enthusiast community is already decoding images from the Ryzen 7000 series launch presentation. "Skyjuice" presented the first annotation of the "Zen 4" core, revealing its large branch-prediction unit, enlarged micro-op cache, TLB, load/store unit, and dual-pumped 256-bit FPU that enables AVX-512 support. A quarter of the core's die-area is also taken up by the 1 MB dedicated L2 cache.

Chiakokhua (aka Retired Engineer) posted a table detailing the various caches and their latencies, comparing them with those of the "Zen 3" core. As AMD's Mark Papermaster revealed in the Ryzen 7000 launch event, the company has enlarged the micro-op cache of the core from 4 K entries to 6.75 K entries. The L1I and L1D caches remain 32 KB in size, each; while the L2 cache has doubled in size. The enlargement of the L2 cache has slightly increased latency, from 12 cycles to 14. Latency of the shared L3 cache is also up, from 46 cycles to 50 cycles. The reorder buffer (ROB) in the dispatch stage has been enlarged from 256 entries to 320 entries. The L1 branch target buffer (BTB) has increased in size from 1 KB to 1.5 KB.
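To put the "Zen 3" to "Zen 4" figures quoted above side by side, here is a small sketch that computes the relative change for each one. The numbers are taken from the paragraph above; the dictionary layout and the `percent_change` helper are illustrative assumptions, not anything published by AMD or the original posters.

```python
# (Zen 3 value, Zen 4 value) pairs as quoted in the article.
ZEN3_TO_ZEN4 = {
    "micro-op cache (K entries)": (4.0, 6.75),
    "L2 latency (cycles)": (12, 14),
    "L3 latency (cycles)": (46, 50),
    "reorder buffer (entries)": (256, 320),
    "L1 BTB (KB)": (1.0, 1.5),
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for name, (old, new) in ZEN3_TO_ZEN4.items():
    print(f"{name}: {old} -> {new} ({percent_change(old, new):+.1f}%)")
```

Note that for the two latency rows a positive change means slower access, while for the capacity rows it means more room.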

Latest Y-Cruncher Version Comes with "Zen 4" and AVX512 Optimization

Y-Cruncher is a multi-threaded Pi calculation benchmark. Its author, Alexander Yee, has access to an AMD Ryzen 9 7950X 16-core/32-thread sample, and has developed the latest version 0.7.10 of the Y-Cruncher binary with optimization for the "Zen 4" microarchitecture, and to take advantage of the AVX-512 instruction-set on these chips. Without disclosing the juicy performance numbers obtained in his testing, Yee posted a screenshot of Y-cruncher with the 7950X, on a machine with Windows 11 22Hx, and 64 GB of memory. You know it's optimized, since the multi-core efficiency is as high as 98% (all threads are being saturated with the Pi calculation workload).

Intel Teams Up with Aible to Fast-Track Enterprise Analytics and AI

Intel's collaboration with Aible enables teams across key industries to leverage artificial intelligence and deliver rapid and measurable business impact. This deep collaboration, which includes engineering optimizations and an innovative benchmarking program, enhances Aible's ability to deliver rapid results to its enterprise customers. When paired with Intel processors, Aible's technology provides a serverless-first approach, allowing developers to build and run applications without having to manage servers, and build modern applications with increased agility and lower total cost of ownership (TCO).

"Today's enterprise IT infrastructure leaders face significant challenges building a foundation that is designed to help business teams drive value from AI initiatives in the data center. We've moved past talking about the potential of AI, as business teams across key industries are experiencing measurable business impact within days, using Intel Xeon Scalable processors with built-in Intel software optimizations with Aible," said Kavitha Prasad, Intel vice president and general manager of Datacenter, AI and Cloud Execution and Strategy.

Intel Xeon W9-3495 Sapphire Rapids HEDT CPU with 56 Cores and 112 Threads Appears

Intel's upcoming Sapphire Rapids processors will not only be present in the server sector but will also span the high-end desktop (HEDT) platform. Today, according to the findings of a Twitter user, @InstLatX64, we have an appearance of Intel's upcoming Sapphire Rapids HEDT SKU in Kernel.org boot logs. Named Intel Xeon W9-3495, this model features 56 cores and 112 threads. While there is no specific information about base and boost frequencies, we know that the SKU supports AVX-512 and AMX instructions. This is a welcome addition, as we have seen Intel disable AVX-512 on consumer chips altogether.

With a high core count and additional instructions for Deep Learning, this CPU will power workstations sometime in the future. With the late arrival of Sapphire Rapids for servers, a HEDT variant should follow.

Intel Core i9-13900 "Raptor Lake" Processor Gets a Preview

Intel is preparing to launch its 13th generation of desktop processors codenamed Raptor Lake. Succeeding Alder Lake, the 13th gen design will implement up to eight P-cores with 16 E-cores manufactured on Intel's improved 7+ technology node. Today, we got a performance preview from SiSoftware, which has collected SiSoftware Sandra database scores for the Intel Core i9-13900 Raptor Lake-S processor and presents an overview of a few benchmarks. Firstly, the SoC features 36 MB of unified L3 cache versus 30 MB in Alder Lake. With DDR5 memory running up to 5600 MT/s and PCIe 5.0, the SoC features the latest IO and memory standards. The big P-cores now lack AVX-512 and feature 2 MB of L2 cache per core. We see 4 MB of L2 cache for a cluster of small E-cores. An exciting addition to E-cores is the AVX/AVX2 support, which is a first for Atom cores.

Regarding testing, the author has collected a few tests that seemed appropriate to compare to the equivalent Alder Lake model. Starting with ALU/FPU tests that benchmark basic arithmetic tasks, Raptor Lake delivered 33% to 50% improvement over Alder Lake. The Raptor Lake design achieved this with 3.7 GHz P-Core and 2.76 GHz E-Core frequency. In vectorized and SIMD tests, the 13th gen design showed only 5% to 8% improvement over the previous generation. For more benchmarks and accurate results, we have to wait for TechPowerUp's test, which will be coming on the release day.

Intel Sapphire Rapids 56-Core ES Processor Boosts to 3.3 GHz at 420 Watts

Intel is slowly transitioning its data center customers to a new processor generation called Sapphire Rapids. Today, thanks to the hardware leaker Yuuki_ans we have more profound insights into the top-end 56-core Sapphire Rapids processor and its power settings. According to the leak, we have information on either Xeon Platinum 8476 or Platinum 8480 designs that are equipped with 56 cores and 112 threads. This model was running at the base frequency of 1.9 GHz and a boost frequency of 3.3 GHz. Single-core can boost to 3.7 GHz if the report is giving a correct reading. Remember that this is only an engineering sample, so the final target speeds could differ. It carries 112 MB of L2 and 105 MB of L3 cache, and this sample was running with 1 TB of DDR5 memory with CL40-39-38-76 timings.

Perhaps the most exciting finding is the power configuration of this SKU. Intel has enabled this CPU to consume 350 Watts in PL1 rating, with up to 420 Watts in PL2 performance mode. The enforced BIOS power limit rating is set at an astonishing 764 Watts, which could happen with AVX-512 enabled. Final TDP ratings are yet to be disclosed; however, these Sapphire Rapids processors are shaping up to be relatively power-hungry chips.
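The leaked totals for this 56-core engineering sample are easier to compare with other chips when broken down per core. The sketch below does that arithmetic using only the figures quoted in the article. The constant names and the `per_core` helper are hypothetical, and an even split across cores is a simplifying assumption (power in particular is not distributed evenly in practice).

```python
# Figures quoted in the article for the 56-core engineering sample.
CORES = 56
L2_TOTAL_MB = 112
L3_TOTAL_MB = 105
PL1_WATTS = 350
PL2_WATTS = 420


def per_core(total, cores=CORES):
    """Divide a package-wide total evenly across all cores."""
    return total / cores


print(f"L2 per core: {per_core(L2_TOTAL_MB)} MB")
print(f"L3 per core: {per_core(L3_TOTAL_MB)} MB")
print(f"PL1 per core: {per_core(PL1_WATTS)} W")
print(f"PL2 per core: {per_core(PL2_WATTS)} W")
print(f"PL2 / PL1 ratio: {PL2_WATTS / PL1_WATTS:.2f}")
```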

Intel is Now Fusing Off AVX-512 support in Alder Lake CPUs

If you have already bought a 12th gen Intel Alder Lake CPU, you could be sitting on a collector's item, as according to Tom's Hardware, Intel is now fusing off AVX-512 support in production. It's possible this could be in preparation for the arrival of the Core "W" series of CPUs that might be replacing the Xeon-W series of processors for Intel. It should be noted that this isn't a rumour, as Tom's Hardware has had an official statement on the matter from Intel.

The statement reads, "Although AVX-512 was not fuse-disabled on certain early Alder Lake desktop products, Intel plans to fuse off AVX-512 on Alder Lake products going forward." As to exactly when this will go into full effect isn't clear, but according to Tom's Hardware, they've already had reports of batches of non-K Alder Lake CPUs that are lacking AVX-512 support. In all fairness to Intel, the company never claimed that its Alder Lake CPUs would support AVX-512 and the support has never been guaranteed to be flawless on the chips that have shipped with it enabled. Intel has also disabled AVX-512 via a microcode update that shipped to motherboard makers in January, but at least some motherboard makers have added a toggle to allow people to re-enable AVX-512 support. It's unlikely that this will affect many potential customers, since AVX-512 instructions aren't widely used in consumer facing software.

Intel Launches Xeon D Processor Built for the Network and Edge

Today, ahead of MWC Barcelona 2022, Intel launched new Intel Xeon D processors: the D-2700 and the D-1700. They are Intel's newest system-on-chip (SoC) built for the software-defined network and edge, with integrated AI and crypto acceleration, built-in Ethernet, support for Intel Time Coordinated Computing (Intel TCC) and Time Sensitive Networking (TSN), and industrial-class reliability. New Intel Xeon D processors extend compute with acceleration beyond the core data center, generating a better overall experience for key network and edge usages and workloads.

"As the industry enters a world of software-defined everything, Intel is delivering programmable platforms for networking and the edge to enable one of the most significant transformations our industry has ever seen. The new Intel Xeon D processor is built for this. Based on the proven and trusted Intel architecture, this processor is designed for a range of use cases to unleash innovation across the network and edge," said Dan Rodriguez, Intel corporate vice president, Network & Edge Group, general manager of the Network Platforms Group.

MSI Partially Reenables AVX-512 Support for Alder Lake-S Processors

Intel's Alder Lake processors have two types of cores, each with a distinct set of features and capabilities. For example, the smaller E-cores don't support the execution of AVX-512 instructions, while the bigger P-cores do. To avoid software errors and other issues with executing AVX-512 code on Alder Lake processors, Intel decided to remove support for it altogether. This happened just months before the launch of Alder Lake, so some initial motherboard BIOSes came with AVX-512 enabled out of the box. Later on, all motherboard makers pulled the plug on it, and it is now rare to see support for it.

However, it seems MSI is unhappy with the lack of AVX-512, and the company is reenabling partial support for it. According to Xaver Amberger, editor at Igor's Lab, MSI has reintroduced microcode version selection on its MEG Z690 Unify-X motherboard. There is an option for AVX-512 enablement in the menu, and it is indeed a functional one. With BIOS A22, MSI enabled AVX-512 instruction execution, and there are benchmarks to prove it works. These show the advantage of AVX-512's 512-bit wide execution units over something like AVX2, which offers only 256-bit wide execution units. In applications such as Y-Cruncher, AVX-512 enabled the CPU to reach higher performance targets while consuming less power.
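The 512-bit versus 256-bit comparison above comes down to how many data elements fit in one vector. The sketch below works that out for common element sizes. The `lanes` helper is hypothetical, and the lane count is only an upper bound on elements processed per instruction, not a measured speedup.

```python
def lanes(vector_bits, element_bits):
    """Number of elements of a given width that fit in one vector register."""
    return vector_bits // element_bits


# Element widths in bits for a few common data types.
ELEMENT_BITS = {"int8": 8, "float32": 32, "float64": 64}

for name, bits in ELEMENT_BITS.items():
    avx2 = lanes(256, bits)
    avx512 = lanes(512, bits)
    print(f"{name}: AVX2 holds {avx2} per vector, AVX-512 holds {avx512}")
```

Doubling the vector width doubles the lanes for every element type, but real gains depend on clock behavior, memory bandwidth, and how the hardware executes 512-bit operations.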