News Posts matching #Instinct MI250

Return to Keyword Browsing

AMD to Acquire Silo AI to Expand Enterprise AI Solutions Globally

AMD today announced the signing of a definitive agreement to acquire Silo AI, the largest private AI lab in Europe, in an all-cash transaction valued at approximately $665 million. The agreement represents another significant step in the company's strategy to deliver end-to-end AI solutions based on open standards and in strong partnership with the global AI ecosystem. The Silo AI team consists of world-class AI scientists and engineers with extensive experience developing tailored AI models, platforms and solutions for leading enterprises spanning cloud, embedded and endpoint computing markets.

Silo AI CEO and co-founder Peter Sarlin will continue to lead the Silo AI team as part of the AMD Artificial Intelligence Group, reporting to AMD senior vice president Vamsi Boppana. The acquisition is expected to close in the second half of 2024.

TOP500: Frontier Keeps Top Spot, Aurora Officially Becomes the Second Exascale Machine

The 63rd edition of the TOP500 reveals that Frontier has once again claimed the top spot, despite no longer being the only exascale machine on the list. Additionally, a new system has found its way into the Top 10.

The Frontier system at Oak Ridge National Laboratory in Tennessee, USA remains the most powerful system on the list with an HPL score of 1.206 EFlop/s. The system has a total of 8,699,904 combined CPU and GPU cores, an HPE Cray EX architecture that combines 3rd Gen AMD EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X accelerators, and it relies on Cray's Slingshot 11 network for data transfer. On top of that, this machine has an impressive power efficiency rating of 52.93 GFlops/Watt - putting Frontier at the No. 13 spot on the GREEN500.

AMD Delivers Leadership Portfolio of Data Center AI Solutions with AMD Instinct MI300 Series

Today, AMD announced the availability of the AMD Instinct MI300X accelerators - with industry leading memory bandwidth for generative AI and leadership performance for large language model (LLM) training and inferencing - as well as the AMD Instinct MI300A accelerated processing unit (APU) - combining the latest AMD CDNA 3 architecture and "Zen 4" CPUs to deliver breakthrough performance for HPC and AI workloads.

"AMD Instinct MI300 Series accelerators are designed with our most advanced technologies, delivering leadership performance, and will be in large scale cloud and enterprise deployments," said Victor Peng, president, AMD. "By leveraging our leadership hardware, software and open ecosystem approach, cloud providers, OEMs and ODMs are bringing to market technologies that empower enterprises to adopt and deploy AI-powered solutions."

Frontier Remains As Sole Exaflop Machine on TOP500 List

Increasing its HPL score from 1.02 Eflop/s in November 2022 to an impressive 1.194 Eflop/s on this list, Frontier was able to improve upon its score after a stagnation between June 2022 and November 2022. Considering exascale was only a goal to aspire to just a few years ago, a roughly 17% increase here is an enormous success. Additionally, Frontier earned a score of 9.95 Eflop/s on the HLP-MxP benchmark, which measures performance for mixed-precision calculation. This is also an increase over the 7.94 EFlop/s that the system achieved on the previous list and nearly 10 times more powerful than the machine's HPL score. Frontier is based on the HPE Cray EX235a architecture and utilizes AMD EPYC 64C 2 GHz processors. It also has 8,699,904 cores and an incredible energy efficiency rating of 52.59 Gflops/watt. It also relies on gigabit ethernet for data transfer.

Giga Computing Leaked Server Roadmap Points to 600 W CPUs & 700 W GPUs

A leaked roadmap (that seems to be authored) by Giga Computing provides an interesting peak into the future of next generation enterprise-oriented CPUs and GPUs. TDP details of Intel, AMD and NVIDIA hardware are featured within the presentation slide - and all indications point to a trend of continued power consumption growth. Intel's server CPU lineups, including fourth generation Sapphire Rapids-SP and fifth-gen Emerald Rapids-SP Xeon chips, are projected to hit maximum TGPs of 350 W by mid-2024. Team Blue's sixth gen Granite Rapids is expected to arrive in the latter half of 2024, and Gigabyte's leaked roadmap points to a push into 500 W territories going forward into 2025.

AMD's Zen 5-based Turin server CPUs are expected to ship by the second half of 2024, and power consumption is estimated to hit a maximum of 600 W - representing a 50% increase over the Zen 4-based Genoa family. The 2024 NVIDIA PCIe GPU lineup is likely hitting TDPs of up to 500 W, it is rumored that these enterprise cards will be based on the Blackwell chip architecture - set to succeed current generation H100 "Hopper" PCIe accelerators (featuring 350-450 W TDPs). It is possible that AMD's Instinct-class PCIe accelerator family will become the direct competition, these cards are rated up to 400 W. The AMD Instinct MI250 OAM category has a maximum rating of 560 W. The NVIDIA Grace and Grace Hopper CPU Superchips are said to feature 600 W and 1000 W TDPs (respectively).

AMD Shows Instinct MI300 Exascale APU with 146 Billion Transistors

During its CES 2023 keynote, AMD announced its latest Instinct MI300 APU, a first of its kind in the data center world. Combining the CPU, GPU, and memory elements into a single package eliminates latency imposed by long travel distances of data from CPU to memory and from CPU to GPU throughout the PCIe connector. In addition to solving some latency issues, less power is needed to move the data and provide greater efficiency. The Instinct MI300 features 24 Zen4 cores with simultaneous multi-threading enabled, CDNA3 GPU IP, and 128 GB of HBM3 memory on a single package. The memory bus is 8192-bit wide, providing unified memory access for CPU and GPU cores. CLX 3.0 is also supported, making cache-coherent interconnecting a reality.

The Instinct MI300 APU package is an engineering marvel of its own, with advanced chiplet techniques used. AMD managed to do 3D stacking and has nine 5 nm logic chiplets that are 3D stacked on top of four 6 nm chiplets with HBM surrounding it. All of this makes the transistor count go up to 146 billion, representing the sheer complexity of a such design. For performance figures, AMD provided a comparison to Instinct MI250X GPU. In raw AI performance, the MI300 features an 8x improvement over MI250X, while the performance-per-watt is "reduced" to a 5x increase. While we do not know what benchmark applications were used, there is a probability that some standard benchmarks like MLPerf were used. For availability, AMD targets the end of 2023, when the "El Capitan" exascale supercomputer will arrive using these Instinct MI300 APU accelerators. Pricing is unknown and will be unveiled to enterprise customers first around launch.

AMD-Powered Frontier Supercomputer Faces Difficulties, Can't Operate a Day without Issues

When AMD announced that the company would deliver the world's fastest supercomputer, Frontier, the company also took a massive task to provide a machine capable of producing one ExaFLOP of total sustained ability to perform computing tasks. While the system is finally up and running, making a machine of that size run properly is challenging. In the world of High-Performance Computing, getting the hardware is only a portion of running the HPC center. In an interview with InsideHPC, Justin Whitt, program director for the Oak Ridge Leadership Computing Facility (OLCF), provided insight into what it is like to run the world's fastest supercomputer and what kinds of issues it is facing.

The Frontier system is powered by AMD EPYC 7A53s "Trento" 64-core 2.0 GHz CPUs and Instinct MI250X GPUs. Interconnecting everything is the HPE (Cray) Slingshot 64-port switch, which is responsible for sending data in and out of compute blades. The recent interview points out a rather interesting finding: exactly AMD Instinct MI250X GPUs and Slingshot interconnect cause hardware troubles for the Frontier. "It's mostly issues of scale coupled with the breadth of applications, so the issues we're encountering mostly relate to running very, very large jobs using the entire system … and getting all the hardware to work in concert to do that," says Justin Whitt. In addition to the limits of scale "The issues span lots of different categories, the GPUs are just one. A lot of challenges are focused around those, but that's not the majority of the challenges that we're seeing," he said. "It's a pretty good spread among common culprits of parts failures that have been a big part of it. I don't think that at this point that we have a lot of concern over the AMD products. We're dealing with a lot of the early-life kind of things we've seen with other machines that we've deployed, so it's nothing too out of the ordinary."

AMD Details Instinct MI200 Series Compute Accelerator Lineup

AMD today announced the new AMD Instinct MI200 series accelerators, the first exascale-class GPU accelerators. AMD Instinct MI200 series accelerators includes the world's fastest high performance computing (HPC) and artificial intelligence (AI) accelerator,1 the AMD Instinct MI250X.

Built on AMD CDNA 2 architecture, AMD Instinct MI200 series accelerators deliver leading application performance for a broad set of HPC workloads. The AMD Instinct MI250X accelerator provides up to 4.9X better performance than competitive accelerators for double precision (FP64) HPC applications and surpasses 380 teraflops of peak theoretical half-precision (FP16) for AI workloads to enable disruptive approaches in further accelerating data-driven research.
Return to Keyword Browsing
Dec 19th, 2024 10:16 EST change timezone

New Forum Posts

Popular Reviews

Controversial News Posts