News Posts matching #CDNA

Return to Keyword Browsing

AMD to Skip RDNA 5: UDNA Takes the Spotlight After RDNA 4

While the current generation of AMD graphics cards employs RDNA 3 at its core, and the upcoming RX 8000 series will feature RDNA 4, the latest leaks suggest RDNA 5 is not in development. Instead, UDNA will succeed RDNA 4, simplifying AMD's GPU roadmap. A credible source on the Chiphell forums, zhangzhonghao, reports that the UDNA-based RX 9000 series and Instinct MI400 AI accelerator will incorporate the same advanced Arithmetic Logic Unit (ALU) designs in both products, reminiscent of AMD's earlier GCN architectures before the CDNA and RDNA split. Sony's next-generation PlayStation 6 is also rumored to adopt UDNA technology. The PS5 and PS5 Pro currently utilize RDNA 2, while the Pro variant integrates elements of RDNA 4 for enhanced ray tracing. The PS6's CPU configuration remains unclear, but speculation revolves around Zen 4 or Zen 5 architectures.

The first UDNA gaming GPUs are expected to enter production by Q2 2026. Interestingly, AMD's RDNA 4 GPUs are anticipated to focus on entry-level to mid-range markets, potentially leaving high-end offerings until the UDNA generation. This strategic pause may allow AMD to refine AI-accelerated technologies like FidelityFX Super Resolution (FSR) 4, aiming to compete with NVIDIA's DLSS. This unification is inspired by NVIDIA's CUDA ecosystem, which supports cross-platform compatibility from laptops to high-performance servers. As AMD sees it, the decision addresses the challenges posed by maintaining separate architectures, which complicate memory subsystem optimizations and hinder forward and backward compatibility. Putting developer resources into RDNA 5 is not economically or strategically wise, given that UDNA is about to take over. Additionally, the company is enabling ROCm software support across all products ranging from consumer Radeon to enterprise Instinct MI. Accelerating software for one platform will translate to the entire product stack.

GIGABYTE Releases Servers with AMD EPYC 9005 Series Processors and AMD Instinct MI325X GPUs

Giga Computing, a subsidiary of GIGABYTE and an industry leader in generative AI servers and advanced cooling technologies, today announced support for AMD EPYC 9005 Series processors with the release of new GIGABYTE servers alongside BIOS updates for some existing GIGABYTE servers using the SP5 platform. This first wave of updates supports over 60 servers and motherboards that customers can choose from that deliver exceptional performance for 5th Generation AMD EPYC processors. In addition, with the launch of the AMD Instinct MI325X accelerator, a newly designed GIGABYTE server was created, and it will be showcased at SC24 (Nov. 19-21) in Atlanta.

New GIGABYTE Servers and Updates
To fill in all possible workload scenarios, using modular design servers to edge servers to enterprise-grade motherboards, these new solutions will ship already supporting AMD EPYC 9005 Series processors. The XV23-ZX0 is one of the many new solutions and it is notable for its modularized server design using two AMD EPYC 9005 processors and supporting up to four GPUs and three additional FHFL slots. It also has 2+2 redundant power supplies on the front-side for ease of access.

AMD Launches Instinct MI325X Accelerator for AI Workloads: 256 GB HBM3E Memory and 2.6 PetaFLOPS FP8 Compute

During its "Advancing AI" conference today, AMD has updated its AI accelerator portfolio with the Instinct MI325X accelerator, designed to succeed its MI300X predecessor. Built on the CDNA 3 architecture, Instinct MI325X brings a suite of improvements over the old SKU. Now, the MI325X features 256 GB of HBM3E memory running at 6 TB/s bandwidth. The capacity memory alone is a 1.8x improvement over the old MI300 SKU, which features 192 GB of regular HBM3 memory. Providing more memory capacity is crucial as upcoming AI workloads are training models with parameter counts measured in trillions, as opposed to billions with current models we have today. When it comes to compute resources, the Instinct MI325X provides 1.3 PetaFLOPS at FP16 and 2.6 PetaFLOPS at FP8 training and inference. This represents a 1.3x improvement over the Instinct MI300.

A chip alone is worthless without a good platform, and AMD decided to make the Instinct MI325X OAM modules a drop-in replacement for the current platform designed for MI300X, as they are both pin-compatible. In systems packing eight MI325X accelerators, there are 2 TB of HBM3E memory running at 48 TB/s memory bandwidth. Such a system achieves 10.4 PetaFLOPS of FP16 and 20.8 PetaFLOPS of FP8 compute performance. The company uses NVIDIA's H200 HGX as reference claims for its performance competitiveness, where the company claims that the Instinct MI325X outperforms NVIDIA H200 HGX system by 1.3x across the board in memory bandwidth, FP16 / FP8 compute performance and 1.8x in memory capacity.

Interview with AMD's Senior Vice President and Chief Software Officer Andrej Zdravkovic: UDNA, ROCm for Radeon, AI Everywhere, and Much More!

A few days ago, we reported on AMD's newest expansion plans for Serbia. The company opened two new engineering design centers with offices in Belgrade and Nis. We were invited to join the opening ceremony and got an exclusive interview with one of AMD's top executives, Andrej Zdravkovic, who is the senior vice president and Chief Software Officer. Previously, we reported on AMD's transition to become a software company. The company has recently tripled its software engineering workforce and is moving some of its best people to support these teams. AMD's plan is spread over a three to five-year timeframe to improve its software ecosystem, accelerating hardware development to launch new products more frequently and to react to changes in software demand. AMD found that to help these expansion efforts, opening new design centers in Serbia would be very advantageous.

We sat down with Andrej Zdravkovic to discuss the purpose of AMD's establishment in Serbia and the future of some products. Zdravkovic is actually an engineer from Serbia, where he completed his Bachelor's and Master's degrees in electrical engineering from Belgrade University. In 1998, Zdravkovic joined ATI and quickly rose through the ranks, eventually becoming a senior director. During his decade-long tenure, Zdravkovic witnessed a significant industry shift as AMD acquired ATI in 2006. After a brief stint at another company, Zdravkovic returned to AMD in 2015, bringing with him a wealth of experience and a unique perspective on the evolution of the graphics and computing industry.
Here is the full interview:

AMD to Unify Gaming "RDNA" and Data Center "CDNA" into "UDNA": Singular GPU Architecture Similar to NVIDIA's CUDA

According to new information from Tom's Hardware, AMD has announced plans to unify its consumer-focused gaming RDNA and data center CDNA graphics architectures into a single, unified design called "UDNA." The announcement was made by AMD's Jack Huynh, Senior Vice President and General Manager of the Computing and Graphics Business Group, at IFA 2024 in Berlin. The goal of the new UDNA architecture is to provide a single focus point for developers so that each optimized application can run on consumer-grade GPU like Radeon RX 7900XTX as well as high-end data center GPU like Instinct MI300. This will create a unification similar to NVIDIA's CUDA, which enables CUDA-focused developers to run applications on everything ranging from laptops to data centers.
Jack HuynhSo, part of a big change at AMD is today we have a CDNA architecture for our Instinct data center GPUs and RDNA for the consumer stuff. It's forked. Going forward, we will call it UDNA. There'll be one unified architecture, both Instinct and client [consumer]. We'll unify it so that it will be so much easier for developers versus today, where they have to choose and value is not improving.

NVIDIA Hit with DOJ Antitrust Probe over AI GPUs, Unfair Sales Tactics and Pricing Alleged

NVIDIA has reportedly been hit with a US Department of Justice (DOJ) antitrust probe over the tactics the company allegedly employs to sell or lease its AI GPUs and data-center networking equipment, "The Information" reported. Shares of the NVIDIA stock fell 3.6% in the pre-market trading on Friday (08/02). The main complainants behind the probe appear to be a special interest group among the customers of AI GPUs, and not NVIDIA's competitors in the AI GPU industry per se. US Senator Elizabeth Warren and US progressives have been most vocal about calling upon the DOJ to investigate antitrust allegations against NVIDIA.

Meanwhile, US officials are reportedly reaching out to NVIDIA's competitors, including AMD and Intel, to gather information about the complaints. NVIDIA holds 80% of the AI GPU market, while AMD, and to a much lesser extent, Intel, have received spillover demand for AI GPUs. "The Information" report says that the complaint alleges NVIDIA pressured cloud customers to buy "multiple products". We don't know what this means, one theory holds that NVIDIA is getting them to commit to buying multiple generations of products (eg: Ampere, Hopper, and over to Blackwell); while another holds that it's getting them to buy multiple kinds of products, which include not just the AI GPUs, but also NVIDIA's first-party server systems and networking equipment. Yet another theory holds that it is bundle first-party software and services to go with the hardware, far beyond the basic software needed to get the hardware to work.

AMD is Becoming a Software Company. Here's the Plan

Just a few weeks ago, AMD invited us to Barcelona as part of a roundtable, to share their vision for the future of the company, and to get our feedback. On site, were prominent AMD leadership, including Phil Guido, Executive Vice President & Chief Commercial Officer and Jack Huynh, Senior VP & GM, Computing and Graphics Business Group. AMD is making changes in a big way to how they are approaching technology, shifting their focus from hardware development to emphasizing software, APIs, and AI experiences. Software is no longer just a complement to hardware; it's the core of modern technological ecosystems, and AMD is finally aligning its strategy accordingly.

The major difference between AMD and NVIDIA is that AMD is a hardware company that makes software on the side to support its hardware; while NVIDIA is a software company that designs hardware on the side to accelerate its software. This is about to change, as AMD is making a pivot toward software. They believe that they now have the full stack of computing hardware—all the way from CPUs, to AI accelerators, to GPUs, to FPGAs, to data-processing and even server architecture. The only frontier left for AMD is software.

Curious "Navi 48 XTX" Graphics Card Prototype Detected in Regulatory Filings

A curiously described graphics card was detected by Olrak29 as it was making it through international shipping. The shipment description for the card reads "GRAPHIC CARD NAVI48 G28201 DT XTX REVB-PRE-CORRELATION AO PLATSI TT(SAMSUNG)-Q2 2024-3A-102-G28201." This can be decoded as a graphics card with the board number "G28201," for the desktop platform. It features a maxed out version of the "Navi 48" silicon, and is based on the B revision of the PCB. It features Samsung-made memory chips, and is dated Q2-2024.

AMD is planning to retreat from the enthusiast segment of gaming graphics cards with the RDNA 4 generation. The company originally entered this segment with the RX 6800 series and RX 6900 series RDNA 2 generation, where it saw unexpected success with the crypto-mining market boom, besides being competitive with the RTX 3080 and RTX 3090. This bust by the time RDNA 3 and the RX 7900 series arrived, and the chip wasn't competitive with NVIDIA's top-end. Around this time, the AI acceleration boom squeezed foundry allocation of all major chipmakers, including AMD, making large chips based on the latest process nodes even less viable for a market such as enthusiast graphics—the company would rather make CDNA AI accelerators with its allocation. Given all this, the company's fastest GPUs from the RDNA 4 generation could be the ones that succeed the current RX 7800 XT and RX 7700 XT, so AMD could capture a slice of the performance segment.

AMD Adds RDNA 4 Generation Navi 44 and MI300X1 GPUs to ROCm Software

AMD has quietly added some interesting codenames to its ROCm hardware support list. The biggest surprise is the appearance of "RDNA 4" and "Navi 44" codenames, hinting at a potential successor to the current RDNA 3 GPU architecture powering AMD's Radeon RX 7000 series graphics cards. The upcoming Radeon RX 8000 series could see Navi 44 SKU with a codename "gfx1200". While details are scarce, the inclusion of RDNA 4 and Navi 44 in the ROCm list suggests AMD is working on a new GPU microarchitecture that could bring significant performance and efficiency gains. While RDNA 4 may be destined for future Radeon gaming GPUs, in the data center GPU compute market, AMD is preparing a CDNA 4 based successors to the MI300 series. However, it appears that we haven't seen all the MI300 variants first. Equally intriguing is the "MI300X1" codename, which appears to reference an upcoming AI-focused accelerator from AMD.

While we wait for more information, we can't decipher whether the Navi 44 GPU SKU is for the high-end or low-end segment. If previous generations are for reference, then the Navi 44 SKU would target the low end of the GPU performance spectrum. The previous generation RDNA 3 had Navi 33 as an entry-level model, whereas the RDNA 2 had a Navi 24 SKU for entry-level GPUs. We have reported on RDNA 4 merely being a "bug correction" generation to fix the perf/Watt curve and offer better efficiency overall. What happens finally, we have to wait and see. AMD could announce more details in its upcoming Computex keynote.

Dr. Lisa Su Responds to TinyBox's Radeon RX 7900 XTX GPU Firmware Problems

The TinyBox AI server system attracted plenty of media attention last week—its creator, George Hotz, decided to build with AMD RDNA 3.0 GPU hardware rather than the expected/traditional choice of CDNA 3.0. Tiny Corp. is a startup firm dealing in neural network frameworks—they currently "write and maintain tinygrad." Hotz & Co. are in the process of assembling rack-mounted 12U TinyBox systems for customers—an individual server houses an AMD EPYC 7532 processor and six XFX Speedster MERC310 Radeon RX 7900 XTX graphics cards. The Tiny Corp. social media account has engaged in numerous NVIDIA vs. AMD AI hardware debates/tirades—Hotz appears to favor the latter, as evidenced in his latest choice of components. ROCm support on Team Red AI Instinct accelerators is fairly mature at this point in time, but a much newer prospect on gaming-oriented graphics cards.

Tiny Corporation's unusual leveraging of Radeon RX 7900 XTX GPUs in a data center configuration has already hit a developmental roadblock. Yesterday, the company's social media account expressed driver-related frustrations in a public forum: "If AMD open sources their firmware, I'll fix their LLVM spilling bug and write a fuzzer for HSA. Otherwise, it's not worth putting tons of effort into fixing bugs on a platform you don't own." Hotz's latest complaint was taken onboard by AMD's top brass—Dr. Lisa Su responded with the following message: "Thanks for the collaboration and feedback. We are all in to get you a good solution. Team is on it." Her software engineers—within a few hours—managed to fling out a set of fixes in Tiny Corporation's direction. Hotz appreciated the quick turnaround, and proceeded to run a model without encountering major stability issues: "AMD sent me an updated set of firmware blobs to try. They are responsive, and there have been big strides in the driver in the last year. It will be good! This training run is almost 5 hours in, hasn't crashed yet." Tiny Corp. drummed up speculation about AMD open sourcing GPU MES firmware—Hotz disclosed that he will be talking (on the phone) to Team Red leadership.

AMD Delivers Leadership Portfolio of Data Center AI Solutions with AMD Instinct MI300 Series

Today, AMD announced the availability of the AMD Instinct MI300X accelerators - with industry leading memory bandwidth for generative AI and leadership performance for large language model (LLM) training and inferencing - as well as the AMD Instinct MI300A accelerated processing unit (APU) - combining the latest AMD CDNA 3 architecture and "Zen 4" CPUs to deliver breakthrough performance for HPC and AI workloads.

"AMD Instinct MI300 Series accelerators are designed with our most advanced technologies, delivering leadership performance, and will be in large scale cloud and enterprise deployments," said Victor Peng, president, AMD. "By leveraging our leadership hardware, software and open ecosystem approach, cloud providers, OEMs and ODMs are bringing to market technologies that empower enterprises to adopt and deploy AI-powered solutions."

GIGABYTE Unveils Next-gen HPC & AI Servers with AMD Instinct MI300 Series Accelerators

GIGABYTE Technology: Giga Computing, a subsidiary of GIGABYTE and an industry leader in high-performance servers, and IT infrastructure, today announced the GIGABYTE G383-R80 for the AMD Instinct MI300A APU and two GIGABYTE G593 series servers for the AMD Instinct MI300X GPU and AMD EPYC 9004 Series processor. As a testament to the performance of AMD Instinct MI300 Series family of products, the El Capitan supercomputer at Lawrence Livermore National Laboratory uses the MI300A APU to power exascale computing. And these new GIGABYTE servers are the ideal platform to propel discoveries in HPC & AI at exascale.⁠

Marrying of a CPU & GPU: G383-R80
For incredible advancements in HPC there is the GIGABYTE G383-R80 that houses four LGA6096 sockets for MI300A APUs. This chip integrates a CPU that has twenty-four AMD Zen 4 cores with a powerful GPU built with AMD CDNA 3 GPU cores. And the chiplet design shares 128 GB of unified HBM3 memory for impressive performance for large AI models. The G383 server has lots of expansion slots for networking, storage, or other accelerators, with a total of twelve PCIe Gen 5 slots. And in the front of the chassis are eight 2.5" Gen 5 NVMe bays to handle heavy workloads such as real-time big data analytics and latency-sensitive workloads in finance and telecom. ⁠

Two-ExaFLOP El Capitan Supercomputer Starts Installation Process with AMD Instinct MI300A

When Lawrence Livermore National Laboratory (LLNL) announced the creation of a two-ExaFLOP supercomputer named El Capitan, we heard that AMD would power it with its Instinct MI300 accelerator. Today, LNLL published a Tweet that states, "We've begun receiving & installing components for El Capitan, @NNSANews' first #exascale #supercomputer. While we're still a ways from deploying it for national security purposes in 2024, it's exciting to see years of work becoming reality." As published images show, HPE racks filled with AMD Instinct MI300 are showing up now at LNLL's facility, and the supercomputer is expected to go operational in 2024. This could mean that November 2023 TOP500 list update wouldn't feature El Capitan, as system enablement would be very hard to achieve in four months until then.

The El Capitan supercomputer is expected to run on AMD Instinct MI300A accelerator, which features 24 Zen4 cores, CDNA3 architecture, and 128 GB of HBM3 memory. All paired together in a four-accelerator configuration goes inside each node from HPE, also getting water cooling treatment. While we don't have many further details on the memory and storage of El Capitan, we know that the system will exceed two ExFLOPS at peak and will consume close to 40 MW of power.

AMD Confirms that Instinct MI300X GPU Can Consume 750 W

AMD recently revealed its Instinct MI300X GPU at their Data Center and AI Technology Premiere event on Tuesday (June 15). The keynote presentation did not provide any details about the new accelerator model's power consumption, but that did not stop one tipster - Hoang Anh Phu - from obtaining this information from Team Red's post-event footnotes. A comparative observation was made: "MI300X (192 GB HBM3, OAM Module) TBP is 750 W, compared to last gen, MI250X TBP is only 500-560 W." A leaked Giga Computing roadmap from last month anticipated server-grade GPUs hitting the 700 W mark.

NVIDIA's Hopper H100 took the crown - with its demand for a maximum of 700 W - as the most power-hungry data center enterprise GPU until now. The MI300X's OCP Accelerator Module-based design now surpasses Team Green's flagship with a slightly greater rating. AMD's new "leadership generative AI accelerator" sports 304 CDNA 3 compute units, which is a clear upgrade over the MI250X's 220 (CDNA 2) CUs. Engineers have also introduced new 24G B HBM3 stacks, so the MI300X can be specced with 192 GB of memory (as a maximum), the MI250X is limited to a 128 GB memory capacity with its slower HBM2E stacks. We hope to see sample units producing benchmark results very soon, with the MI300X pitted against H100.

AMD ROCm 5.5 Now Available on GitHub

As expected with AMD's activity on GitHub, ROCm 5.5 has now been officially released. It brings several big changes, including better RDNA 3 support. While officially focused on AMD's professional/workstation graphics cards, the ROCm 5.5 should also bring better support for Radeon RX 7000 series graphics cards on Linux.

Surprisingly, the release notes do not officially mention RDNA 3 improvements in its release notes, but those have been already tested and confirmed. The GPU support list is pretty short including AMD GFX9, RDNA, and CDNA GPUs, ranging from Radeon VII, Pro VII, W6800, V620, and Instinct lineup. The release notes do mention new HIP enhancements, enhanced stack size limit, raising it from 16k to 128k, new APIs, OpenMP enhancements, and more. You can check out the full release notes, downloads, and more details over at GitHub.

AMD Brings ROCm to Consumer GPUs on Windows OS

AMD has published an exciting development for its Radeon Open Compute Ecosystem (ROCm) users today. Now, ROCm is coming to the Windows operating system, and the company has extended ROCm support for consumer graphics cards instead of only supporting professional-grade GPUs. This development milestone is essential for making AMD's GPU family more competent with NVIDIA and its CUDA-accelerated GPUs. For those unaware, AMD ROCm is a software stack designed for GPU programming. Similarly to NVIDIA's CUDA, ROCm is designed for AMD GPUs and was historically limited to Linux-based OSes and GFX9, CDNA, and professional-grade RDNA GPUs.

However, according to documents obtained by Tom's Hardware (which are behind a login wall), AMD has brought support for ROCm to Radeon RX 6900 XT, Radeon RX 6600, and R9 Fury GPU. What is interesting is not the inclusion of RX 6900 XT and RX 6600 but the support for R9 Fury, an eight-year-old graphics card. Also, what is interesting is that out of these three GPUs, only R9 Fury has full ROCm support, the RX 6900 XT has HIP SDK support, and RX 6600 has only HIP runtime support. And to make matters even more complicated, the consumer-grade R9 Fury GPU has full ROCm support only on Linux and not Windows. The reason for this strange selection of support has yet to be discovered. However, it is a step in the right direction, as AMD has yet to enable more functionality on Windows and more consumer GPUs to compete with NVIDIA.

AMD Shows Instinct MI300 Exascale APU with 146 Billion Transistors

During its CES 2023 keynote, AMD announced its latest Instinct MI300 APU, a first of its kind in the data center world. Combining the CPU, GPU, and memory elements into a single package eliminates latency imposed by long travel distances of data from CPU to memory and from CPU to GPU throughout the PCIe connector. In addition to solving some latency issues, less power is needed to move the data and provide greater efficiency. The Instinct MI300 features 24 Zen4 cores with simultaneous multi-threading enabled, CDNA3 GPU IP, and 128 GB of HBM3 memory on a single package. The memory bus is 8192-bit wide, providing unified memory access for CPU and GPU cores. CLX 3.0 is also supported, making cache-coherent interconnecting a reality.

The Instinct MI300 APU package is an engineering marvel of its own, with advanced chiplet techniques used. AMD managed to do 3D stacking and has nine 5 nm logic chiplets that are 3D stacked on top of four 6 nm chiplets with HBM surrounding it. All of this makes the transistor count go up to 146 billion, representing the sheer complexity of a such design. For performance figures, AMD provided a comparison to Instinct MI250X GPU. In raw AI performance, the MI300 features an 8x improvement over MI250X, while the performance-per-watt is "reduced" to a 5x increase. While we do not know what benchmark applications were used, there is a probability that some standard benchmarks like MLPerf were used. For availability, AMD targets the end of 2023, when the "El Capitan" exascale supercomputer will arrive using these Instinct MI300 APU accelerators. Pricing is unknown and will be unveiled to enterprise customers first around launch.

AMD Instinct MI300 APU to Power El Capitan Exascale Supercomputer

The Exascale supercomputing race is now well underway, as the US-based Frontier supercomputer got delivered, and now we wait to see the remaining systems join the race. Today, during 79th HPC User Forum at Oak Ridge National Laboratory (ORNL), Terri Quinn at Lawrence Livermore National Laboratory (LLNL) delivered a few insights into what El Capitan exascale machine will look like. And it seems like the new powerhouse will be based on AMD's Instinct MI300 APU. LLNL targets peak performance of over two exaFLOPs and a sustained performance of more than one exaFLOP, under 40 megawatts of power. This should require a very dense and efficient computing solution, just like the MI300 APU is.

As a reminder, the AMD Instinct MI300 is an APU that combines Zen 4 x86-64 CPU cores, CDNA3 compute-oriented graphics, large cache structures, and HBM memory used as DRAM on a single package. This is achieved using a multi-chip module design with 2.5D and 3D chiplet integration using Infinity architecture. The system will essentially utilize thousands of these APUs to become one large Linux cluster. It is slated for installation in 2023, with an operating lifespan from 2024 to 2030.

Alleged AMD Instinct MI300 Exascale APU Features Zen4 CPU and CDNA3 GPU

Today we got information that AMD's upcoming Instinct MI300 will be allegedly available as an Accelerated Processing Unit (APU). AMD APUs are processors that combine CPU and GPU into a single package. AdoredTV managed to get ahold of a slide that indicates that AMD Instinct MI300 accelerator will also come as an APU option that combines Zen4 CPU cores and CDNA3 GPU accelerator in a single, large package. With technologies like 3D stacking, MCM design, and HBM memory, these Instinct APUs are positioned to be a high-density compute the product. At least six HBM dies are going to be placed in a package, with the APU itself being a socketed design.

The leaked slide from AdoredTV indicates that the first tapeout is complete by the end of the month (presumably this month), with the first silicon hitting AMD's labs in Q3 of 2022. If the silicon turns out functional, we could see these APUs available sometime in the first half of 2023. Below, you can see an illustration of the AMD Instinct MI300 GPU. The APU version will potentially be of the same size with Zen4 and CDNA3 cores spread around the package. As Instinct MI300 accelerator is supposed to use eight compute tiles, we could see different combinations of CPU/GPU tiles offered. As we await the launch of the next-generation accelerators, we are yet to see what SKUs AMD will bring.

AMD Introduces Instinct MI210 Data Center Accelerator for Exascale-class HPC and AI in a PCIe Form-Factor

AMD today announced a new addition to the Instinct MI200 family of accelerators. Officially titled Instinct MI210 accelerator, AMD tries to bring exascale-class technologies to mainstream HPC and AI customers with this model. Based on CDNA2 compute architecture built for heavy HPC and AI workloads, the card features 104 compute units (CUs), totaling 6656 Streaming Processors (SPs). With a peak engine clock of 1700 MHz, the card can output 181 TeraFLOPs of FP16 half-precision peak compute, 22.6 TeraFLOPs peak FP32 single-precision, and 22.6 TFLOPs peak FP62 double-precision compute. For single-precision matrix (FP32) compute, the card can deliver a peak of 45.3 TFLOPs. The INT4/INT8 precision settings provide 181 TOPs, while MI210 can compute the bfloat16 precision format with 181 TeraFLOPs at peak.

The card uses a 4096-bit memory interface connecting 64 GBs of HMB2e to the compute silicon. The total memory bandwidth is 1638.4 GB/s, while memory modules run at a 1.6 GHz frequency. It is important to note that the ECC is supported on the entire chip. AMD provides an Instinct MI210 accelerator as a PCIe solution, based on a PCIe 4.0 standard. The card is rated for a TDP of 300 Watts and is cooled passively. There are three infinity fabric links enabled, and the maximum bandwidth of the infinity fabric link is 100 GB/s. Pricing is unknown; however, availability is March 22nd, which is the immediate launch date.

AMD places this card directly aiming at NVIDIA A100 80 GB accelerator as far as the targeted segment, with emphasis on half-precision and INT4/INT8 heavy applications.

NVIDIA to Split Graphics and Compute Architecture Naming, "Blackwell" Architecture Spotted

The recent NVIDIA data-leak springs up information on various upcoming graphics parts. Besides "Ada Lovelace," "Hopper," we come across a new codename, "Blackwell." It turns out that NVIDIA is splitting the the graphics and compute architecture naming with the next generation, not unlike what AMD did, with its RDNA and CDNA series. The current "Ampere" architecture is being used both for compute and graphics, with the streaming multiprocessor for the two being slightly different—the compute "Ampere" has more FP64 and Tensor components, while the graphics "Ampere" does away with these in favor of RT cores and graphics-relevant components.

The graphics architecture to succeed GeForce "Ampere" will be GeForce "Ada Lovelace." GPUs in this series are identified in the leaked code as "AD102," "AD103," "AD104," "AD106," "AD107," and "AD10B," succeeding a similar numbering for parts with the "A" (GeForce Ampere) series. The compute architecture succeeding "Ampere" will be codenamed "Hopper." with parts in the series being codenamed "GH100" and "GH202." Another compute or datacenter architecture is "Blackwell," with parts being codenamed "GB100" and "GB102." From all accounts, NVIDIA is planning to launch the GeForce 40-series "Ada" graphics card lineup in the second half of 2022. The company is in need of a similar refresh for its compute product lineup, and could debut "Hopper" either toward the end of 2022 or next year. "Blackwell" could follow "Hopper."

AMD to Implement TSMC SoIC Tech With Upcoming HPC Chips

AMD will debut TSMC's ambitious System-on-Integrated-Chips (SoIC) technology with its upcoming HPC chips, according to a DigiTimes report. A step toward rivaling Intel's Foveros 3-D chip stacking technology, SoIC will enable AMD to stack logic, memory, and I/O as separate chips within a single package. The article references a next-generation "HPC" chip, although it didn't delve into what this could be. Logically, AMD would want to integrate its EPYC and MI accelerator lines into a single package that can be used in HPCs. Such a product would combine its Zen-series x86-64 serial processing, with CDNA-series scalar processing, expertise in memory, leveraging large on-die victim-caches, and high-bandwidth memory (HBM); along with next-gen I/O.

AMD Instinct MI200 to Launch This Year with MCM Design

AMD is slowly preparing the next-generation of its compute-oriented flagship graphics card design called Instinct MI200 GPU. It is the card of choice for the exascale Frontier supercomputer, which is expected to make a debut later this year at the Oak Ridge Leadership Computing Facility. With the supercomputer planned for the end of this year, AMD Instinct MI200 is also going to get launched eight a bit before or alongside it. The Frontier exascale supercomputer is supposed to bring together AMD's next-generation Trento EPYC CPUs with Instinct MI200 GPU compute accelerators. However, it seems like AMD will utilize some new technologies for the making of this supercomputer. While we do not know what Trento EPYC CPUs will look like, it seems like Instinct MI200 GPU is going to feature a multi-chip-module (MCM) design with the new CDNA 2 GPU architecture. With this being the only information about the GPU, we have to wait a bit to find out more details.
AMD CDNA Die

AMD Announces CDNA Architecture. Radeon MI100 is the World's Fastest HPC Accelerator

AMD today announced the new AMD Instinct MI100 accelerator - the world's fastest HPC GPU and the first x86 server GPU to surpass the 10 teraflops (FP64) performance barrier. Supported by new accelerated compute platforms from Dell, Gigabyte, HPE, and Supermicro, the MI100, combined with AMD EPYC CPUs and the ROCm 4.0 open software platform, is designed to propel new discoveries ahead of the exascale era.

Built on the new AMD CDNA architecture, the AMD Instinct MI100 GPU enables a new class of accelerated systems for HPC and AI when paired with 2nd Gen AMD EPYC processors. The MI100 offers up to 11.5 TFLOPS of peak FP64 performance for HPC and up to 46.1 TFLOPS peak FP32 Matrix performance for AI and machine learning workloads. With new AMD Matrix Core technology, the MI100 also delivers a nearly 7x boost in FP16 theoretical peak floating point performance for AI training workloads compared to AMD's prior generation accelerators.

AMD Eyes Mid-November CDNA Debut with Instinct MI100, "World's Fastest FP64 Accelerator"

AMD is eyeing a mid-November debut for its CDNA compute architecture with the Instinct MI100 compute accelerator card. CDNA is a fork of RDNA for headless GPU compute accelerators with large SIMD resources. An Aroged report pins the launch of the MI100 at November 16, 2020, according to leaked AMD documents it dug up. The Instinct MI100 will eye a slice of the same machine intelligence pie NVIDIA is seeking to dominate with its A100 Tensor Core compute accelerator.

It appears like the first MI100 cards will be built in the add-in-board form-factor with PCI-Express 4.0 x16 interfaces, although older reports do predict AMD creating a socketed variant of its Infinity Fabric interconnect for machines with larger numbers of these compute processors. In the leaked document, AMD claims that the Instinct MI100 is the "world's highest double-precision accelerator for machine learning, HPC, cloud compute, and rendering systems." This is an especially big claim given that the A100 Tensor Core features FP64 CUDA cores based on the "Ampere" architecture. Then again, given that AMD claims that the RDNA2 graphics architecture is clawing back at NVIDIA with performance at the high-end, the competitiveness of the Instinct MI100 against the A100 Tensor Core cannot be discounted.
Return to Keyword Browsing
Nov 21st, 2024 04:42 EST change timezone

New Forum Posts

Popular Reviews

Controversial News Posts