News Posts matching #Instinct


Next‑Gen HBM4 to HBM8: Toward Multi‑Terabyte Memory on 15,000 W Accelerators

In a joint briefing this week, KAIST's Memory Systems Laboratory and TERA's Interconnection and Packaging group presented a forward-looking roadmap for High Bandwidth Memory (HBM) standards and the accelerator platforms that will employ them. Shared via Wccftech and VideoCardz, the outline covers five successive generations, from HBM4 to HBM8, each promising substantial gains in capacity, bandwidth, and packaging sophistication. First up is HBM4, targeted for a 2026 rollout in AI GPUs and data center accelerators. It will deliver approximately 2 TB/s per stack at an 8 Gbps pin rate over a 2,048-bit interface. Die stacks will reach 12 to 16 layers, yielding 36-48 GB per package with a 75 W power envelope. NVIDIA's upcoming Rubin series and AMD's Instinct MI500 cards are slated to employ HBM4, with Rubin Ultra doubling the number of memory stacks from eight to sixteen and AMD targeting up to 432 GB per device.

Looking to 2029, HBM5 maintains an 8 Gbps speed but doubles the I/O lanes to 4,096 bits, boosting throughput to 4 TB/s per stack. Power rises to 100 W and capacity scales to 80 GB using 16‑high stacks of 40 Gb dies. NVIDIA's tentative Feynman accelerator is expected to be the first HBM5 adopter, packing 400-500 GB of memory into a multi-die package and drawing more than 4,400 W of total power. By 2032, HBM6 will double pin speeds to 16 Gbps and increase bandwidth to 8 TB/s over 4,096 lanes. Stack heights can grow to 20 layers, supporting up to 120 GB per stack at 120 W. Immersion cooling and bumpless copper-copper bonding will become the norm. The roadmap then predicts HBM7 in 2035, which includes 24 Gbps speeds, 8,192-bit interfaces, 24 TB/s throughput, and up to 192 GB per stack at 160 W. NVIDIA is preparing a 15,360 W accelerator to accommodate this monstrous memory.
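The per-stack bandwidth figures in this roadmap follow directly from pin rate and interface width (bandwidth = pin rate × bus width ÷ 8 bits per byte). As a quick sanity check of the quoted numbers, here is a short sketch (our own arithmetic, not part of the KAIST/TERA material):

```python
# Per-stack bandwidth: pin rate (Gbps) x interface width (bits) / 8 / 1000 = TB/s
roadmap = {
    # generation: (pin rate in Gbps, interface width in bits)
    "HBM4": (8, 2048),
    "HBM5": (8, 4096),
    "HBM6": (16, 4096),
    "HBM7": (24, 8192),
}

for gen, (gbps, width) in roadmap.items():
    tb_per_s = gbps * width / 8 / 1000
    print(f"{gen}: {tb_per_s:g} TB/s per stack")
```

The results (2.048, 4.096, 8.192, and 24.576 TB/s) line up with the approximate 2, 4, 8, and 24 TB/s figures quoted above.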

AMD's Answer to AI Advancement: ROCm 7.0 Is Here

In August, AMD will release ROCm 7, its open computing platform for high-performance computing, machine learning, and scientific applications. This version will support a range of hardware, from Ryzen AI-equipped laptops to Radeon AI Pro desktop cards and server-grade Instinct GPUs, which have just received an update. Before the end of 2025, ROCm 7 will be integrated directly into Linux and Windows, allowing for a seamless installation process with just a few clicks. AMD also no longer plans to update ROCm only once every few months; instead, developers will receive day-zero fixes and a major update every two weeks, complete with performance enhancements and new features. Additionally, a dedicated Dev Cloud will provide everyone with instant access to the latest AMD hardware for testing and experimentation.

Early benchmarks are encouraging. In one test, an Instinct MI300X running ROCm 7 reached roughly three times the speed recorded with the original ROCm 6 release, though results will vary with model choice, quantization, and other factors. The shift follows comments from AMD's Senior Vice President and Chief Software Officer, Andrej Zdravkovic, whom we interviewed last September. He emphasized ROCm's open-source design and the utility of HIPIFY, a tool that converts CUDA code to run on ROCm. Such tooling lowers the barrier to a full-scale ROCm transition, now made more attractive by a roughly 3x performance uplift from a software update alone. If ROCm 7 lives up to its promise, AMD could finally unlock the potential of its hardware across devices big and small, and give NVIDIA real competition in the coming years.
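HIPIFY operates at the source level, rewriting CUDA API names into their HIP equivalents (cudaMalloc becomes hipMalloc, and so on), after which the code compiles for AMD GPUs. The toy Python sketch below illustrates only the renaming idea; the real hipify tools also handle headers, kernel-launch syntax, types, and a far larger API surface:

```python
import re

# A tiny subset of the CUDA -> HIP renames hipify performs.
# (Illustrative only; the real tool covers hundreds of APIs.)
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify(source: str) -> str:
    """Replace known CUDA API names with their HIP counterparts."""
    pattern = re.compile("|".join(re.escape(k) for k in CUDA_TO_HIP))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)

cuda_snippet = "cudaMalloc(&d_buf, n); cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);"
# Prefix replacement also turns cudaMemcpyHostToDevice into hipMemcpyHostToDevice,
# which happens to be the correct HIP enum name.
print(hipify(cuda_snippet))
```

In practice, developers run `hipify-perl` or `hipify-clang` over an existing CUDA codebase and then build the result against ROCm.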

Micron HBM Designed into Leading AMD AI Platform

Micron Technology, Inc. today announced the integration of its HBM3E 36 GB 12-high offering into the upcoming AMD Instinct MI350 Series solutions. This collaboration highlights the critical role of power efficiency and performance in training large AI models, delivering high-throughput inference and handling complex HPC workloads such as data processing and computational modeling. Furthermore, it represents another significant milestone in HBM industry leadership for Micron, showcasing its robust execution and the value of its strong customer relationships.

Micron's HBM3E 36 GB 12-high solution brings industry-leading memory technology to AMD Instinct MI350 Series GPU platforms, providing outstanding bandwidth and lower power consumption. The AMD Instinct MI350 Series GPU platforms, built on AMD's advanced CDNA 4 architecture, integrate 288 GB of high-bandwidth HBM3E memory capacity, delivering up to 8 TB/s bandwidth for exceptional throughput. This immense memory capacity allows Instinct MI350 Series GPUs to efficiently support AI models with up to 520 billion parameters on a single GPU. In a full platform configuration, Instinct MI350 Series GPUs offer up to 2.3 TB of HBM3E memory and achieve peak theoretical performance of up to 161 PFLOPS at FP4 precision, with leadership energy efficiency and scalability for high-density AI workloads. This tightly integrated architecture, combined with Micron's power-efficient HBM3E, enables exceptional throughput for large language model training, inference, and scientific simulation tasks, empowering data centers to scale seamlessly while maximizing compute performance per watt. This joint effort between Micron and AMD has enabled faster time to market for AI solutions.

Giga Computing Joins AMD Advancing AI 2025 to Share Advanced Cooling AI Solutions for AMD Instinct MI355X and MI350X GPUs

Giga Computing, a subsidiary of GIGABYTE and an industry leader in generative AI servers and advanced cooling technologies, today announced its participation at AMD Advancing AI 2025 to join conversations with AI thought leaders and to share powerful GIGABYTE servers for AI innovations. This one-day event will be highlighted by the keynote from AMD's Dr. Lisa Su; afterward, attendees will join customer breakout sessions, workshops, and more, including discussions with the Giga Computing team.

At AMD Advancing AI Day, GIGABYTE servers demonstrate powerful solutions for AMD Instinct MI350X and MI355X GPUs. The new server platforms are highly efficient and compute dense, and the GIGABYTE G4L3 series exemplifies this with its support for direct liquid cooling (DLC) technology for the MI355X GPU. In traditional data centers without liquid cooling infrastructure, the GIGABYTE G893 Series provides a reliable air-cooled platform for the MI350X GPU. Together, these platforms showcase GIGABYTE's readiness to meet diverse deployment needs—whether maximizing performance with liquid cooling or ensuring broad compatibility in traditional air-cooled environments. With support for the latest AMD Instinct GPUs, GIGABYTE is driving the next wave of AI innovation.

AMD Previews 432 GB HBM4 Instinct MI400 GPUs and Helios Rack‑Scale AI Solution

At its "Advancing AI 2025" event, AMD rolled out its new Instinct MI350 lineup on the CDNA 4 architecture and teased the upcoming UDNA-based AI accelerator. True to its roughly one‑year refresh rhythm, the company confirmed that the Instinct MI400 series will land in early 2026, showcasing a huge leap in memory, interconnect bandwidth, and raw compute power. Each MI400 card features twelve HBM4 stacks, providing a whopping 432 GB of on-package memory and pushing nearly 19.6 TB/s of memory bandwidth. Those early HBM4 modules deliver approximately 1.6 TB/s each, just shy of the 2 TB/s mark. On the compute front, AMD pegs the MI400 at 20 PetaFLOPS of FP8 throughput and 40 PetaFLOPS of FP4, doubling the sparse-matrix performance of today's MI355X cards. But the real game‑changer is how AMD is scaling those GPUs. Until now, you could connect up to eight cards via Infinity Fabric, and anything beyond that had to go over Ethernet.

The MI400's upgraded fabric link now offers 300 GB/s, nearly twice the speed of the MI350 series, allowing you to build full-rack clusters without relying on slower networks. That upgrade paves the way for "Helios," AMD's fully integrated AI rack solution. It combines upcoming EPYC "Venice" CPUs with MI400 GPUs and trim-to-fit networking gear, offering a turnkey setup for data center operators. AMD didn't shy away from comparisons, either. A Helios rack with 72 MI400 cards delivers approximately 3.1 ExaFLOPS of tensor performance and 31 TB of HBM4 memory. NVIDIA's Vera Rubin system, slated to feature 72 GPUs with 288 GB of memory each, is expected to achieve around 3.6 ExaFLOPS, though AMD's platform surpasses it in both memory bandwidth and capacity. And if that's not enough, whispers of a beefed-up MI450X IF128 system are already swirling. Due in late 2026, it would directly link 128 GPUs with Infinity Fabric at 1.8 TB/s bidirectional per device, unlocking truly massive rack-scale AI clusters.
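The quoted rack-level figures are internally consistent with the per-card specs. A quick back-of-envelope check (our arithmetic, not AMD's):

```python
# Helios rack math from the per-card MI400 figures quoted above.
gpus_per_rack = 72
hbm_per_gpu_gb = 432        # 12 HBM4 stacks x 36 GB
bw_per_gpu_tbs = 19.6       # aggregate memory bandwidth per card

rack_memory_tb = gpus_per_rack * hbm_per_gpu_gb / 1000
per_stack_tbs = bw_per_gpu_tbs / 12

print(f"Rack HBM4 capacity: {rack_memory_tb:.1f} TB")    # ~31 TB, matching AMD's claim
print(f"Per-stack bandwidth: {per_stack_tbs:.2f} TB/s")  # ~1.63 TB/s, 'just shy of 2 TB/s'
```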

Compal Optimizes AI Workloads with AMD Instinct MI355X at AMD Advancing AI 2025 and International Supercomputing Conference 2025

As AI computing accelerates toward higher density and greater energy efficiency, Compal Electronics (Compal; Stock Ticker: 2324.TW), a global leader in IT and computing solutions, unveiled its latest high-performance server platform, the SG720-2A/OG720-2A, at both AMD Advancing AI 2025 in the U.S. and the International Supercomputing Conference (ISC) 2025 in Europe. It features the AMD Instinct MI355X GPU architecture and offers both single-phase and two-phase liquid cooling configurations, showcasing Compal's leadership in thermal innovation and system integration. Tailored for next-generation generative AI and large language model (LLM) training, the SG720-2A/OG720-2A delivers exceptional flexibility and scalability for modern data center operations, drawing significant attention across the industry.

With generative AI and LLMs driving increasingly intensive compute demands, enterprises are placing greater emphasis on infrastructure that offers both performance and adaptability. The SG720-2A/OG720-2A emerges as a robust solution, combining high-density GPU integration and flexible liquid cooling options, positioning itself as an ideal platform for next-generation AI training and inference workloads.

Supermicro Delivers Liquid-Cooled and Air-Cooled AI Solutions with AMD Instinct MI350 Series GPUs and Platforms

Supermicro, Inc., a Total IT Solution Provider for AI, Cloud, Storage, and 5G/Edge, is announcing that both liquid-cooled and air-cooled GPU solutions will be available with the new AMD Instinct MI350 series GPUs, optimized for unparalleled performance, maximum scalability, and efficiency. The Supermicro H14 generation of GPU-optimized solutions, featuring dual AMD EPYC 9005 CPUs along with the AMD Instinct MI350 series GPUs, is designed for organizations seeking maximum performance at scale while reducing the total cost of ownership for their AI-driven data centers.

"Supermicro continues to lead the industry with the most experience in delivering high-performance systems designed for AI and HPC applications," said Charles Liang, president and CEO of Supermicro. "Our Data Center Building Block Solutions enable us to quickly deploy end-to-end data center solutions to market, bringing the latest technologies for the most demanding applications. The addition of the new AMD Instinct MI350 series GPUs to our GPU server lineup strengthens and expands our industry-leading AI solutions and gives customers greater choice and better performance as they design and build the next generation of data centers."

AMD Instinct MI355X Draws up to 1,400 Watts in OAM Form Factor

Tomorrow evening, AMD will host its "Advancing AI" livestream to introduce the Instinct MI350 series, a new line of GPU accelerators designed for large-scale AI training and inference. First shown in prototype form at ISC 2025 in Hamburg just a day ago, each MI350 card features 288 GB of HBM3E memory, delivering up to 8 TB/s of sustained bandwidth. Customers can choose between the single-card MI350X and the higher-clocked MI355X or opt for a full eight-GPU platform that aggregates over 2.3 TB of memory. Both chips are built on the CDNA 4 architecture, which now supports four precision formats: FP16, FP8, FP6, and FP4. The addition of FP6 and FP4 is designed to boost throughput in modern AI workloads, where tomorrow's models, with tens of trillions of parameters, can lean on these lower-precision formats.

In half-precision tests, the MI350X achieves 4.6 PetaFLOPS on its own and 36.8 PetaFLOPS in an eight-GPU platform configuration, while the MI355X surpasses those numbers, reaching 5.03 PetaFLOPS and just over 40 PetaFLOPS. AMD is also aiming to improve energy efficiency by a factor of thirty compared with its previous generation. The MI350X card runs within a 1,000 Watt power envelope and relies on air cooling, whereas the MI355X steps up to 1,400 Watts and is intended for direct-liquid-cooling setups. That 400 Watt increase puts it on par with NVIDIA's upcoming GB300 "Grace Blackwell Ultra" superchip, also a 1,400 W design. With memory capacity, raw compute, and power efficiency all pushed to new heights, the question remains whether real-world benchmarks will match these ambitious specifications. For now, AMD lacks only platform scaling beyond eight GPUs, which the Instinct MI400 series will address.
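The platform-level numbers are simply eight cards' worth of the single-GPU figures, as a short check confirms (our arithmetic, based on the specs quoted above):

```python
# Eight-GPU platform totals from the single-card MI350-series figures.
cards = 8
mi350x_fp16_pf = 4.6
mi355x_fp16_pf = 5.03
hbm3e_per_card_gb = 288

print(f"MI350X platform: {mi350x_fp16_pf * cards:.1f} PFLOPS FP16")   # 36.8
print(f"MI355X platform: {mi355x_fp16_pf * cards:.2f} PFLOPS FP16")   # 40.24, 'just over 40'
print(f"Platform memory: {hbm3e_per_card_gb * cards / 1000:.1f} TB")  # ~2.3 TB
```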

AMD's Open AI Software Ecosystem Strengthened Again, Following Acquisition of Brium

At AMD, we're committed to building a high-performance, open AI software ecosystem that empowers developers and drives innovation. Today, we're excited to take another step forward with the acquisition of Brium, a team of world-class compiler and AI software experts with deep expertise in machine learning, AI inference, and performance optimization. Brium brings advanced software capabilities that strengthen our ability to deliver highly optimized AI solutions across the entire stack. Their work in compiler technology, model execution frameworks, and end-to-end AI inference optimization will play a key role in enhancing the efficiency and flexibility of our AI platform.

This acquisition strengthens our foundation for long-term innovation. It reflects our strategic commitment to AI, particularly to the developers who are building the future of intelligent applications. It is also the latest in a series of targeted investments, following the acquisitions of Silo AI, Nod.ai, and Mipsology, that together advance our ability to support the open-source software ecosystem and deliver optimized performance on AMD hardware.

AMD & HUMAIN Reveal Formation of $10 Billion Strategic Collab, Aimed at Advancing Global AI

AMD and HUMAIN, Saudi Arabia's new AI enterprise, today announced a landmark agreement to build the world's most open, scalable, resilient, and cost-efficient AI infrastructure that will power the future of global intelligence through a network of AMD-based AI computing centers stretching from the Kingdom of Saudi Arabia to the United States. As part of the agreement, the parties will invest up to $10B to deploy 500 megawatts of AI compute capacity over the next five years. The AI superstructure built by AMD and HUMAIN will be open by design, accessible at scale, and optimized to power AI workloads across enterprise, start-up, and sovereign markets. HUMAIN will oversee end-to-end delivery, including hyperscale data centers, sustainable power systems, and global fiber interconnects, while AMD will provide the full spectrum of the AMD AI compute portfolio and the AMD ROCm open software ecosystem.

"At AMD, we have a bold vision to enable the future of AI everywhere—bringing open, high-performance computing to every developer, AI start-up and enterprise around the world," said Dr. Lisa Su, Chair and CEO, AMD. "Our investment with HUMAIN is a significant milestone in advancing global AI infrastructure. Together, we are building a globally significant AI platform that delivers performance, openness and reach at unprecedented levels." With initial deployments already underway across key global regions, the collaboration is on track to activate multi-exaflop capacity by early 2026, supported by next-gen AI silicon, modular data center zones, and a developer-enablement focused software platform stack built around open standards and interoperability.

Vultr Cloud Platform Broadened with AMD EPYC 4005 Series Processors

Vultr, the world's largest privately-held cloud infrastructure company, today announced that it is one of the first cloud providers to offer the new AMD EPYC 4005 Series processors. The AMD EPYC 4005 Series processors will be available on the Vultr platform, enabling enterprise-class features and leading performance for businesses and hosted IT service providers. The AMD EPYC 4005 Series processors extend the broad AMD EPYC processor family, powering a new line of cost-effective systems designed for growing businesses and hosted IT services providers that demand performance, advanced technologies, energy efficiency, and affordability. Servers featuring the high-performance AMD EPYC 4005 Series CPUs with streamlined memory and I/O feature sets are designed to deliver compelling system price-to-performance metrics on key customer workloads. Meanwhile, the combination of up to 16 SMT-capable cores and DDR5 memory in the AMD EPYC 4005 Series processors enables smooth execution of business-critical workloads, while maintaining the thermal and power efficiency characteristics crucial for affordable compute environments.

"Vultr is committed to delivering the most advanced cloud infrastructure with unrivaled price-to-performance," said J.J. Kardwell, CEO of Vultr. "The AMD EPYC 4005 Series provides straightforward deployment, scalability, high clock speed, energy efficiency, and best-in-class performance. Whether you are a business striving to scale reliably or a developer crafting the next groundbreaking innovation, these solutions are designed to deliver exceptional value and meet demanding requirements now and in the future." Vultr's launch of systems featuring the AMD EPYC 4245P and AMD EPYC 4345P processors will expand the company's robust line of Bare Metal solutions. Vultr will also feature the AMD EPYC 4345P as part of its High Frequency Compute (HFC) offerings for organizations requiring the highest clock speeds and access to locally-attached NVMe storage.

Report: Customers Show Little Interest in AMD Instinct MI325X Accelerators

AMD's Instinct MI325X accelerator has struggled to gain traction with large customers, according to extensive data from SemiAnalysis. Launched in Q2 2025, the MI325X arrived roughly nine months after NVIDIA's H200 and concurrently with NVIDIA's "Blackwell" mass-production roll-out. That timing proved unfavorable, as many buyers opted instead for Blackwell's superior cost-per-performance ratio. Early interest from Microsoft in 2024 failed to translate into repeat orders: after the initial test purchases, Microsoft did not place any further commitments. In response, AMD reduced its margin expectations in an effort to attract other major clients. Oracle and a handful of additional hyperscalers have since expressed renewed interest, but these purchases remain modest compared with NVIDIA's volume.

A fundamental limitation of the MI325X is its eight-GPU scale-up capacity. By contrast, NVIDIA's rack-scale GB200 NVL72 supports up to 72 GPUs in a single cluster. For large-scale AI inference and frontier-level reasoning workloads, that difference is decisive. AMD positioned the MI325X against NVIDIA's air-cooled HGX B200 NVL8 and HGX B300 NVL16 modules. Even in that non-rack-scale segment, NVIDIA maintains an advantage in both raw performance and total-cost-of-ownership efficiency. Nonetheless, there remains potential for the MI325X in smaller-scale deployments that do not require extensive GPU clusters. Eight-GPU configurations should suffice for smaller-model inference, where memory bandwidth and capacity are the primary needs. With AMD continuing to improve its software ecosystem and maintaining competitive pricing, AI labs developing mid-sized models may still find the MI325X appealing.

GIGABYTE to Present End-to-End AI Portfolio at COMPUTEX 2025

GIGABYTE Technology, a global leader in computing innovation, will return to COMPUTEX 2025 from May 20 to 23 under the theme "Omnipresence of Computing: AI Forward." The company will demonstrate how its complete spectrum of solutions spanning the AI lifecycle, from data center training to edge deployment and end-user applications, reshapes infrastructure to meet next-generation AI demands.

As generative AI continues to evolve, so do the demands for handling massive token volumes, real-time data streaming, and high-throughput compute environments. GIGABYTE's end-to-end portfolio, ranging from rack-scale infrastructure to servers, cooling systems, embedded platforms, and personal computing, forms the foundation to accelerate AI breakthroughs across industries.

AMD Discusses "World Changing" LUMI Supercomputer - Powered by EPYC CPUs & Instinct GPUs

If you're a fan of science fiction movies, you've probably seen the story where countries come together to avert or overcome a crisis. These films usually begin with some unexpected dangerous event—maybe an alien invasion, a pandemic or rogue robots. Earth's smartest scientists and engineers work non-stop to discover a solution. Governments pool their resources and, in the end—usually at the very last possible second—humanity triumphs. This might seem like a Hollywood fantasy, but believe it or not, this movie plot is playing out in real life right now. No, we aren't facing an alien invasion or fighting off AI overlords, but the earth does face some pretty serious crises. And nations of the world are working together to develop technology to help address those problems.

For example, the LUMI supercomputer, located in Kajaani, Finland, receives a portion of its funding from the European High-Performance Computing Joint Undertaking (EuroHPC JU), an effort that pools EU resources to create and provide exascale computing platforms. Additional funding comes from LUMI consortium countries, which include Finland, Belgium, the Czech Republic, Denmark, Estonia, Iceland, the Netherlands, Norway, Poland, Sweden, and Switzerland. According to the Top500 list published in November 2024, LUMI is the 8th-fastest supercomputer in the world and the fastest in Europe. The final configuration of the LUMI supercomputer can sustain 380 petaflops of performance, roughly the equivalent of 1.5 million high-end laptops. It's based on the HPE Cray EX platform with AMD EPYC CPUs and AMD Instinct MI250X GPUs. According to the Green500 list, LUMI is also the world's 25th most energy-efficient supercomputer. It runs on 100% hydropower, and the waste heat from the facility is recaptured to heat about 100 homes in Kajaani.

AMD Announces Press Conference & Livestream at Computex 2025

AMD today announced that it will be hosting a press conference during Computex 2025. The in-person and livestreamed press conference will take place on Wednesday, May 21, 2025, at 11 a.m. (UTC+8) at the Grand Hyatt Taipei. The event will showcase the advancements AMD has driven with AI in gaming, PCs, and professional workloads.

AMD senior vice president and general manager of the Computing and Graphics Group Jack Huynh, along with industry partners, will discuss how AMD is expanding its leadership across gaming, workstations, and AI PCs, and highlight the breadth of the company's high-performance computing and AI product portfolio. The livestream will start at 8 p.m. PT/11 p.m. ET on Tuesday, May 20 on AMD.com, with replay available after the conclusion of the livestream event.

AMD Announces Advancing AI 2025

Today, AMD (NASDAQ: AMD) announced "Advancing AI 2025," an in-person and livestreamed event on June 12, 2025. The industry event will showcase the company's bold vision for AI, announce the next generation of AMD Instinct GPUs, AMD ROCm open software ecosystem progress, and reveal details on AI solutions for hyperscalers, enterprises, developers, startups and more. AMD executives and AI ecosystem partners, customers and developers will join Chair and CEO Dr. Lisa Su to discuss how AMD products and software are re-shaping the AI and high-performance computing landscape. The live stream will start at 9:30 a.m. PT on Thursday, June 12.

MangoBoost Achieves Record-Breaking MLPerf Inference v5.0 Results with AMD Instinct MI300X

MangoBoost, a provider of cutting-edge system solutions designed to maximize AI data center efficiency, has set a new industry benchmark with its latest MLPerf Inference v5.0 submission. The company's Mango LLMBoost AI Enterprise MLOps software has demonstrated unparalleled performance on AMD Instinct MI300X GPUs, delivering the highest-ever recorded results for Llama2-70B in the offline inference category. This milestone marks the first-ever multi-node MLPerf inference result on AMD Instinct MI300X GPUs. By harnessing the power of 32 MI300X GPUs across four server nodes, Mango LLMBoost has surpassed all previous MLPerf inference results, including those from competitors using NVIDIA H100 GPUs.

Unmatched Performance and Cost Efficiency
MangoBoost's MLPerf submission demonstrates a 24% performance advantage over the best-published MLPerf result from Juniper Networks utilizing 32 NVIDIA H100 GPUs. Mango LLMBoost achieved 103,182 tokens per second (TPS) in the offline scenario and 93,039 TPS in the server scenario on AMD MI300X GPUs, outperforming the previous best result of 82,749 TPS on NVIDIA H100 GPUs. In addition to superior performance, Mango LLMBoost + MI300X offers significant cost advantages. With AMD MI300X GPUs priced between $15,000 and $17,000—compared to the $32,000-$40,000 cost of NVIDIA H100 GPUs (source: Tom's Hardware—H100 vs. MI300X Pricing)—Mango LLMBoost delivers up to 62% cost savings while maintaining industry-leading inference throughput.
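The percentages in this announcement can be reproduced from the raw figures it cites (a quick check of our own, using the quoted throughput numbers and street-price ranges):

```python
# Reproduce the headline percentages from the figures quoted above.
mi300x_offline_tps = 103_182   # Mango LLMBoost on 32x MI300X, offline scenario
h100_offline_tps = 82_749      # best prior MLPerf result on 32x H100

speedup = mi300x_offline_tps / h100_offline_tps - 1
print(f"Throughput advantage: {speedup:.0%}")   # ~25% by this math; reported as 24%

# Cost comparison using the best case for the claim: cheapest MI300X vs. priciest H100.
mi300x_price, h100_price = 15_000, 40_000
savings = 1 - mi300x_price / h100_price
print(f"Up to {savings:.0%} lower GPU cost")    # 62.5%, reported as 'up to 62%'
```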

MLCommons Releases New MLPerf Inference v5.0 Benchmark Results

Today, MLCommons announced new results for its industry-standard MLPerf Inference v5.0 benchmark suite, which delivers machine learning (ML) system performance benchmarking in an architecture-neutral, representative, and reproducible manner. The results highlight that the AI community is focusing much of its attention and efforts on generative AI scenarios, and that the combination of recent hardware and software advances optimized for generative AI have led to dramatic performance improvements over the past year.

The MLPerf Inference benchmark suite, which encompasses both datacenter and edge systems, is designed to measure how quickly systems can run AI and ML models across a variety of workloads. The open-source and peer-reviewed benchmark suite creates a level playing field for competition that drives innovation, performance, and energy efficiency for the entire industry. It also provides critical technical information for customers who are procuring and tuning AI systems. This round of MLPerf Inference results also includes tests for four new benchmarks: Llama 3.1 405B, Llama 2 70B Interactive for low-latency applications, RGAT, and Automotive PointPainting for 3D object detection.

Advantech Launches Next-Gen Edge AI Solutions Powered by the AMD Compute Portfolio

Advantech, a global leader in intelligent IoT systems and embedded platforms, is excited to introduce its latest AIR series Edge AI systems, powered by the comprehensive AMD compute portfolio. These next-generation solutions leverage AMD Ryzen and EPYC processors alongside Instinct MI210 accelerators and Radeon PRO GPUs, delivering exceptional AI computing performance for demanding edge applications.

"Advantech and AMD continue to strengthen our collaboration in the Edge AI era, integrating advanced CPU platforms with high-performance AI accelerators and GPU solutions," said Aaron Su, Vice President of Advantech Embedded IoT Group. "This joint effort enables cutting-edge computing power to meet the demands of the rapidly evolving embedded AI applications.

Oracle Plans to Use 30,000 AMD Instinct MI355X GPUs for AI Cloud

AMD's Instinct MI355X accelerators for AI workloads are gaining traction, and Oracle just became one of the bigger customers. According to Oracle's latest financial results, the company noted that it had acquired 30,000 AMD Instinct MI355X accelerators. "In Q3, we signed a multi-billion-dollar contract with AMD to build a cluster of 30,000 of their latest MI355X GPUs," noted Larry Ellison, adding that "And all four of the leading cloud security companies, CrowdStrike, Cybereason, Newfold Digital and Palo Alto, they all decided to move to the Oracle Cloud. But perhaps most importantly, Oracle has developed a new product called the AI data platform that enables our huge install base of database customers to use the latest AI models from OpenAI, xAI and Meta to analyze all of the data they have stored in their millions of existing Oracle databases. By using Oracle version 23 AI's vector capabilities, customers can automatically put all of their existing data into the vector format that is understood by AI models. This allows those AI models to learn, understand and analyze every aspect of your company or government agency, instantly unlocking the value in your data while keeping your data private and secure."

AMD's Instinct MI355X accelerator introduces the CDNA 4 architecture on TSMC's N3 process node with a focus on AI workload acceleration. The chiplet-based GPU delivers 2.3 petaflops of FP16 compute and 4.6 petaflops of FP8 compute, marking a 77% performance increase over the MI300X series. The MI355X's key advancement comes through support for reduced-precision FP4 and FP6 numerical formats, enabling up to 9.2 petaflops of FP4 compute. Memory specifications include 288 GB of HBM3E across eight stacks, providing 8 TB/s of total bandwidth. Production timelines place the MI355X's market entry in the second half of 2025, continuing AMD's annual cadence for data center GPU launches. In the meantime, Oracle will likely prepare data center space for these GPUs and power them on as soon as AMD ships the accelerators.
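The throughput figures follow the usual pattern for low-precision math units: each halving of the data format roughly doubles peak throughput. A one-liner check against the quoted numbers (our arithmetic, not an AMD spec sheet):

```python
# Each halving of precision roughly doubles peak throughput on the MI355X.
fp16_pflops = 2.3
formats = {"FP16": fp16_pflops, "FP8": fp16_pflops * 2, "FP4": fp16_pflops * 4}
for fmt, pf in formats.items():
    print(f"{fmt}: {pf:.1f} PFLOPS")   # 2.3 / 4.6 / 9.2, matching the figures above
```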

AMD Instinct MI400 to Include new Dedicated Multimedia IO Die

AMD's upcoming Instinct MI400 accelerator series, scheduled for a 2026 introduction, is set to incorporate a new Multimedia IO Die (MID) architecture alongside significant compute density improvements. According to recent patches discovered in AMD-GFX mailing lists, the accelerator will feature a dual Active Interposer Die (AID) design, with each AID housing four Accelerated Compute Dies (XCDs), doubling the XCD count per AID compared to the current MI300 series. The dedicated Multimedia IO Die is a new element in AMD's accelerator design philosophy. Documentation reveals support for up to two MIDs, each likely paired to an AID, suggesting a more specialized approach to multimedia processing and interface management.

Specifications from the Register Remapping Table (RRMT) implementation indicate sophisticated die-to-die communication pathways, with support for local and remote transactions across XCDs, AIDs, and the new MIDs. The system enables granular control over eight potential XCD configurations (XCD0 through XCD7), suggesting that AMD can scale compute up and down across SKUs. While AMD has yet to release detailed specifications for the MI400 series, separating multimedia functions into dedicated dies could optimize performance and power efficiency. As the 2026 launch window approaches, AMD will spend the remaining time refining the software stack and ROCm support for its next-generation accelerator based on the UDNA architecture. Since designing an accelerator is a years-long effort from the physical-implementation standpoint, we expect the Instinct MI400 design to be finalized by now; all that remains is silicon bring-up, software optimization, and mass production, likely at TSMC's facilities.

AMD Believes EPYC CPUs & Instinct GPUs Will Accelerate AI Advancements

If you're looking for innovative use of AI technology, look to the cloud. Gartner reports that "73% of respondents to the 2024 Gartner CIO and Tech Executive Survey have increased funding for AI." And IDC says that AI "will have a cumulative global economic impact of $19.9 trillion through 2030." But end users aren't running most of those AI workloads on their own hardware. Instead, they are largely relying on cloud service providers and large technology companies to provide the infrastructure for their AI efforts. This approach makes sense since most organizations are already heavily reliant on the cloud. According to O'Reilly, more than 90% of companies are using public cloud services. And they aren't moving just a few workloads to the cloud. That same report shows a 175% growth in cloud-native interest, indicating that companies are committing heavily to the cloud.

As a result of this demand for infrastructure to power AI initiatives, cloud service providers are finding it necessary to rapidly scale up their data centers. IDC predicts that "the surging demand for AI workloads will lead to a significant increase in datacenter capacity, energy consumption, and carbon emissions, with AI datacenter capacity projected to have a compound annual growth rate (CAGR) of 40.5% through 2027." While this surge creates massive opportunities for service providers, it also introduces some challenges. Providing the computing power necessary to support AI initiatives at scale, reliably and cost-effectively, is difficult. Many providers have found that deploying AMD EPYC CPUs and Instinct GPUs can help them overcome those challenges. Here's a quick look at three service providers that are using AMD chips to accelerate AI advancements.

AMD Releases ROCm 6.3 with SGLang, Fortran Compiler, Multi-Node FFT, Vision Libraries, and More

AMD has released ROCm 6.3, which introduces several new features and optimizations, including SGLang integration for accelerated AI inferencing, a re-engineered FlashAttention-2 for optimized AI training and inference, multi-node Fast Fourier Transform (FFT) support, a new Fortran compiler, and enhanced computer vision libraries such as rocDecode, rocJPEG, and rocAL.

According to AMD, SGLang, a runtime now supported by ROCm 6.3, is purpose-built for optimizing inference on models like LLMs and VLMs on AMD Instinct GPUs, promising up to 6x higher throughput and much easier usage thanks to Python integration and pre-configured ROCm Docker containers. ROCm 6.3 also brings further transformer optimizations with FlashAttention-2, which should deliver significant forward- and backward-pass improvements over FlashAttention-1; a new AMD Fortran compiler with direct GPU offloading, backward compatibility, and integration with HIP kernels and ROCm libraries; multi-node FFT support in rocFFT, which simplifies scaling FFT workloads across nodes; and enhanced computer vision libraries (rocDecode, rocJPEG, and rocAL) for AV1 codec support, GPU-accelerated JPEG decoding, and better audio augmentation.
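For context on what FlashAttention-2 optimizes: it computes standard scaled dot-product attention, softmax(QKᵀ/√d)V, using a tiled kernel that never materializes the full attention matrix. The plain NumPy version below is only the mathematical reference for that operation, not the fused ROCm kernel:

```python
import numpy as np

def attention_reference(q, k, v):
    """Plain scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    FlashAttention-2 produces the same result with a tiled kernel that
    avoids materializing the full (seq, seq) score matrix in memory."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # (seq, seq) logits
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # (seq, d) output

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out = attention_reference(q, k, v)
print(out.shape)  # (16, 8)
```

The memory savings matter because the score matrix grows quadratically with sequence length, which is exactly the bottleneck on long-context training runs.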

TOP500: El Capitan Achieves Top Spot, Frontier and Aurora Follow Behind

The 64th edition of the TOP500 reveals that El Capitan has achieved the top spot, officially becoming the third system to reach exascale computing after Frontier and Aurora. Those two systems have since moved down to the No. 2 and No. 3 spots, respectively. Additionally, new systems have found their way onto the Top 10.

The new El Capitan system at the Lawrence Livermore National Laboratory in California, U.S.A., has debuted as the most powerful system on the list with an HPL score of 1.742 EFlop/s. It has 11,039,616 combined CPU and GPU cores and is based on AMD 4th generation EPYC processors with 24 cores at 1.8 GHz and AMD Instinct MI300A accelerators. El Capitan relies on a Cray Slingshot 11 network for data transfer and achieves an energy efficiency of 58.89 GigaFLOPS/watt. This power efficiency rating helped El Capitan achieve No. 18 on the GREEN500 list as well.
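The HPL score and GREEN500 efficiency together imply the system's power draw during the benchmark run, since efficiency is simply score divided by power. A quick check using the figures above:

```python
# Implied El Capitan power draw during the HPL run, from the listed figures.
hpl_eflops = 1.742                 # HPL score, EFlop/s
efficiency_gflops_per_w = 58.89    # GREEN500 energy-efficiency rating

hpl_gflops = hpl_eflops * 1e9      # 1 EFlop/s = 1e9 GFlop/s
power_watts = hpl_gflops / efficiency_gflops_per_w
print(f"~{power_watts / 1e6:.1f} MW")
```

This works out to roughly 29.6 MW while running HPL, on the order of a small power plant's output.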

ASUS Presents All-New Storage-Server Solutions to Unleash AI Potential at SC24

ASUS today announced its groundbreaking next-generation infrastructure solutions at SC24, featuring a comprehensive lineup powered by AMD and Intel, as well as liquid-cooling solutions designed to accelerate the future of AI. By continuously pushing the limits of innovation, ASUS simplifies the complexities of AI and high-performance computing (HPC) through adaptive server solutions paired with expert cooling and software-development services, tailored for the exascale era and beyond. As a total-solution provider with a distinguished history in pioneering AI supercomputing, ASUS is committed to delivering exceptional value to its customers.

Comprehensive line-up for AI and HPC success
To fuel enterprise digital transformation through HPC and AI-driven architecture, ASUS provides a full lineup of server systems powered by AMD and Intel. Startups, research institutions, large enterprises, and government organizations can all find adaptive solutions to unlock value from big data and accelerate business agility.