News Posts matching #H100


AMD Instinct MI300X GPUs Featured in LaminiAI LLM Pods

LaminiAI appears to be one of AMD's first customers to receive a bulk order of Instinct MI300X GPUs—late last week, Sharon Zhou (CEO and co-founder) posted about the "next batch of LaminiAI LLM Pods" up and running with Team Red's cutting-edge CDNA 3 series accelerators inside. Her short post on social media stated: "rocm-smi...like freshly baked bread, 8x MI300X is online—if you're building on open LLMs and you're blocked on compute, lmk. Everyone should have access to this wizard technology called LLMs."

An attached screenshot of a ROCm System Management Interface (ROCm SMI) session showcases an individual Pod configuration sporting eight Instinct MI300X GPUs. According to official blog entries, LaminiAI has utilized bog-standard MI300 accelerators since 2023, so it is not surprising to see their partnership continue to grow with AMD. Industry predictions have the Instinct MI300X and MI300A models placed as great alternatives to NVIDIA's dominant H100 "Hopper" series—AMD stock is climbing due to encouraging financial analyst estimations.

Meta Will Acquire 350,000 H100 GPUs Worth More Than 10 Billion US Dollars

Mark Zuckerberg has shared some interesting insights about Meta's AI infrastructure buildout, which is on track to include an astonishing number of NVIDIA H100 Tensor GPUs. In the post on Instagram, Meta's CEO has noted the following: "We're currently training our next-gen model Llama 3, and we're building massive compute infrastructure to support our future roadmap, including 350k H100s by the end of this year -- and overall almost 600k H100s equivalents of compute if you include other GPUs." That means that the company will enhance its AI infrastructure with 350,000 H100 GPUs on top of the existing GPUs, which is equivalent to 250,000 H100 in terms of computing power, for a total of 600,000 H100-equivalent GPUs.

The raw number of GPUs installed comes at a steep price. With the average selling price of an H100 GPU nearing 30,000 US dollars, Meta's investment will set the company back around $10.5 billion. Other GPUs should be in the infrastructure, but most will belong to the NVIDIA Hopper family. Additionally, Meta is currently training the Llama 3 AI model, which will be much more capable than the existing Llama 2 family and will include better reasoning, coding, and math-solving capabilities. These models will be open-source. Later down the pipeline, as artificial general intelligence (AGI) comes into play, Zuckerberg has noted that "Our long term vision is to build general intelligence, open source it responsibly, and make it widely available so everyone can benefit." So, expect to see these models in GitHub repositories in the future.
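The cost estimate above is simple multiplication, and a short Python sketch makes the arithmetic explicit. The constant and function names are hypothetical, the $30,000 unit price is the approximate average quoted above rather than a confirmed contract price, and "almost 600k" is treated as a round 600,000.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# All names here are illustrative; the inputs come from the text above.
H100_UNITS = 350_000               # H100 GPUs planned by the end of the year
TOTAL_H100_EQUIVALENTS = 600_000   # "almost 600k", rounded for this sketch
ASSUMED_PRICE_USD = 30_000         # approximate average selling price per H100


def fleet_cost_usd(units: int, unit_price_usd: int) -> int:
    """Return the total purchase cost for a number of identical GPUs."""
    return units * unit_price_usd


h100_cost = fleet_cost_usd(H100_UNITS, ASSUMED_PRICE_USD)
other_gpu_equivalents = TOTAL_H100_EQUIVALENTS - H100_UNITS

print(f"Estimated H100 spend: ${h100_cost / 1e9:.1f} billion")
print(f"Compute from other GPUs: {other_gpu_equivalents:,} H100 equivalents")
```

Real pricing at this volume is likely negotiated, so the result is best read as an order-of-magnitude estimate.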

Indian Client Purchases Additional $500 Million Batch of NVIDIA AI GPUs

Indian data center operator Yotta is reportedly set to spend big with another order placed with NVIDIA. A recent Reuters article outlines a $500 million purchase of Team Green AI GPUs. Yotta is in the process of upgrading its AI Cloud infrastructure, and its total tally for this endeavor (involving Hopper and newer Grace Hopper models) is likely to hit $1 billion. An official company statement from December confirmed the existence of an extra procurement of GPUs, but it did not provide any details regarding budget or hardware choices at that point in time. Reuters contacted Sunil Gupta, Yotta's CEO, last week for a comment on the situation. The co-founder said that the order "would comprise nearly 16,000 of NVIDIA's artificial intelligence chips H100 and GH200 and will be placed by March 2025."

Team Green is ramping up its embrace of the Indian data center market, as US sanctions have made it difficult to conduct business with enterprise customers in nearby Chinese territories. Reuters states that Gupta's firm (Yotta) is: "part of Indian billionaire Niranjan Hiranandani's real estate group, (in turn) a partner firm for NVIDIA in India and runs three data centre campuses, in Mumbai, Gujarat and near New Delhi." Microsoft, Google and Amazon are investing heavily in cloud and data centers situated in India. Shankar Trivedi, an NVIDIA executive, recently attended the Vibrant Gujarat Global Summit, where the article's reporter conducted a brief interview with him. Trivedi stated that Yotta is targeting a March 2024 start for a new NVIDIA-powered AI data center located in the region's tech hub: Gujarat International Finance Tec-City.

TSMC Plans to Put a Trillion Transistors on a Single Package by 2030

During the recent IEDM conference, TSMC previewed its process roadmap for delivering next-generation chip packages packing over one trillion transistors by 2030. This aligns with similar long-term visions from Intel. Such enormous transistor counts will come through advanced 3D packaging of multiple chiplets. But TSMC also aims to push monolithic chip complexity higher, ultimately enabling 200 billion transistor designs on a single die. This requires steady enhancement of TSMC's planned N2, N2P, N1.4, and N1 nodes, which are slated to arrive between now and the end of the decade. While multi-chiplet architectures are currently gaining favor, TSMC asserts that both packaging density and raw transistor density must scale up in tandem. For perspective on the magnitude of TSMC's goals, consider NVIDIA's 80 billion transistor GH100 GPU, which is among today's largest chips, excluding wafer-scale designs from Cerebras.

Yet TSMC's roadmap calls for more than doubling that, first with over 100 billion transistor monolithic designs, then eventually 200 billion. Of course, yields become more challenging as die sizes grow, which is where advanced packaging of smaller chiplets becomes crucial. Multi-chip module offerings like AMD's MI300X and Intel's Ponte Vecchio already integrate dozens of tiles, with PVC having 47 tiles. TSMC envisions extending this approach to chip packages housing more than a trillion transistors via its CoWoS, InFO, 3D stacking, and many other technologies. While the scaling cadence has recently slowed, TSMC remains confident in achieving both packaging and process breakthroughs to meet future density demands. The foundry's continuous investment ensures progress in unlocking next-generation semiconductor capabilities. But physics ultimately dictates timelines, no matter how aggressive the roadmap.

China Continues to Enhance AI Chip Self-Sufficiency, but High-End AI Chip Development Remains Constrained

Huawei's subsidiary HiSilicon has made significant strides in the independent R&D of AI chips, launching the next-gen Ascend 910B. These chips are utilized not only in Huawei's public cloud infrastructure but also sold to other Chinese companies. This year, Baidu ordered over a thousand Ascend 910B chips from Huawei to build approximately 200 AI servers. Additionally, in August, Chinese company iFlytek, in partnership with Huawei, released the "Gemini Star Program," a hardware and software integrated device for exclusive enterprise LLMs, equipped with the Ascend 910B AI acceleration chip, according to TrendForce's research.

TrendForce conjectures that the next-generation Ascend 910B chip is likely manufactured using SMIC's N+2 process. However, the production faces two potential risks. Firstly, as Huawei recently focused on expanding its smartphone business, the N+2 process capacity at SMIC is almost entirely allocated to Huawei's smartphone products, potentially limiting future capacity for AI chips. Secondly, SMIC remains on the Entity List, possibly restricting access to advanced process equipment.

Dell Partners with Imbue on New AI Compute Cluster Using Nearly 10,000 NVIDIA H100 GPUs

Dell Technologies and Imbue, an independent AI research company, have entered into a $150 million agreement to build a new high-performance computing cluster for training foundation models optimized for reasoning. Imbue is one of the few independent AI labs that develops its own foundation models, and trains them to have more advanced reasoning capabilities—like knowing when to ask for more information, analyzing and critiquing their own outputs, or breaking down a difficult goal into a plan and then executing on it. Imbue trains AI agents on top of those models that can do work for people across diverse fields in ways that are robust, safe, and useful. Imbue's goal is to create practical tools for building agents that could enable workers across a broad set of domains, including helping engineers write new code, analysts understand and draft complex policy proposals, and much more.

AWS and NVIDIA Partner to Deliver 65 ExaFLOP AI Supercomputer, Other Solutions

Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company (NASDAQ: AMZN), and NVIDIA (NASDAQ: NVDA) today announced an expansion of their strategic collaboration to deliver the most-advanced infrastructure, software and services to power customers' generative artificial intelligence (AI) innovations. The companies will bring together the best of NVIDIA and AWS technologies—from NVIDIA's newest multi-node systems featuring next-generation GPUs, CPUs and AI software, to AWS Nitro System advanced virtualization and security, Elastic Fabric Adapter (EFA) interconnect, and UltraCluster scalability—that are ideal for training foundation models and building generative AI applications.

The expanded collaboration builds on a longstanding relationship that has fueled the generative AI era by offering early machine learning (ML) pioneers the compute performance required to advance the state-of-the-art in these technologies.

Manufacturers Anticipate Completion of NVIDIA's HBM3e Verification by 1Q24; HBM4 Expected to Launch in 2026

TrendForce's latest research into the HBM market indicates that NVIDIA plans to diversify its HBM suppliers for more robust and efficient supply chain management. Samsung's HBM3 (24 GB) is anticipated to complete verification with NVIDIA by December this year. The progress of HBM3e, as outlined in the timeline below, shows that Micron provided its 8hi (24 GB) samples to NVIDIA by the end of July, SK hynix in mid-August, and Samsung in early October.

Given the intricacy of the HBM verification process—estimated to take two quarters—TrendForce expects that some manufacturers might learn preliminary HBM3e results by the end of 2023. However, it's generally anticipated that major manufacturers will have definite results by 1Q24. Notably, the outcomes will influence NVIDIA's procurement decisions for 2024, as final evaluations are still underway.

SK hynix Showcases Next-Gen AI and HPC Solutions at SC23

SK hynix presented its leading AI and high-performance computing (HPC) solutions at Supercomputing 2023 (SC23) held in Denver, Colorado between November 12-17. Organized by the Association for Computing Machinery and IEEE Computer Society since 1988, the annual SC conference showcases the latest advancements in HPC, networking, storage, and data analysis. SK hynix marked its first appearance at the conference by introducing its groundbreaking memory solutions to the HPC community. During the six-day event, several SK hynix employees also made presentations revealing the impact of the company's memory solutions on AI and HPC.

Displaying Advanced HPC & AI Products
At SC23, SK hynix showcased its products tailored for AI and HPC to underline its leadership in the AI memory field. Among these next-generation products, HBM3E attracted attention as the HBM solution meets the industry's highest standards of speed, capacity, heat dissipation, and power efficiency. These capabilities make it particularly suitable for data-intensive AI server systems. HBM3E was presented alongside NVIDIA's H100, a high-performance GPU for AI that uses HBM3 for its memory.

Microsoft Introduces 128-Core Arm CPU for Cloud and Custom AI Accelerator

During its Ignite conference, Microsoft introduced a duo of custom-designed silicon made to accelerate AI and excel in cloud workloads. First of the two is Microsoft's Azure Cobalt 100 CPU, a 128-core design that features a 64-bit Armv9 instruction set, implemented in a cloud-native design that is set to become a part of Microsoft's offerings. While there aren't many details regarding the configuration, the company claims a performance target of up to 40% higher than the current generation of Arm servers running on Azure cloud. The SoC uses Arm's Neoverse CSS platform customized for Microsoft, with presumably Arm Neoverse N2 cores.

The next and hottest topic in the server space is AI acceleration, which is needed for running today's large language models. Microsoft hosts OpenAI's ChatGPT, Microsoft's Copilot, and many other AI services. To help make them run as fast as possible, Microsoft's project Athena now has the name of Maia 100 AI accelerator, which is manufactured on TSMC's 5 nm process. It features 105 billion transistors and supports various MX data formats, even those smaller than 8-bit, for maximum performance. The chip is currently being tested on GPT 3.5 Turbo, but we have yet to see performance figures and comparisons with competing hardware from NVIDIA, such as the H100/H200, and from AMD, such as the MI300X. The Maia 100 has an aggregate bandwidth of 4.8 Terabits per accelerator and uses a custom Ethernet-based networking protocol for scaling. These chips are expected to appear in Microsoft data centers early next year, and we hope to get some performance numbers soon.

TOP500 Update: Frontier Remains No.1 With Aurora Coming in at No. 2

The 62nd edition of the TOP500 reveals that the Frontier system retains its top spot and is still the only exascale machine on the list. However, five new or upgraded systems have shaken up the Top 10.

Housed at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, Frontier leads the pack with an HPL score of 1.194 EFlop/s - unchanged from the June 2023 list. Frontier utilizes AMD EPYC 64C 2GHz processors and is based on the latest HPE Cray EX235a architecture. The system has a total of 8,699,904 combined CPU and GPU cores. Additionally, Frontier has an impressive power efficiency rating of 52.59 GFlops/watt and relies on HPE's Slingshot 11 network for data transfer.
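The HPL score and the efficiency rating quoted above together imply the system's power draw during the benchmark run. The Python sketch below works that out, assuming both figures refer to the same run; the function and constant names are hypothetical.

```python
# Derive the implied power draw from the two figures quoted in the article.
HPL_EFLOPS = 1.194                   # HPL score in EFlop/s
EFFICIENCY_GFLOPS_PER_WATT = 52.59   # power efficiency in GFlops/watt


def implied_power_megawatts(hpl_eflops: float, gflops_per_watt: float) -> float:
    """Return the power in megawatts implied by a score and an efficiency rating."""
    hpl_gflops = hpl_eflops * 1e9          # 1 EFlop/s equals 1e9 GFlop/s
    watts = hpl_gflops / gflops_per_watt   # GFlop/s divided by GFlops/W gives W
    return watts / 1e6                     # convert W to MW


power_mw = implied_power_megawatts(HPL_EFLOPS, EFFICIENCY_GFLOPS_PER_WATT)
print(f"Implied power during the HPL run: {power_mw:.1f} MW")
```

This is a derived figure, not one stated in the text above, and it says nothing about power draw under other workloads.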

Supermicro Expands AI Solutions with the Upcoming NVIDIA HGX H200 and MGX Grace Hopper Platforms Featuring HBM3e Memory

Supermicro, Inc., a Total IT Solution Provider for AI, Cloud, Storage, and 5G/Edge, is expanding its AI reach with the upcoming support for the new NVIDIA HGX H200 built with H200 Tensor Core GPUs. Supermicro's industry leading AI platforms, including 8U and 4U Universal GPU Systems, are drop-in ready for the HGX H200 8-GPU and 4-GPU configurations, which offer nearly 2x the capacity and 1.4x higher bandwidth with HBM3e memory compared to the NVIDIA H100 Tensor Core GPU. In addition, the broadest portfolio of Supermicro NVIDIA MGX systems supports the upcoming NVIDIA Grace Hopper Superchip with HBM3e memory. With unprecedented performance, scalability, and reliability, Supermicro's rack scale AI solutions accelerate the performance of computationally intensive generative AI, large language model (LLM) training, and HPC applications while meeting the evolving demands of growing model sizes. Using the building block architecture, Supermicro can quickly bring new technology to market, enabling customers to become more productive sooner.

Supermicro is also introducing the industry's highest density server with NVIDIA HGX H100 8-GPUs systems in a liquid cooled 4U system, utilizing the latest Supermicro liquid cooling solution. The industry's most compact high performance GPU server enables data center operators to reduce footprints and energy costs while offering the highest performance AI training capacity available in a single rack. With the highest density GPU systems, organizations can reduce their TCO by leveraging cutting-edge liquid cooling solutions.

GIGABYTE Demonstrates the Future of Computing at Supercomputing 2023 with Advanced Cooling and Scaled Data Centers

GIGABYTE Technology, Giga Computing, a subsidiary of GIGABYTE and an industry leader in high-performance servers, server motherboards, and workstations, continues to be a leader in cooling IT hardware efficiently and in developing diverse server platforms for Arm and x86 processors, as well as AI accelerators. At SC23, GIGABYTE (booth #355) will showcase some standout platforms, including for the NVIDIA GH200 Grace Hopper Superchip and next-gen AMD Instinct APU. To better introduce its extensive lineup of servers, GIGABYTE will address the most important needs in supercomputing data centers, such as how to cool high-performance IT hardware efficiently and power AI that is capable of real-time analysis and fast time to results.

Advanced Cooling
For many data centers, it is becoming apparent that their cooling infrastructure must radically shift to keep pace with new IT hardware that continues to generate more heat and requires rapid heat transfer. Because of this, GIGABYTE has launched advanced cooling solutions that allow IT hardware to maintain ideal performance while being more energy-efficient and maintaining the same data center footprint. At SC23, its booth will have a single-phase immersion tank, the A1P0-EA0, which offers a one-stop immersion cooling solution. GIGABYTE is experienced in implementing immersion cooling with immersion-ready servers, immersion tanks, oil, tools, and services spanning the globe. Another cooling solution showcased at SC23 will be direct liquid cooling (DLC), and in particular, the new GIGABYTE cold plates and cooling modules for the NVIDIA Grace CPU Superchip, NVIDIA Grace Hopper Superchip, AMD EPYC 9004 processor, and 4th Gen Intel Xeon processor.

ASUS Demonstrates AI and Immersion-Cooling Solutions at SC23

ASUS today announced a showcase of the latest AI solutions to empower innovation and push the boundaries of supercomputing, at Supercomputing 2023 (SC23) in Denver, Colorado, from 12-17 November, 2023. ASUS will demonstrate the latest AI advances, including generative-AI solutions and sustainability breakthroughs with Intel, to deliver the latest hybrid immersion-cooling solutions, plus lots more - all at booth number 257.

At SC23, ASUS will showcase the latest NVIDIA-qualified ESC N8A-E12 HGX H100 eight-GPU server, which is powered by dual-socket AMD EPYC 9004 processors and designed for enterprise-level generative AI with market-leading integrated capabilities. Following NVIDIA's announcement at SC23 of the NVIDIA H200 Tensor Core GPU, the first GPU to offer HBM3E for faster, larger memory to fuel the acceleration of generative AI and large language models, ASUS will update its H100-based system with an H200-based drop-in replacement in 2024.

NVIDIA Turbocharges Generative AI Training in MLPerf Benchmarks

NVIDIA's AI platform raised the bar for AI training and high performance computing in the latest MLPerf industry benchmarks. Among many new records and milestones, one in generative AI stands out: NVIDIA Eos - an AI supercomputer powered by a whopping 10,752 NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking - completed a training benchmark based on a GPT-3 model with 175 billion parameters trained on one billion tokens in just 3.9 minutes. That's a nearly 3x gain from 10.9 minutes, the record NVIDIA set when the test was introduced less than six months ago.
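The "nearly 3x" claim follows directly from the two benchmark times quoted above. As a minimal sketch with hypothetical variable names, the ratio can be checked in Python:

```python
# Check the speedup implied by the two MLPerf training times quoted above.
PREVIOUS_RECORD_MINUTES = 10.9   # record set when the test was introduced
EOS_MINUTES = 3.9                # time reported for the 10,752-GPU Eos system


def speedup(old_time: float, new_time: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return old_time / new_time


gain = speedup(PREVIOUS_RECORD_MINUTES, EOS_MINUTES)
print(f"Speedup over the previous record: {gain:.2f}x")
```

The ratio compares wall-clock times only. It does not account for any difference in GPU count between the two submissions, which the text above does not state for the earlier record.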

The benchmark uses a portion of the full GPT-3 data set behind the popular ChatGPT service that, by extrapolation, Eos could now train in just eight days, 73x faster than a prior state-of-the-art system using 512 A100 GPUs. The acceleration in training time reduces costs, saves energy and speeds time-to-market. It's heavy lifting that makes large language models widely available so every business can adopt them with tools like NVIDIA NeMo, a framework for customizing LLMs. In a new generative AI test this round, 1,024 NVIDIA Hopper architecture GPUs completed a training benchmark based on the Stable Diffusion text-to-image model in 2.5 minutes, setting a high bar on this new workload. By adopting these two tests, MLPerf reinforces its leadership as the industry standard for measuring AI performance, since generative AI is the most transformative technology of our time.

GIGABYTE Announces New Direct Liquid Cooling (DLC) Multi-Node Servers Ahead of SC23

GIGABYTE Technology, Giga Computing, a subsidiary of GIGABYTE and an industry leader in high-performance servers, server motherboards, and workstations, today announced direct liquid cooling (DLC) multi-node servers for NVIDIA Grace CPU & NVIDIA Grace Hopper Superchip. Also announced are a DLC-ready Intel-based server for the NVIDIA HGX H100 8-GPU platform and a high-density server for AMD EPYC 9004 processors. For the ultimate in efficiency, there is also a new 12U single-phase immersion tank. All of these products will be at GIGABYTE booth #355 at SC23.

The newly announced high-density CPU servers include the Intel Xeon-based H263-S63-LAN1 and the AMD EPYC-based H273-Z80-LAN1. These 2U 4-node servers employ DLC for all eight CPUs, so CPU performance reaches its full potential despite the dense configuration. In August, GIGABYTE announced new servers for NVIDIA HGX H100 GPU, and now adds the DLC version to the G593 series, G593-SD0-LAX1, for NVIDIA HGX H100 8-GPU.

NVIDIA NeMo: Designers Tap Generative AI for a Chip Assist

A research paper released this week describes ways generative AI can assist one of the most complex engineering efforts: designing semiconductors. The work demonstrates how companies in highly specialized fields can train large language models (LLMs) on their internal data to build assistants that increase productivity.

Few pursuits are as challenging as semiconductor design. Under a microscope, a state-of-the-art chip like an NVIDIA H100 Tensor Core GPU (above) looks like a well-planned metropolis, built with tens of billions of transistors, connected on streets 10,000x thinner than a human hair. Multiple engineering teams coordinate for as long as two years to construct one of these digital mega cities. Some groups define the chip's overall architecture, some craft and place a variety of ultra-small circuits, and others test their work. Each job requires specialized methods, software programs and computer languages.

Alphacool Unveils Single-Slot ES H100 80GB HBM PCIe Water Block for NVIDIA H100 GPU

Alphacool expands the portfolio of the Enterprise Solutions series for GPU water coolers and presents the new ES H100 80 GB HBM PCIe. In order to dissipate the enormous waste heat of this GPU generation in the best possible way, the cooler is positioned close to the components to be cooled in an exemplary manner. Alphacool uses only copper for all water-bearing parts. Copper has almost twice the thermal conductivity of aluminium, making it a much better choice of material for a water cooler. The fully chrome plated copper base makes it resistant to acids, scratches and damage. The matte carbon finish gives the cooler a classy appearance. At the same time, this makes it interesting for private users who want to do without aRGB lighting.

The water cooler is specially designed for use in narrow server cases. To save space in width and height, the connections have been moved to the back. This also allows for easier hosing inside the server rack. Thanks to the very compact design, only 1 slot is required for mounting the cooler in the server rack instead of 1.5 slots as before. This further reduction of required space is a strong argument for the use of the ES Copper/Carbon GPU watercooler. Smart and efficient.

U.S. Restricts Exports of NVIDIA GeForce RTX 4090 to China

The GeForce RTX 4090 gaming graphics card, both as an NVIDIA first-party Founders Edition and as custom designs by AIC partners, undergoes assembly in China. A new U.S. Government trade regulation restricts NVIDIA from selling it in the Chinese domestic market. The enthusiast-segment graphics card joins several other high performance AI processors, such as the "Hopper" H800, and "Ampere" A800. If you recall, the H800 and A800 are special China-specific variants of the H100 and A100, respectively, which come with performance reductions at the hardware-level, to fly below the AI processor performance limits set by the U.S. Government. The only reasons we can think of for these chips being on the list are that end-users in China have figured out ways around these performance limiters, or are buying at greater scale to achieve the desired performance. The fresh trade embargo released on October 17 covers the A100, A800, H100, H800, L40, L40S, and RTX 4090.

Samsung Notes: HBM4 Memory is Coming in 2025 with New Assembly and Bonding Technology

According to an editorial blog post published on the Samsung blog by SangJoon Hwang, Executive Vice President and Head of the DRAM Product & Technology Team at Samsung Electronics, High-Bandwidth Memory 4 (HBM4) is coming in 2025. In the recent timeline of HBM development, we saw the first appearance of HBM memory in 2015 with the AMD Radeon R9 Fury X. The second-generation HBM2 appeared with NVIDIA Tesla P100 in 2016, and the third-generation HBM3 saw the light of day with NVIDIA Hopper GH100 GPU in 2022. Currently, Samsung has developed 9.8 Gbps HBM3E memory, which will start sampling to customers soon.

However, Samsung is more ambitious with development timelines this time, and the company expects to announce HBM4 in 2025, possibly with commercial products in the same calendar year. Interestingly, the HBM4 memory will have some technology optimized for high thermal properties, such as non-conductive film (NCF) assembly and hybrid copper bonding (HCB). The NCF is a polymer layer that enhances the stability of micro bumps and TSVs in the chip, so memory solder bump dies are protected from shock. Hybrid copper bonding is an advanced semiconductor packaging method that creates direct copper-to-copper connections between semiconductor components, enabling high-density, 3D-like packaging. It offers high I/O density, enhanced bandwidth, and improved power efficiency. It uses a copper layer as a conductor and oxide insulator instead of regular micro bumps to increase the connection density needed for HBM-like structures.

EK Fluid Works Enhances Portfolio with NVIDIA H100 GPU Integration

EK, the leading PC liquid cooling solutions provider, has expanded its hardware support for the EK Fluid Works systems by integrating the state-of-the-art NVIDIA H100 PCIe Tensor Core GPU. NVIDIA's latest release, acclaimed for its unprecedented performance, scalability, and security across diverse workloads, has discovered its ultimate home in EK Fluid Works servers and workstations.

Notably, EK's commitment to sustainability transforms these systems into eco-friendlier platforms, unlocking the full potential of Large Language Models (LLM), machine learning, and AI model training. EK Fluid Works systems emerge as the top choice for those seeking the unleashed power of NVIDIA H100 Tensor Core GPUs, offering an impressive array of efficiency benefits, including:
  • Unparalleled returns on investment
  • The lowest total cost of operation (TCO/OpEx)
  • Minimal additional capital expenditure (CapEx)

Microsoft to Unveil Custom AI Chips to Fight NVIDIA's Monopoly

According to sources close to The Information, Microsoft is supposed to unveil details about its upcoming custom silicon design for accelerating AI workloads. Allegedly, the incoming chip announcement is scheduled for November during Microsoft's annual Ignite conference. Held in Seattle from November 14 to 17, the conference is supposed to show all of the work that the company has been doing in the field of AI. The alleged launch of an AI chip will undoubtedly take center stage in the announcement, as the demand for AI accelerators has been so great that companies can't get their hands on GPUs. The sector is mainly dominated by NVIDIA, with its H100 and A100 GPUs powering most of the AI infrastructure worldwide.

With the launch of a custom AI chip codenamed Athena, Microsoft hopes to match or beat the performance of NVIDIA's offerings and reduce the cost of AI infrastructure. As the price of an H100 GPU can reach 30,000 US Dollars, building a data center filled with H100s can cost hundreds of millions. The cost could be brought down using in-house chips, and Microsoft could be less dependent on NVIDIA to provide the backbone of AI servers needed in the coming years. Nevertheless, we are excited to see what the company has prepared, and we will report on the Microsoft Ignite announcement in November.

NVIDIA Announces Collaboration with Anyscale

Large language model development is about to reach supersonic speed thanks to a collaboration between NVIDIA and Anyscale. At its annual Ray Summit developers conference, Anyscale—the company behind the fast growing open-source unified compute framework for scalable computing—announced today that it is bringing NVIDIA AI to Ray open source and the Anyscale Platform. It will also be integrated into Anyscale Endpoints, a new service announced today that makes it easy for application developers to cost-effectively embed LLMs in their applications using the most popular open source models.

These integrations can dramatically speed generative AI development and efficiency while boosting security for production AI, from proprietary LLMs to open models such as Code Llama, Falcon, Llama 2, SDXL and more. Developers will have the flexibility to deploy open-source NVIDIA software with Ray or opt for NVIDIA AI Enterprise software running on the Anyscale Platform for a fully supported and secure production deployment. Ray and the Anyscale Platform are widely used by developers building advanced LLMs for generative AI applications capable of powering intelligent chatbots, coding copilots and powerful search and summarization tools.

Intel Shows Strong AI Inference Performance

Today, MLCommons published results of its MLPerf Inference v3.1 performance benchmark for GPT-J, the 6 billion parameter large language model, as well as computer vision and natural language processing models. Intel submitted results for Habana Gaudi 2 accelerators, 4th Gen Intel Xeon Scalable processors, and Intel Xeon CPU Max Series. The results show Intel's competitive performance for AI inference and reinforce the company's commitment to making artificial intelligence more accessible at scale across the continuum of AI workloads - from client and edge to the network and cloud.

"As demonstrated through the recent MLCommons results, we have a strong, competitive AI product portfolio, designed to meet our customers' needs for high-performance, high-efficiency deep learning inference and training, for the complete spectrum of AI models - from the smallest to the largest - with leading price/performance." -Sandra Rivera, Intel executive vice president and general manager of the Data Center and AI Group

TSMC Prediction: AI Chip Supply Shortage to Last ~18 Months

TSMC Chairman Mark Liu was asked to comment on all things artificial intelligence-related at the SEMICON Taiwan 2023 industry event. According to a Nikkei Asia report, he foresees supply constraints lasting until the tail end of 2024: "It's not the shortage of AI chips. It's the shortage of our chip-on-wafer-on-substrate (COWOS) capacity...Currently, we can't fulfill 100% of our customers' needs, but we try to support about 80%. We think this is a temporary phenomenon. After our expansion of advanced chip packaging capacity, it should be alleviated in one and a half years." He cites a recent and very "sudden" spike in demand for COWOS, with numbers tripling within the span of a year. Market leader NVIDIA relies on TSMC's advanced packaging system—most notably with the production of highly-prized A100 and H100 series Tensor Core compute GPUs.

These issues are deemed a "temporary" problem, and it could take around 18 months to eliminate production output "bottlenecks." TSMC is racing to bolster its native activities with new facilities. Plans for a new $2.9 billion advanced chip packaging plant (in Miaoli County) were disclosed during the summer. Liu reckons that industry-wide innovation is necessary to meet growing demand through new methods to "connect, package and stack chips." Liu elaborated: "We are now putting together many chips into a tightly integrated massive interconnect system. This is a paradigm shift in semiconductor technology integration." The TSMC boss reckons that processing units fielding over one trillion transistors are viable within the next decade: "it's through packaging with multiple chips that this could be possible."