News Posts matching #accelerator

Return to Keyword Browsing

"Jaguar Shores" is Intel's Successor to "Falcon Shores" Accelerator for AI and HPC

Intel has prepared "Jaguar Shores," its "next-next" generation AI and HPC accelerator, successor to its upcoming "Falcon Shores" GPU. Revealed during a technical workshop at the SC2024 conference, the chip was unveiled by Intel's Habana Labs division, albeit unintentionally. This announcement positions Jaguar Shores as the successor to Falcon Shores, which is scheduled to launch next year. While details about Jaguar Shores remain sparse, its designation suggests it could be a general-purpose GPU (GPGPU) aimed at both AI training, inferencing, and HPC tasks. Intel's strategy aligns with its push to incorporate advanced manufacturing nodes, such as the 18A process featuring RibbonFET and backside power delivery, which promise significant efficiency gains, so we can expect to see upcoming AI accelerators incorporating these technologies.

Intel's AI chip lineup has faced numerous challenges, including shifting plans for Falcon Shores, which has transitioned from a CPU-GPU hybrid to a standalone GPU, and cancellation of Ponte Vecchio. Despite financial constraints and job cuts, Intel has maintained its focus on developing cutting-edge AI solutions. "We continuously evaluate our roadmap to ensure it aligns with the evolving needs of our customers. While we don't have any new updates to share, we are committed to providing superior enterprise AI solutions across our CPU and accelerator/GPU portfolio." an Intel spokesperson stated. The announcement of Jaguar Shores shows Intel's determination to remain competitive. However, the company faces steep competition. NVIDIA and AMD continue to set benchmarks with performant designs, while Intel has struggled to capture a significant share of the AI training market. The company's Gaudi lineup ends with third generation, and Gaudi IP will get integrated into Falcon Shores.

ASUS Presents All-New Storage-Server Solutions to Unleash AI Potential at SC24

ASUS today announced its groundbreaking next-generation infrastructure solutions at SC24, featuring a comprehensive lineup powered by AMD and Intel, as well as liquid-cooling solutions designed to accelerate the future of AI. By continuously pushing the limits of innovation, ASUS simplifies the complexities of AI and high-performance computing (HPC) through adaptive server solutions paired with expert cooling and software-development services, tailored for the exascale era and beyond. As a total-solution provider with a distinguished history in pioneering AI supercomputing, ASUS is committed to delivering exceptional value to its customers.

Comprehensive Lineup for AI and HPC Success
To fuel enterprise digital transformation through HPC and AI-driven architecture, ASUS provides a full lineup of server systems that are powered by AMD and Intel. Startups, research institutions, large enterprises or government organizations all could find the adaptive solutions to unlock value and accelerate business agility from the big data.

IBM Expands Its AI Accelerator Offerings; Announces Collaboration With AMD

IBM and AMD have announced a collaboration to deploy AMD Instinct MI300X accelerators as a service on IBM Cloud. This offering, which is expected to be available in the first half of 2025, aims to enhance performance and power efficiency for Gen AI models such as and high-performance computing (HPC) applications for enterprise clients. This collaboration will also enable support for AMD Instinct MI300X accelerators within IBM's watsonx AI and data platform, as well as Red Hat Enterprise Linux AI inferencing support.

"As enterprises continue adopting larger AI models and datasets, it is critical that the accelerators within the system can process compute-intensive workloads with high performance and flexibility to scale," said Philip Guido, executive vice president and chief commercial officer, AMD. "AMD Instinct accelerators combined with AMD ROCm software offer wide support including IBM watsonx AI, Red Hat Enterprise Linux AI and Red Hat OpenShift AI platforms to build leading frameworks using these powerful open ecosystem tools. Our collaboration with IBM Cloud will aim to allow customers to execute and scale Gen AI inferencing without hindering cost, performance or efficiency."

TSMC Cuts Off Chinese Firm For Reportedly Shipping to Sanctioned Huawei

According to a recent Reuters report, TSMC has decided to cut off Chinese firm Sophgo following the discovery of TSMC-manufactured components in Huawei's advanced AI processor. The suspension came after technology research firm TechInsights identified a TSMC-manufactured chip within Huawei's Ascend 910B processor during a detailed analysis. This discovery raised significant concerns, as Huawei has been restricted from accessing such technology under US export controls since 2020. TSMC promptly notified US authorities upon learning of the situation and launched an internal investigation. While being sanctioned by the US, Huawei needed to use a proxy firm to get access to high-end silicon manufacturing to produce its Ascend accelerators.

Sophgo, which has ties to cryptocurrency mining equipment manufacturer Bitmain, strongly denies any business relationship with Huawei. The company states it has provided TSMC with a detailed investigation report asserting its compliance with all applicable laws, saying: "SOPHGO has never been engaged in any direct or indirect business relationship with Huawei. SOPHGO has been conducting business in strict compliance with applicable laws and regulations, including but not limited to all the applicable US national export control laws and regulations, and has never been in violation of any of such laws and regulations. SOPHGO has provided detailed investigation report to TSMC to prove that SOPHGO is not related to the Huawei investigation."

Arm and Partners Develop AI CPU: Neoverse V3 CSS Made on 2 nm Samsung GAA FET

Yesterday, Arm has announced significant progress in its Total Design initiative. The program, launched a year ago, aims to accelerate the development of custom silicon for data centers by fostering collaboration among industry partners. The ecosystem has now grown to include nearly 30 participating companies, with recent additions such as Alcor Micro, Egis, PUF Security, and SEMIFIVE. A notable development is a partnership between Arm, Samsung Foundry, ADTechnology, and Rebellions to create an AI CPU chiplet platform. This collaboration aims to deliver a solution for cloud, HPC, and AI/ML workloads, combining Rebellions' AI accelerator with ADTechnology's compute chiplet, implemented using Samsung Foundry's 2 nm Gate-All-Around (GAA) FET technology. The platform is expected to offer significant efficiency gains for generative AI workloads, with estimates suggesting a 2-3x improvement over the standard CPU design for LLMs like Llama3.1 with 405 billion parameters.

Arm's approach emphasizes the importance of CPU compute in supporting the complete AI stack, including data pre-processing, orchestration, and advanced techniques like Retrieval-augmented Generation (RAG). The company's Compute Subsystems (CSS) are designed to address these requirements, providing a foundation for partners to build diverse chiplet solutions. Several companies, including Alcor Micro and Alphawave, have already announced plans to develop CSS-powered chiplets for various AI and high-performance computing applications. The initiative also focuses on software readiness, ensuring that major frameworks and operating systems are compatible with Arm-based systems. Recent efforts include the introduction of Arm Kleidi technology, which optimizes CPU-based inference for open-source projects like PyTorch and Llama.cpp. Notably, as Google claims, most AI workloads are being inferenced on CPUs, so creating the most efficient and most performant CPUs for AI makes a lot of sense.

AMD Launches New Slim Form Factor Alveo UL3422 Accelerator Card

AMD today announced the AMD Alveo UL3422 accelerator card, the latest addition to its record-breaking family of accelerators1 designed for ultra-low latency electronic trading applications. AMD Alveo UL3422 provides trading firms, market makers and financial institutions with a slim form factor accelerator optimized for rack space, cost and designed for a fast path to deployment in a wide range of servers. The Alveo UL3422 accelerator is powered by an AMD Virtex UltraScale+ FPGA that features a novel transceiver architecture with hardened, optimized network connectivity cores, custom built for high-speed trading. It enables ultra-low latency trade execution, achieving less than 3ns FPGA transceiver latency and breakthrough 'tick-to-trade' performance not achievable with standard off-the-shelf FPGAs.

"Speed is the ultimate advantage in the increasingly competitive world of high-speed trading," said Yousef Khalilollahi, corporate vice president & general manager, Adaptive Computing Group, AMD. "The Alveo UL3422 card provides a lower-cost entry point while still delivering cutting-edge latency performance, making it accessible to firms of all sizes that want to stay competitive in the ultra-low latency trading space."

HP Launches HPE ProLiant Compute XD685 Servers Powered by 5th Gen AMD EPYC Processors and AMD Instinct MI325X Accelerators

Hewlett Packard Enterprise today announced the HPE ProLiant Compute XD685 for complex AI model training tasks, powered by 5th Gen AMD EPYC processors and AMD Instinct MI325X accelerators. The new HPE system is optimized to quickly deploy high-performing, secure and energy-efficient AI clusters for use in large language model training, natural language processing and multi-modal training.

The race is on to unlock the promise of AI and its potential to dramatically advance outcomes in workforce productivity, healthcare, climate sciences and much more. To capture this potential, AI service providers, governments and large model builders require flexible, high-performance solutions that can be brought to market quickly.

AMD Launches Instinct MI325X Accelerator for AI Workloads: 256 GB HBM3E Memory and 2.6 PetaFLOPS FP8 Compute

During its "Advancing AI" conference today, AMD has updated its AI accelerator portfolio with the Instinct MI325X accelerator, designed to succeed its MI300X predecessor. Built on the CDNA 3 architecture, Instinct MI325X brings a suite of improvements over the old SKU. Now, the MI325X features 256 GB of HBM3E memory running at 6 TB/s bandwidth. The capacity memory alone is a 1.8x improvement over the old MI300 SKU, which features 192 GB of regular HBM3 memory. Providing more memory capacity is crucial as upcoming AI workloads are training models with parameter counts measured in trillions, as opposed to billions with current models we have today. When it comes to compute resources, the Instinct MI325X provides 1.3 PetaFLOPS at FP16 and 2.6 PetaFLOPS at FP8 training and inference. This represents a 1.3x improvement over the Instinct MI300.

A chip alone is worthless without a good platform, and AMD decided to make the Instinct MI325X OAM modules a drop-in replacement for the current platform designed for MI300X, as they are both pin-compatible. In systems packing eight MI325X accelerators, there are 2 TB of HBM3E memory running at 48 TB/s memory bandwidth. Such a system achieves 10.4 PetaFLOPS of FP16 and 20.8 PetaFLOPS of FP8 compute performance. The company uses NVIDIA's H200 HGX as reference claims for its performance competitiveness, where the company claims that the Instinct MI325X outperforms NVIDIA H200 HGX system by 1.3x across the board in memory bandwidth, FP16 / FP8 compute performance and 1.8x in memory capacity.

AMD Instinct MI300X Accelerators Available on Oracle Cloud Infrastructure

AMD today announced that Oracle Cloud Infrastructure (OCI) has chosen AMD Instinct MI300X accelerators with ROCm open software to power its newest OCI Compute Supercluster instance called BM.GPU.MI300X.8. For AI models that can comprise hundreds of billions of parameters, the OCI Supercluster with AMD MI300X supports up to 16,384 GPUs in a single cluster by harnessing the same ultrafast network fabric technology used by other accelerators on OCI. Designed to run demanding AI workloads including large language model (LLM) inference and training that requires high throughput with leading memory capacity and bandwidth, these OCI bare metal instances have already been adopted by companies including Fireworks AI.

"AMD Instinct MI300X and ROCm open software continue to gain momentum as trusted solutions for powering the most critical OCI AI workloads," said Andrew Dieckmann, corporate vice president and general manager, Data Center GPU Business, AMD. "As these solutions expand further into growing AI-intensive markets, the combination will benefit OCI customers with high performance, efficiency, and greater system design flexibility."

SK hynix Presents Upgraded AiMX Solution at AI Hardware and Edge AI Summit 2024

SK hynix unveiled an enhanced Accelerator-in-Memory based Accelerator (AiMX) card at the AI Hardware & Edge AI Summit 2024 held September 9-12 in San Jose, California. Organized annually by Kisaco Research, the summit brings together representatives from the AI and machine learning ecosystem to share industry breakthroughs and developments. This year's event focused on exploring cost and energy efficiency across the entire technology stack. Marking its fourth appearance at the summit, SK hynix highlighted how its AiM products can boost AI performance across data centers and edge devices.

Booth Highlights: Meet the Upgraded AiMX
In the AI era, high-performance memory products are vital for the smooth operation of LLMs. However, as these LLMs are trained on increasingly larger datasets and continue to expand, there is a growing need for more efficient solutions. SK hynix addresses this demand with its PIM product AiMX, an AI accelerator card that combines multiple GDDR6-AiMs to provide high bandwidth and outstanding energy efficiency. At the AI Hardware & Edge AI Summit 2024, SK hynix presented its updated 32 GB AiMX prototype which offers double the capacity of the original card featured at last year's event. To highlight the new AiMX's advanced processing capabilities in a multi-batch environment, SK hynix held a demonstration of the prototype card with the Llama 3 70B model, an open source LLM. In particular, the demonstration underlined AiMX's ability to serve as a highly effective attention accelerator in data centers.

AMD to Unify Gaming "RDNA" and Data Center "CDNA" into "UDNA": Singular GPU Architecture Similar to NVIDIA's CUDA

According to new information from Tom's Hardware, AMD has announced plans to unify its consumer-focused gaming RDNA and data center CDNA graphics architectures into a single, unified design called "UDNA." The announcement was made by AMD's Jack Huynh, Senior Vice President and General Manager of the Computing and Graphics Business Group, at IFA 2024 in Berlin. The goal of the new UDNA architecture is to provide a single focus point for developers so that each optimized application can run on consumer-grade GPU like Radeon RX 7900XTX as well as high-end data center GPU like Instinct MI300. This will create a unification similar to NVIDIA's CUDA, which enables CUDA-focused developers to run applications on everything ranging from laptops to data centers.
Jack HuynhSo, part of a big change at AMD is today we have a CDNA architecture for our Instinct data center GPUs and RDNA for the consumer stuff. It's forked. Going forward, we will call it UDNA. There'll be one unified architecture, both Instinct and client [consumer]. We'll unify it so that it will be so much easier for developers versus today, where they have to choose and value is not improving.

Microsoft Unveils New Details on Maia 100, Its First Custom AI Chip

Microsoft provided a detailed view of Maia 100 at Hot Chips 2024, their initial specialized AI chip. This new system is designed to work seamlessly from start to finish, with the goal of improving performance and reducing expenses. It includes specially made server boards, unique racks, and a software system focused on increasing the effectiveness and strength of sophisticated AI services, such as Azure OpenAI. Microsoft introduced Maia at Ignite 2023, sharing that they had created their own AI accelerator chip. More information was provided earlier this year at the Build developer event. The Maia 100 is one of the biggest processors made using TSMC's 5 nm technology, designed for handling extensive AI tasks on Azure platform.

Maia 100 SoC architecture features:
  • A high-speed tensor unit (16xRx16) offers rapid processing for training and inferencing while supporting a wide range of data types, including low precision data types such as the MX data format, first introduced by Microsoft through the MX Consortium in 2023.
  • The vector processor is a loosely coupled superscalar engine built with custom instruction set architecture (ISA) to support a wide range of data types, including FP32 and BF16.
  • A Direct Memory Access (DMA) engine supports different tensor sharding schemes.
  • Hardware semaphores enable asynchronous programming on the Maia system.

Japan Unveils Plans for Zettascale Supercomputer: 100 PFLOPs of AI Compute per Node

The zettascale era is officially on the map, as Japan has announced plans to develop a successor to its renowned Fugaku supercomputer. The Ministry of Education, Culture, Sports, Science and Technology (MEXT) has set its sights on creating a machine capable of unprecedented processing power, aiming for 50 ExaFLOPS of peak AI performance with zettascale capabilities. The ambitious "Fugaku Next" project, slated to begin development next year, will be headed by RIKEN, one of Japan's leading research institutions, in collaboration with tech giant Fujitsu. With a target completion date of 2030, the new supercomputer aims to surpass current technological boundaries, potentially becoming the world's fastest once again. MEXT's vision for the "Fugaku Next" includes groundbreaking specifications for each computational node.

The ministry anticipates peak performance of several hundred FP64 TFLOPS for double-precision computations, around 50 FP16 PFLOPS for AI-oriented half-precision calculations, and approximately 100 PFLOPS for AI-oriented 8-bit precision calculations. These figures represent a major leap from Fugaku's current capabilities. The project's initial funding is set at ¥4.2 billion ($29.06 million) for the first year, with total government investment expected to exceed ¥110 billion ($761 million). While the specific architecture remains undecided, MEXT suggests the use of CPUs with special-purpose accelerators or a CPU-GPU combination. The semiconductor node of choice will likely be a 1 nm node or even more advanced nodes available at the time, with advanced packaging also used. The supercomputer will also feature an advanced storage system to handle traditional HPC and AI workloads efficiently. We already have an insight into Monaka, Fujitsu's upcoming CPU design with 150 Armv9 cores. However, Fugaku Next will be powered by the Monaka Next design, which will likely be much more capable.

FuriosaAI Unveils RNGD Power-Efficient AI Processor at Hot Chips 2024

Today at Hot Chips 2024, FuriosaAI is pulling back the curtain on RNGD (pronounced "Renegade"), our new AI accelerator designed for high-performance, highly efficient large language model (LLM) and multimodal model inference in data centers. As part of his Hot Chips presentation, Furiosa co-founder and CEO June Paik is sharing technical details and providing the first hands-on look at the fully functioning RNGD card.

With a TDP of 150 watts, a novel chip architecture, and advanced memory technology like HBM3, RNGD is optimized for inference with demanding LLMs and multimodal models. It's built to deliver high performance, power efficiency, and programmability all in a single product - a trifecta that the industry has struggled to achieve in GPUs and other AI chips.

Strong AI Chip Demand Pushes TSMC's July Revenue by 45% Year-over-Year

The demand for AI accelerators is going strong, and the world's largest semiconductor manufacturer, TSMC, has just confirmed that with its July 2024 revenue report. According to its latest July 2024 data, TSMC has reported a consolidated revenue of NT$256.95 billion, or about $7.94 billion at the time of writing. This represents a massive 23.6% jump from June 2024 and a 44.7% from July 2023, when revenue came in at NT$207.869 billion and NT$177.616 billion, respectively. For revenue throughout the year, measured from January to July, TSMC booked NT$1.523 trillion, or about $47 billion at the current rate. For this 7-month period, TSMC's revenue has increased by 30.5% Year-on-Year (YoY), showing great demand and an uptick in the company's production capabilities.

Of course, this is possible thanks to the massive demand driving AI chip sales from various startups and established giants like NVIDIA and AMD. Another vital customer for TSMC is Apple, which produces smartphone and Mac chips at Taiwanese facilities. The solid financial results from TSMC suggest that other fabless chip designers in its ecosystem may also experience positive outcomes in their earnings. It's worth noting that the semiconductor supply chain operates on a long-term planning basis, with arrangements made months in advance. As such, we can expect advanced silicon solutions to reach new customers in the coming months, further driving growth in the sector.

Particle Unveils Tachyon, the First 5G All-Purpose AI-Enabled SBC

Particle, a leading IoT edge-to-cloud platform provider, has launched Tachyon, its first Qualcomm Snapdragon powered single-board computer (SBC) designed to make cutting-edge chipsets and AI tooling widely accessible to consumers and businesses.

Tachyon brings the power of a modern smartphone to the far corners of the world with speedy hardware, a powerful AI accelerator, built-in high-bandwidth 5G and Wi-Fi 6E connectivity, and a Linux-powered Ubuntu operating system. By providing a complete edge-to-cloud infrastructure, Particle enables customers to focus on what matters most: their application.

MaxLinear to Showcase Panther III at Future of Memory and Storage 2024 Trade Show

MaxLinear, Inc., a leading provider of data storage acceleration solutions for enterprise and data center applications, today announced it will demonstrate the advanced compression, encryption, and security performance of its storage acceleration solution, Panther III, at the Future of Memory and Storage (FMS) 2024 trade show from August 6-8, 2024. The demos will show that Panther III can achieve up to 40 times more throughput, up to 190 times better latency, and up to 1000 times less CPU utilization than a software-only solution, leading to significant cost savings in terms of flash drives and needed CPU cores.

MaxLinear's Panther III creates a bold new product category for maximizing the performance of data storage systems - a comprehensive, all-in-one "storage accelerator." Unlike encryption and/or compression solutions, MaxLinear's Panther III consolidates a comprehensive suite of storage acceleration functions, including compression, deduplication, encryption, data protection, and real-time validation, in a single hardware-based solution. Panther III is engineered to offload and expedite specific data processing tasks, thus providing a significant performance boost, storage cost savings, and energy savings compared to traditional software-only, FPGA, and other competitive solutions.

Marvell Introduces Breakthrough Structera CXL Product Line to Address Server Memory Bandwidth and Capacity Challenges in Cloud Data Centers

Marvell Technology, Inc., a leader in data infrastructure semiconductor solutions, today launched the Marvell Structera product line of Compute Express Link (CXL) devices that enable cloud data center operators to overcome memory performance and scaling challenges in general-purpose servers.

To address memory-intensive applications, data center operators add extra servers to get higher memory bandwidth and higher memory capacity. The compute capabilities from the added processors are typically not utilized for these applications, making the servers inefficient from cost and power perspectives. The CXL industry standard addresses this challenge by enabling new architectures that can efficiently add memory to general-purpose servers.

OpenAI in Talks with Broadcom About Developing Custom AI Chips to Power Next Generation Models

According to The Information, OpenAI is reportedly in talks with Broadcom about developing a custom AI accelerator to power OpenAI's growing demand for high-performance solutions. Broadcom is a fabless chip designer known for a wide range of silicon solutions spanning from networking, PCIe, SSD controllers, and PHYs all the way up to custom ASICs. The latter part is what OpenAI wants to focus on, but all the aforementioned IP developed by Broadcom is of use in a data center. Suppose OpenAI decides to use Broadcom's solutions. In that case, the fabless silicon designer offers a complete vertical stack of products for inter-system communication using various protocols such as PCIe, system-to-system communication using Ethernet networking with Broadcom Tomahawk 6 and future revisions, alongside storage solutions and many other complimentary elements of a data center.

As a company skilled in making various IPs, it also makes ASIC solutions for other companies and has assisted Google in the making of its Tensor Processing Unit (TPU), which is now in its sixth generation. Google TPUs are massively successful as Google deploys millions of them and provides AI solutions to billions of users across the globe. Now, OpenAI wants to be part of the AI chip game, and Broadcom could come to the rescue with its already-established AI success and various other data center componentry to help make a custom AI accelerator to power OpenAI's infrastructure needed for the next generation of AI models. With each new AI model released by OpenAI, compute demand spikes by several orders of magnitude, and having an AI accelerator that exactly matches their need will help the company move faster and run even bigger AI models.

AMD Plans to Use Glass Substrates in its 2025/2026 Lineup of High-Performance Processors

AMD reportedly plans to incorporate glass substrates into its high-performance system-in-packages (SiPs) sometimes between 2025 and 2026. Glass substrates offer several advantages over traditional organic substrates, including superior flatness, thermal properties, and mechanical strength. These characteristics make them well-suited for advanced SiPs containing multiple chiplets, especially in data center applications where performance and durability are critical. The adoption of glass substrates aligns with the industry's broader trend towards more complex chip designs. As leading-edge process technologies become increasingly expensive and yield gains diminish, manufacturers turn to multi-chiplet designs to improve performance. AMD's current EPYC server processors already incorporate up to 13 chiplets, while its Instinct AI accelerators feature 22 pieces of silicon. A more extreme testament is Intel's Ponte Vecchio, which utilized 63 tiles in a single package.

Glass substrates could enable AMD to create even more complex designs without relying on costly interposers, potentially reducing overall production expenses. This technology could further boost the performance of AI and HPC accelerators, which are a growing market and require constant innovation. The glass substrate market is heating up, with major players like Intel, Samsung, and LG Innotek also investing heavily in this technology. Market projections suggest explosive growth, from $23 million in 2024 to $4.2 billion by 2034. Last year, Intel committed to investing up to 1.3 trillion Won (almost one billion USD) to start applying glass substrates to its processors by 2028. Everything suggests that glass substrates are the future of chip design, and we await to see first high-volume production designs.

NVIDIA to Sell Over One Million H20 GPUs to China, Taking Home $12 Billion

When NVIDIA started preparing the H20 GPU for China, the company anticipated great demand from sanction-obeying GPUs. However, we now know precisely what the company makes from its Chinese venture: an astonishing $12 billion in take-home revenue. Due to the massive demand for NVIDIA GPUs, Chinese AI research labs are acquiring as many as they can get their hands on. According to a report from Financial Times, citing SemiAnalysis as its source, NVIDIA will sell over one million H20 GPUs in China. This number far outweighs the number of home-grown Huawei Ascend 910B accelerators that the Chinese companies plan to source, with numbers being "only" 550,000 Ascend 910B chips. While we don't know if Chinese semiconductor makers like SMIC are capable of producing more chips or if the demand isn't as high, we know why NVIDIA H20 chips are the primary target.

The Huawei Ascend 910B features Total Processing Performance (TPP), a metric developed by US Govt. to track GPU performance measuring TeraFLOPS times bit-length of over 5,000, while the NVIDIA H20 comes to 2,368 TPP, which is half of the Huawei accelerator. That is the performance on paper, where SemiAnalysis notes that the real-world performance is actually ahead for the H20 GPU due to better memory configuration of the GPU, including higher HBM3 memory bandwidth. All of this proves to be a better alternative than Ascend 910B accelerator, accounting for an estimate of over one million GPUs shipped this year in China. With an average price of $12,000 per NVIDIA H20 GPU, China's $12 billion revenue will undoubtedly help raise NVIDIA's 2024 profits even further.

AI Startup Etched Unveils Transformer ASIC Claiming 20x Speed-up Over NVIDIA H100

A new startup emerged out of stealth mode today to power the next generation of generative AI. Etched is a company that makes an application-specific integrated circuit (ASIC) to process "Transformers." The transformer is an architecture for designing deep learning models developed by Google and is now the powerhouse behind models like OpenAI's GPT-4o in ChatGPT, Anthropic Claude, Google Gemini, and Meta's Llama family. Etched wanted to create an ASIC for processing only the transformer models, making a chip called Sohu. The claim is Sohu outperforms NVIDIA's latest and greatest by an entire order of magnitude. Where a server configuration with eight NVIDIA H100 GPU clusters pushes Llama-3 70B models at 25,000 tokens per second, and the latest eight B200 "Blackwell" GPU cluster pushes 43,000 tokens/s, the eight Sohu clusters manage to output 500,000 tokens per second.

Why is this important? Not only does the ASIC outperform Hopper by 20x and Blackwell by 10x, but it also serves so many tokens per second that it enables an entirely new fleet of AI applications requiring real-time output. The Sohu architecture is so efficient that 90% of the FLOPS can be used, while traditional GPUs boast a 30-40% FLOP utilization rate. This translates into inefficiency and waste of power, which Etched hopes to solve by building an accelerator dedicated to power transformers (the "T" in GPT) at massive scales. Given that the frontier model development costs more than one billion US dollars, and hardware costs are measured in tens of billions of US Dollars, having an accelerator dedicated to powering a specific application can help advance AI faster. AI researchers often say that "scale is all you need" (resembling the legendary "attention is all you need" paper), and Etched wants to build on that.

US Government Considers Tighter Restriction on China's Access to GAA Transistors and HBM Memory

According to sources familiar with the matter and reported by Bloomberg, the Biden administration is considering imposing further export controls to limit China's ability to acquire advanced semiconductor technologies crucial for developing AI systems. Gate-all-around (GAA) transistor technology and high-bandwidth memory (HBM) chips are at the center of the proposed restrictions. These cutting-edge components play a pivotal role in creating powerful AI accelerators. GAA transistors, a key feature in next-generation chips, promise substantial improvements in power efficiency and processing speeds. Meanwhile, HBM chips enable high-speed data transfer between a processor and memory. While existing sanctions prevent American firms from supplying Chinese companies with equipment for manufacturing leading-edge chips, concerns persist that China could still attain advanced capabilities through other means.

For instance, China's leading chipmaker, SMIC, could potentially integrate GAA transistors into its existing 7 nm process node, markedly enhancing performance. Access to HBM would further augment China's ability to develop AI accelerators on par with cutting-edge offerings from US firms. The reflections within the Biden administration show a strategic effort to preserve America's technological edge by denying China access to key semiconductor innovations. However, implementing such stringent export controls is a delicate balancing act, as it risks heightening tensions and prompting Chinese retaliation. No final decision has been made, and officials continue weighing the proposed restrictions' pros and cons. Nonetheless, the discussions highlight the pivotal role that semiconductor technology plays in the great power rivalry between the US and China, especially in the AI era.

AMD Adds RDNA 4 Generation Navi 44 and MI300X1 GPUs to ROCm Software

AMD has quietly added some interesting codenames to its ROCm hardware support list. The biggest surprise is the appearance of "RDNA 4" and "Navi 44" codenames, hinting at a potential successor to the current RDNA 3 GPU architecture powering AMD's Radeon RX 7000 series graphics cards. The upcoming Radeon RX 8000 series could see Navi 44 SKU with a codename "gfx1200". While details are scarce, the inclusion of RDNA 4 and Navi 44 in the ROCm list suggests AMD is working on a new GPU microarchitecture that could bring significant performance and efficiency gains. While RDNA 4 may be destined for future Radeon gaming GPUs, in the data center GPU compute market, AMD is preparing a CDNA 4 based successors to the MI300 series. However, it appears that we haven't seen all the MI300 variants first. Equally intriguing is the "MI300X1" codename, which appears to reference an upcoming AI-focused accelerator from AMD.

While we wait for more information, we can't decipher whether the Navi 44 GPU SKU is for the high-end or low-end segment. If previous generations are for reference, then the Navi 44 SKU would target the low end of the GPU performance spectrum. The previous generation RDNA 3 had Navi 33 as an entry-level model, whereas the RDNA 2 had a Navi 24 SKU for entry-level GPUs. We have reported on RDNA 4 merely being a "bug correction" generation to fix the perf/Watt curve and offer better efficiency overall. What happens finally, we have to wait and see. AMD could announce more details in its upcoming Computex keynote.

Intel AI Platforms Accelerate Microsoft Phi-3 GenAI Models

Intel has validated and optimized its AI product portfolio across client, edge and data center for several of Microsoft's Phi-3 family of open models. The Phi-3 family of small, open models can run on lower-compute hardware, be more easily fine-tuned to meet specific requirements and enable developers to build applications that run locally. Intel's supported products include Intel Gaudi AI accelerators and Intel Xeon processors for data center applications and Intel Core Ultra processors and Intel Arc graphics for client.

"We provide customers and developers with powerful AI solutions that utilize the industry's latest AI models and software. Our active collaboration with fellow leaders in the AI software ecosystem, like Microsoft, is key to bringing AI everywhere. We're proud to work closely with Microsoft to ensure Intel hardware - spanning data center, edge and client - actively supports several new Phi-3 models," said Pallavi Mahajan, Intel corporate vice president and general manager, Data Center and AI Software.
Return to Keyword Browsing
Nov 21st, 2024 12:06 EST change timezone

New Forum Posts

Popular Reviews

Controversial News Posts