News Posts matching #inference


Microsoft Acquired Nearly 500,000 NVIDIA "Hopper" GPUs This Year

Microsoft is heavily investing in enabling its company and cloud infrastructure to support the massive AI expansion. The Redmond giant has acquired nearly half a million of the NVIDIA "Hopper" family of GPUs to support this effort. According to market research company Omdia, Microsoft was the biggest hyperscaler, with data center CapEx and GPU expenditure reaching a record high. The company acquired 485,000 NVIDIA "Hopper" GPUs, including H100, H200, and H20, resulting in more than $30 billion spent on servers alone. To put things into perspective, this is about double that of the next-biggest GPU purchaser, China's ByteDance, which acquired about 230,000 sanction-abiding H800 GPUs and regular H100s sourced from third parties.

Regarding US-based companies, the only ones that have come close to the GPU acquisition rate are Meta, Tesla/xAI, Amazon, and Google. They have acquired around 200,000 GPUs on average while significantly boosting their in-house chip design efforts. "NVIDIA GPUs claimed a tremendously high share of the server capex," Vlad Galabov, director of cloud and data center research at Omdia, noted, adding, "We're close to the peak." Hyperscalers like Amazon, Google, and Meta have been working on their custom solutions for AI training and inference. For example, Google has its TPU, Amazon has its Trainium and Inferentia chips, and Meta has its MTIA. Hyperscalers are eager to develop their in-house solutions, but NVIDIA's grip on the software stack paired with timely product updates seems hard to break. The latest "Blackwell" chips are projected to get even bigger orders, so only the sky (and the local power plant) is the limit.

Axelera AI Partners with Arduino for Edge AI Solutions

Axelera AI - a leading edge-inference company - and Arduino, the global leader in open-source hardware and software, today announced a strategic partnership to make high-performance AI at the edge more accessible than ever, building advanced technology solutions based on inference and an open ecosystem. This furthers Axelera AI's strategy to democratize artificial intelligence everywhere.

The collaboration will combine the strengths of Axelera AI's Metis AI Platform with the powerful SOMs from the Arduino Pro range to provide customers with easy-to-use hardware and software to innovate around AI. Users will enjoy the freedom to dictate their own AI journey, thanks to tools that provide unique digital in-memory computing and RISC-V controlled dataflow technology, delivering high performance and usability at a fraction of the cost and power of other solutions available today.

Advantech Unveils Hailo-8 Powered AI Acceleration Modules for High-Efficiency Vision AI Applications

Advantech, a leading provider of AIoT platforms and services, proudly unveils its latest AI acceleration modules: the EAI-1200 and EAI-3300, powered by Hailo-8 AI processors. These modules deliver AI performance of up to 52 TOPS while achieving more than 12 times the power efficiency of comparable AI modules and GPU cards. Designed in standard M.2 and PCIe form factors, the EAI-1200 and EAI-3300 can be seamlessly integrated with diverse x86 and Arm-based platforms, enabling quick upgrades of existing systems and boards to incorporate AI capabilities. With these AI acceleration modules, developers can run inference efficiently on the Hailo-8 NPU while handling application processing primarily on the CPU, optimizing resource allocation. The modules are paired with user-friendly software toolkits, including the Edge AI SDK for seamless integration with HailoRT, the Dataflow Compiler for converting existing models, and TAPPAS, which offers pre-trained application examples. These features accelerate the development of edge-based vision AI applications.

EAI-1200 M.2 AI Module: Accelerating Development for Vision AI Security
The EAI-1200 is an M.2 AI module powered by a single Hailo-8 VPU, delivering up to 26 TOPS of computing performance while consuming approximately 5 watts of power. An optional heatsink supports operation in temperatures ranging from -40 to 65°C, ensuring easy integration. This cost-effective module is especially designed to bundle with Advantech's systems and boards, such as the ARK-1221L, AIR-150, and AFE-R770, enhancing AI applications including baggage screening, workforce safety, and autonomous mobile robots (AMR).

SPEC Delivers Major SPECworkstation 4.0 Benchmark Update, Adds AI/ML Workloads

The Standard Performance Evaluation Corporation (SPEC), the trusted global leader in computing benchmarks, today announced the availability of the SPECworkstation 4.0 benchmark, a major update to SPEC's comprehensive tool designed to measure all key aspects of workstation performance. This significant upgrade from version 3.1 incorporates cutting-edge features to keep pace with the latest workstation hardware and the evolving demands of professional applications, including the increasing reliance on data analytics, AI and machine learning (ML).

The new SPECworkstation 4.0 benchmark provides a robust, real-world measure of CPU, graphics, accelerator, and disk performance, ensuring professionals have the data they need to make informed decisions about their hardware investments. The benchmark caters to the diverse needs of engineers, scientists, and developers who rely on workstation hardware for daily tasks. It includes real-world applications like Blender, Handbrake, LLVM and more, providing a comprehensive performance measure across seven different industry verticals, each focusing on specific use cases and subsystems critical to workstation users. SPECworkstation 4.0 benchmark marks a significant milestone for measuring workstation AI performance, providing an unbiased, real-world, application-driven tool for measuring how workstations handle AI/ML workloads.

Amazon AWS Announces General Availability of Trainium2 Instances, Reveals Details of Next Gen Trainium3 Chip

At AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company, today announced the general availability of AWS Trainium2-powered Amazon Elastic Compute Cloud (Amazon EC2) instances and introduced new Trn2 UltraServers, enabling customers to train and deploy today's latest AI models as well as future large language models (LLM) and foundation models (FM) with exceptional levels of performance and cost efficiency. AWS also unveiled next-generation Trainium3 chips.

"Trainium2 is purpose built to support the largest, most cutting-edge generative AI workloads, for both training and inference, and to deliver the best price performance on AWS," said David Brown, vice president of Compute and Networking at AWS. "With models approaching trillions of parameters, we understand customers also need a novel approach to train and run these massive workloads. New Trn2 UltraServers offer the fastest training and inference performance on AWS and help organizations of all sizes to train and deploy the world's largest models faster and at a lower cost."

AMD Releases ROCm 6.3 with SGLang, Fortran Compiler, Multi-Node FFT, Vision Libraries, and More

AMD has released the new ROCm 6.3 version which introduces several new features and optimizations, including SGLang integration for accelerated AI inferencing, a re-engineered FlashAttention-2 for optimized AI training and inference, the introduction of multi-node Fast Fourier Transform (FFT), new Fortran compiler, and enhanced computer vision libraries like rocDecode, rocJPEG, and rocAL.

According to AMD, SGLang, a runtime now supported by ROCm 6.3, is purpose-built for optimizing inference on models like LLMs and VLMs on AMD Instinct GPUs, and promises 6x higher throughput and much easier usage thanks to Python integration and pre-configured ROCm Docker containers. ROCm 6.3 also brings further transformer optimizations with FlashAttention-2, which should bring significant improvements in forward and backward pass performance compared to FlashAttention-1. Other additions include a new AMD Fortran compiler with direct GPU offloading, backward compatibility, and integration with HIP kernels and ROCm libraries; new multi-node FFT support in rocFFT, which simplifies multi-node scaling and improves scalability; and enhanced computer vision libraries, rocDecode, rocJPEG, and rocAL, for AV1 codec support, GPU-accelerated JPEG decoding, and better audio augmentation.

Corsair by d-Matrix Enables GPU-Free AI Inference

d-Matrix today unveiled Corsair, an entirely new computing paradigm designed from the ground-up for the next era of AI inference in modern datacenters. Corsair leverages d-Matrix's innovative Digital In-Memory Compute architecture (DIMC), an industry first, to accelerate AI inference workloads with industry-leading real-time performance, energy efficiency, and cost savings as compared to GPUs and other alternatives.

The emergence of reasoning agents and interactive video generation represents the next level of AI capabilities. These leverage more inference computing power to enable models to "think" more and produce higher quality outputs. Corsair is the ideal inference compute solution with which enterprises can unlock new levels of automation and intelligence without compromising on performance, cost or power.

Hypertec Introduces the World's Most Advanced Immersion-Born GPU Server

Hypertec proudly announces the launch of its latest breakthrough product, the TRIDENT iG series, an immersion-born GPU server line that brings extreme density, sustainability, and performance to the AI and HPC community. Purpose-built for the most demanding AI applications, this cutting-edge server is optimized for generative AI, machine learning (ML), deep learning (DL), large language model (LLM) training, inference, and beyond. With up to six of the latest NVIDIA GPUs in a 2U form factor, a staggering 8 TB of memory with enhanced RDMA capabilities, and groundbreaking density supporting up to 200 GPUs per immersion tank, the TRIDENT iG server line is a game-changer for AI infrastructure.

Additionally, the server's innovative design features a single or dual root complex, enabling greater flexibility and efficiency for GPU usage in complex workloads.

ASUS Presents All-New Storage-Server Solutions to Unleash AI Potential at SC24

ASUS today announced its groundbreaking next-generation infrastructure solutions at SC24, featuring a comprehensive lineup powered by AMD and Intel, as well as liquid-cooling solutions designed to accelerate the future of AI. By continuously pushing the limits of innovation, ASUS simplifies the complexities of AI and high-performance computing (HPC) through adaptive server solutions paired with expert cooling and software-development services, tailored for the exascale era and beyond. As a total-solution provider with a distinguished history in pioneering AI supercomputing, ASUS is committed to delivering exceptional value to its customers.

Comprehensive Lineup for AI and HPC Success
To fuel enterprise digital transformation through HPC and AI-driven architecture, ASUS provides a full lineup of server systems that are powered by AMD and Intel. Startups, research institutions, large enterprises, and government organizations can all find adaptive solutions to unlock value from big data and accelerate business agility.

Q.ANT Introduces First Commercial Photonic Processor

Q.ANT, the leading startup for photonic computing, today announced the launch of its first commercial product - a photonics-based Native Processing Unit (NPU) built on the company's compute architecture LENA - Light Empowered Native Arithmetics. The product is fully compatible with today's existing computing ecosystem as it comes on the industry-standard PCI-Express. The Q.ANT NPU executes complex, non-linear mathematics natively using light instead of electrons, promising to deliver at least 30 times greater energy efficiency and significant computational speed improvements over traditional CMOS technology. Designed for compute-intensive applications such as AI Inference, machine learning, and physics simulation, the Q.ANT NPU has been proven to solve real-world challenges, including number recognition for deep neural network inference (see the recent press release regarding Cloud Access to NPU).

"With our photonic chip technology now available on the standard PCIe interface, we're bringing the incredible power of photonics directly into real-world applications. For us, this is not just a processor—it's a statement of intent: Sustainability and performance can go hand in hand," said Dr. Michael Förtsch, CEO of Q.ANT. "For the first time, developers can create AI applications and explore the capabilities of photonic computing, particularly for complex, nonlinear calculations. For example, experts calculated that one GPT-4 query today uses 10 times more electricity than a regular internet search request. Our photonic computing chips offer the potential to reduce the energy consumption for that query by a factor of 30."

Etched Introduces AI-Powered Games Without GPUs, Displays Minecraft Replica

The gaming industry is about to get massively disrupted. Instead of using game engines to power games, we are now witnessing an entirely new and crazy concept. A startup specializing in designing ASICs specifically for Transformer architecture, the foundation behind generative AI models like GPT/Claude/Stable Diffusion, has showcased a demo in partnership with Decart of a Minecraft clone being entirely generated and operated by AI instead of the traditional game engine. While we use AI to create images and videos based on specific descriptions and output pretty realistic content, having an AI model spit out an entire playable game is something different. Oasis is the first playable, real-time, open-world AI model that takes users' input and generates real-time gameplay, including physics, game rules, and graphics.

An interesting thing to point out is the hardware that powers this setup. Using a single NVIDIA H100 GPU, this 500-million parameter Oasis model can run at 720p resolution at 20 generated frames per second. Due to limitations of accelerators like NVIDIA's H100/B200, gameplay at 4K is almost impossible. However, Etched has its own accelerator called Sohu, which is specialized in accelerating transformer architectures. Eight NVIDIA H100 GPUs can serve five Oasis instances to five users, while eight Sohu cards are capable of serving 65 Oasis instances to 65 users. This is more than a 10x increase in inference capability compared to NVIDIA's hardware on a single use case alone. The accelerator is designed to run much larger models like future 100 billion-parameter generative AI video game models that can output 4K 30 FPS, all thanks to 144 GB of HBM3E memory, yielding 1,152 GB in an eight-accelerator server configuration.
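As a quick sanity check on the Etched figures quoted above, the short Python sketch below recomputes the users-per-server ratio and the total memory of an eight-card server. The constant and function names are illustrative only and do not come from Etched or Decart; the input numbers are simply those stated in the article.

```python
# Figures as quoted in the article; all names here are illustrative.
H100_USERS_PER_8_GPU_SERVER = 5    # users served by eight H100 GPUs
SOHU_USERS_PER_8_CARD_SERVER = 65  # users served by eight Sohu cards
HBM3E_GB_PER_SOHU = 144            # HBM3E capacity per Sohu card
CARDS_PER_SERVER = 8


def users_ratio(sohu_users: int, h100_users: int) -> float:
    """How many times more users the Sohu server handles."""
    return sohu_users / h100_users


def server_memory_gb(gb_per_card: int, cards: int) -> int:
    """Total HBM3E capacity across all cards in one server."""
    return gb_per_card * cards


ratio = users_ratio(SOHU_USERS_PER_8_CARD_SERVER, H100_USERS_PER_8_GPU_SERVER)
memory = server_memory_gb(HBM3E_GB_PER_SOHU, CARDS_PER_SERVER)

print(f"Sohu vs. H100 users per server: {ratio:.0f}x")
print(f"HBM3E per eight-card server: {memory} GB")
```

With the article's numbers, the ratio works out to 13x, which is consistent with the "more than 10x" claim, and the memory total matches the quoted 1,152 GB.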

Cisco Unveils Plug-and-Play AI Solutions Powered by NVIDIA H100 and H200 Tensor Core GPUs

Today, Cisco announced new additions to its data center infrastructure portfolio: an AI server family purpose-built for GPU-intensive AI workloads with NVIDIA accelerated computing, and AI PODs to simplify and de-risk AI infrastructure investment. They give organizations an adaptable and scalable path to AI, supported by Cisco's industry-leading networking capabilities.

"Enterprise customers are under pressure to deploy AI workloads, especially as we move toward agentic workflows and AI begins solving problems on its own," said Jeetu Patel, Chief Product Officer, Cisco. "Cisco innovations like AI PODs and the GPU server strengthen the security, compliance, and processing power of those workloads as customers navigate their AI journeys from inferencing to training."

Meta Shows Open-Architecture NVIDIA "Blackwell" GB200 System for Data Center

During the Open Compute Project (OCP) Summit 2024, Meta, one of the prime members of the OCP project, showed its NVIDIA "Blackwell" GB200 systems for its massive data centers. We previously covered Microsoft's Azure server rack with GB200 GPUs featuring one-third of the rack space for computing and two-thirds for cooling. A few days later, Google showed off its smaller GB200 system, and today, Meta is showing off its GB200 system—the smallest of the bunch. To train a dense transformer large language model with 405B parameters and a context window of up to 128k tokens, like the Llama 3.1 405B, Meta must redesign its data center infrastructure to run a distributed training job on two 24,000 GPU clusters. That is 48,000 GPUs used for training a single AI model.

Called "Catalina," it is built on the NVIDIA Blackwell platform, emphasizing modularity and adaptability while incorporating the latest NVIDIA GB200 Grace Blackwell Superchip. To address the escalating power requirements of GPUs, Catalina introduces the Orv3, a high-power rack capable of delivering up to 140kW. The comprehensive liquid-cooled setup encompasses a power shelf supporting various components, including a compute tray, switch tray, the Orv3 HPR, Wedge 400 fabric switch with 12.8 Tbps switching capacity, management switch, battery backup, and a rack management controller. Interestingly, Meta also upgraded its "Grand Teton" system for internal usage, such as deep learning recommendation models (DLRMs) and content understanding with AMD Instinct MI300X. Those are used to run inference on internal models, and MI300X appears to provide the best performance per dollar for inference. According to Meta, the computational demand stemming from AI will continue to increase exponentially, so more NVIDIA and AMD GPUs are needed, and we can't wait to see what the company builds.

Intel Won't Compete Against NVIDIA's High-End AI Dominance Soon, Starts Laying Off Over 2,200 Workers Across US

Intel's taking a different path with its Gaudi 3 accelerator chips. It's staying away from the high-demand market for training big AI models, which has made NVIDIA so successful. Instead, Intel wants to help businesses that need cheaper AI solutions to train and run smaller specific models and open-source options. At a recent event, Intel talked up Gaudi 3's "price performance advantage" over NVIDIA's H100 GPU for inference tasks. Intel says Gaudi 3 is faster and more cost-effective than the H100 when running Llama 3 and Llama 2 models of different sizes.

Intel also claims that Gaudi 3 is as power-efficient as the H100 for large language model (LLM) inference with small token outputs and does even better with larger outputs. The company even suggests Gaudi 3 beats NVIDIA's newer H200 in LLM inference throughput for large token outputs. However, Gaudi 3 doesn't match up to the H100 in overall floating-point operation throughput for 16-bit and 8-bit formats. For bfloat16 and 8-bit floating-point precision matrix math, Gaudi 3 hits 1,835 TFLOPS in each format, while the H100 reaches 1,979 TFLOPS for BF16 and 3,958 TFLOPS for FP8.

Supermicro Introduces New Servers and GPU Accelerated Systems with AMD EPYC 9005 Series CPUs and AMD Instinct MI325X GPUs

Supermicro, Inc., a Total IT Solution Provider for AI, Cloud, Storage, and 5G/Edge, announces the launch of a new series of servers, GPU-accelerated systems, and storage servers featuring the AMD EPYC 9005 Series processors and AMD Instinct MI325X GPUs. The new H14 product line represents one of the most extensive server families in the industry, including Supermicro's Hyper systems, the Twin multi-node servers, and AI inferencing GPU systems, all available with air or liquid cooling options. The new "Zen 5" processor core architecture implements full data path AVX-512 vector instructions for CPU-based AI inference and provides 17% better instructions per cycle (IPC) than the previous 4th generation EPYC processor, enabling more performance per core.

Supermicro's new H14 family uses the latest 5th Gen AMD EPYC processors which enable up to 192 cores per CPU with up to 500 W TDP (thermal design power). Supermicro has designed new H14 systems including the Hyper and FlexTwin systems which can accommodate the higher thermal requirements. The H14 family also includes three systems for AI training and inference workloads supporting up to 10 GPUs which feature the AMD EPYC 9005 Series CPU as the host processor and two which support the AMD Instinct MI325X GPU.

Astera Labs Introduces New Portfolio of Fabric Switches Purpose-Built for AI Infrastructure at Cloud-Scale

Astera Labs, Inc, a global leader in semiconductor-based connectivity solutions for AI and cloud infrastructure, today announced a new portfolio of fabric switches, including the industry's first PCIe 6 switch, built from the ground up for demanding AI workloads in accelerated computing platforms deployed at cloud-scale. The Scorpio Smart Fabric Switch portfolio is optimized for AI dataflows to deliver maximum predictable performance per watt, high reliability, easy cloud-scale deployment, reduced time-to-market, and lower total cost of ownership.

The Scorpio Smart Fabric Switch portfolio features two application-specific product lines with a multi-generational roadmap:
  • Scorpio P-Series for GPU-to-CPU/NIC/SSD PCIe 6 connectivity: architected to support mixed traffic head-node connectivity across a diverse ecosystem of PCIe hosts and endpoints.
  • Scorpio X-Series for back-end GPU clustering: architected to deliver the highest back-end GPU-to-GPU bandwidth with platform-specific customization.

Thundercomm Launches RUBIK Pi on Qualcomm Platforms

At an industry event in Austin today, Thundercomm announced RUBIK Pi, the first Pi built on Qualcomm SoC platforms for developers. RUBIK Pi is an innovative tool that aims to lower the barriers to application development with AI inference, allowing developers to access high-performance, easy-to-deploy AI R&D tools.

The Pi product is a must-have for electronics enthusiasts and developers. It can be seen as a microcomputer, integrating a processor, memory, storage, and various interfaces on a credit card-sized board. Thundercomm, a world-leading IoT product and solution provider, building on its expertise in ICT technologies and developer workflows, launched RUBIK Pi, aiming to create the most user-friendly AI R&D tools.

AMD Introduces the Radeon PRO V710 to Microsoft Azure

AMD today introduced the Radeon PRO V710, the newest member of AMD's family of visual cloud GPUs. Available today in private preview on Microsoft Azure, the Radeon PRO V710 brings new capabilities to the public cloud. The AMD Radeon PRO V710's 54 Compute Units, along with 28 GB of VRAM, 448 GB/s memory transfer rate, and 54 MB of L3 AMD Infinity Cache technology, support small to medium ML inference workloads and small model training using open-source AMD ROCm software.

With support for hardware virtualization implemented in compliance with the PCI Express SR-IOV standard, instances based on the Radeon PRO V710 can provide robust isolation between multiple virtual machines running on the same physical GPU and between the host and guest environments. The efficient RDNA 3 architecture provides excellent performance per watt, enabling a single slot, passively cooled form factor compliant with the PCIe CEM spec.

Foxconn to Build Taiwan's Fastest AI Supercomputer With NVIDIA Blackwell

NVIDIA and Foxconn are building Taiwan's largest supercomputer, marking a milestone in the island's AI advancement. The project, Hon Hai Kaohsiung Super Computing Center, revealed Tuesday at Hon Hai Tech Day, will be built around NVIDIA's groundbreaking Blackwell architecture and feature the GB200 NVL72 platform, which includes a total of 64 racks and 4,608 Tensor Core GPUs. With expected AI performance of over 90 exaflops, the machine would easily be considered the fastest in Taiwan.

Foxconn plans to use the supercomputer, once operational, to power breakthroughs in cancer research, large language model development and smart city innovations, positioning Taiwan as a global leader in AI-driven industries. Foxconn's "three-platform strategy" focuses on smart manufacturing, smart cities and electric vehicles. The new supercomputer will play a pivotal role in supporting Foxconn's ongoing efforts in digital twins, robotic automation and smart urban infrastructure, bringing AI-assisted services to urban areas like Kaohsiung.
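The rack and GPU counts quoted for the Kaohsiung system can be cross-checked with a few lines of Python. This is only a rough sketch: it assumes 72 GPUs per rack, as the NVL72 name implies, treats "over 90 exaflops" as a lower bound of exactly 90, and leaves the numeric precision behind that figure unspecified because the article does not state it. Variable names are illustrative.

```python
# Figures as quoted in the article; names are illustrative.
RACKS = 64
GPUS_PER_NVL72_RACK = 72       # assumption: implied by the "NVL72" name
TOTAL_AI_EXAFLOPS = 90         # "over 90 exaflops", used as a lower bound

# Total GPU count across all racks.
total_gpus = RACKS * GPUS_PER_NVL72_RACK

# Implied per-GPU AI throughput in petaflops (1 exaflop = 1,000 petaflops).
pflops_per_gpu = TOTAL_AI_EXAFLOPS * 1000 / total_gpus

print(f"Total GPUs: {total_gpus}")
print(f"Implied AI performance per GPU: at least {pflops_per_gpu:.2f} PFLOPS")
```

The GPU total matches the 4,608 figure in the article, and the implied per-GPU number comes out at roughly 19.5 petaflops.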

NVIDIA Cancels Dual-Rack NVL36x2 in Favor of Single-Rack NVL72 Compute Monster

NVIDIA has reportedly discontinued its dual-rack GB200 NVL36x2 GPU model, opting to focus on the single-rack GB200 NVL72 and NVL36 models. This shift, revealed by industry analyst Ming-Chi Kuo, aims to simplify NVIDIA's offerings in the AI and HPC markets. The decision was influenced by major clients like Microsoft, who prefer the NVL72's improved space efficiency and potential for enhanced inference performance. While both models perform similarly in AI large language model (LLM) training, the NVL72 is expected to excel in non-parallelizable inference tasks. As a reminder, the NVL72 features 36 Grace CPUs, delivering 2,592 Arm Neoverse V2 cores with 17 TB LPDDR5X memory with 18.4 TB/s aggregate bandwidth. Additionally, it includes 72 Blackwell GB200 SXM GPUs that have a massive 13.5 TB of HBM3e combined, running at 576 TB/s aggregate bandwidth.

However, this shift presents significant challenges. The NVL72's power consumption of around 120kW far exceeds typical data center capabilities, potentially limiting its immediate widespread adoption. The discontinuation of the NVL36x2 has also sparked concerns about NVIDIA's execution capabilities and may disrupt the supply chain for assembly and cooling solutions. Despite these hurdles, industry experts view this as a pragmatic approach to product planning in the dynamic AI landscape. While some customers may be disappointed by the dual-rack model's cancellation, NVIDIA's long-term outlook in the AI technology market remains strong. The company continues to work with clients and listen to their needs, to position itself as a leader in high-performance computing solutions.
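The aggregate NVL72 figures quoted above can be broken down per device with simple division. The Python sketch below does only that, using the article's numbers; it makes no claim about how NVIDIA itself specifies per-GPU or per-CPU values, and the names are illustrative.

```python
# Aggregate NVL72 figures as quoted in the article; names are illustrative.
GRACE_CPUS = 36
TOTAL_ARM_CORES = 2592
BLACKWELL_GPUS = 72
TOTAL_HBM3E_TB = 13.5
TOTAL_HBM_BANDWIDTH_TBPS = 576

# Per-device values derived by dividing the rack totals evenly.
cores_per_cpu = TOTAL_ARM_CORES // GRACE_CPUS
hbm_per_gpu_tb = TOTAL_HBM3E_TB / BLACKWELL_GPUS
bandwidth_per_gpu_tbps = TOTAL_HBM_BANDWIDTH_TBPS / BLACKWELL_GPUS

print(f"Cores per Grace CPU: {cores_per_cpu}")
print(f"HBM3e per GPU: {hbm_per_gpu_tb} TB")
print(f"HBM bandwidth per GPU: {bandwidth_per_gpu_tbps} TB/s")
```

On these numbers, each Grace CPU contributes 72 cores and each GPU accounts for 8 TB/s of the aggregate HBM bandwidth.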

Huawei Starts Shipping "Ascend 910C" AI Accelerator Samples to Large NVIDIA Customers

Huawei has reportedly started shipping its Ascend 910C accelerator, the company's domestic alternative to NVIDIA's H100 accelerator for AI training and inference. As the report from the South China Morning Post notes, Huawei is shipping samples of its accelerator to large NVIDIA customers. This includes companies like Alibaba, Baidu, and Tencent, which have ordered massive amounts of NVIDIA accelerators. Huawei is on track to deliver 70,000 chips, potentially worth $2 billion. With NVIDIA working on a B20 accelerator SKU that complies with US government export regulations, the Huawei Ascend 910C accelerator could potentially outperform NVIDIA's B20 processor, per some analyst expectations.

If the Ascend 910C receives positive results from Chinese tech giants, it could be the start of Huawei's expansion into data center accelerators, previously hindered by the company's limited ability to manufacture advanced chips. Now, with foundries like SMIC printing 7 nm designs and possibly 5 nm coming soon, Huawei will leverage this technology to satisfy the domestic demand for more AI processing power. Competing on a global scale, though, remains a challenge. Companies like NVIDIA, AMD, and Intel have access to advanced nodes, which gives their AI accelerators more efficiency and performance.

Cerebras Launches the World's Fastest AI Inference

Today, Cerebras Systems, the pioneer in high performance AI compute, announced Cerebras Inference, the fastest AI inference solution in the world. Delivering 1,800 tokens per second for Llama3.1 8B and 450 tokens per second for Llama3.1 70B, Cerebras Inference is 20 times faster than NVIDIA GPU-based solutions in hyperscale clouds. Starting at just 10c per million tokens, Cerebras Inference is priced at a fraction of GPU solutions, providing 100x higher price-performance for AI workloads.

Unlike alternative approaches that compromise accuracy for performance, Cerebras offers the fastest performance while maintaining state of the art accuracy by staying in the 16-bit domain for the entire inference run. Cerebras Inference is priced at a fraction of GPU-based competitors, with pay-as-you-go pricing of 10 cents per million tokens for Llama 3.1 8B and 60 cents per million tokens for Llama 3.1 70B.
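To put the quoted throughput and pricing in concrete terms, the Python sketch below estimates the cost and generation time for a given number of output tokens. It assumes the quoted tokens-per-second rates hold steady for a single stream and that billing is linear in tokens; neither assumption comes from Cerebras, and the dictionary keys and function names are illustrative.

```python
# Rates and prices as quoted in the article; names are illustrative.
MODELS = {
    "Llama 3.1 8B": {"tokens_per_second": 1800, "cents_per_million": 10},
    "Llama 3.1 70B": {"tokens_per_second": 450, "cents_per_million": 60},
}


def cost_usd(tokens: int, cents_per_million: int) -> float:
    """Pay-as-you-go cost in US dollars, assuming linear billing."""
    return tokens * cents_per_million / 1_000_000 / 100


def generation_seconds(tokens: int, tokens_per_second: int) -> float:
    """Time to generate the tokens at a constant single-stream rate."""
    return tokens / tokens_per_second


TOKENS = 1_000_000
for name, spec in MODELS.items():
    dollars = cost_usd(TOKENS, spec["cents_per_million"])
    seconds = generation_seconds(TOKENS, spec["tokens_per_second"])
    print(f"{name}: ${dollars:.2f} and about {seconds / 60:.1f} minutes "
          f"for {TOKENS:,} tokens")
```

Changing `TOKENS` gives a rough feel for how the two quoted price points and speeds scale with workload size.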

AI SSD Procurement Capacity Estimated to Exceed 45 EB in 2024; NAND Flash Suppliers Accelerate Process Upgrades

TrendForce's latest report on enterprise SSDs reveals that a surge in demand for AI has led AI server customers to significantly increase their orders for enterprise SSDs over the past two quarters. Upstream suppliers have been accelerating process upgrades and planning for 2YY products—slated to enter mass production in 2025—in order to meet the growing demand for SSDs in AI applications.

TrendForce observes that increased orders for enterprise SSDs from AI server customers have resulted in contract prices for this category rising by over 80% from 4Q23 to 3Q24. SSDs play a crucial role in AI development. In AI model training, SSDs primarily store model parameters, including evolving weights and deviations.

NVIDIA Accelerates Humanoid Robotics Development

To accelerate humanoid development on a global scale, NVIDIA today announced it is providing the world's leading robot manufacturers, AI model developers and software makers with a suite of services, models and computing platforms to develop, train and build the next generation of humanoid robotics.

Among the offerings are new NVIDIA NIM microservices and frameworks for robot simulation and learning, the NVIDIA OSMO orchestration service for running multi-stage robotics workloads, and an AI- and simulation-enabled teleoperation workflow that allows developers to train robots using small amounts of human demonstration data.

Intel Submits Gaudi 2 Results on MLCommons' Newest Benchmark

Today, MLCommons published results of its industry AI performance benchmark, MLPerf Training v4.0. Intel's results demonstrate the choice that Intel Gaudi 2 AI accelerators give enterprises and customers. Community-based software simplifies generative AI (GenAI) development and industry-standard Ethernet networking enables flexible scaling of AI systems. For the first time on the MLPerf benchmark, Intel submitted results on a large Gaudi 2 system (1,024 Gaudi 2 accelerators) trained in Intel Tiber Developer Cloud to demonstrate Gaudi 2 performance and scalability and Intel's cloud capacity for training MLPerf's GPT-3 175B parameter benchmark model.

"The industry has a clear need: address the gaps in today's generative AI enterprise offerings with high-performance, high-efficiency compute options. The latest MLPerf results published by MLCommons illustrate the unique value Intel Gaudi brings to market as enterprises and customers seek more cost-efficient, scalable systems with standard networking and open software, making GenAI more accessible to more customers," said Zane Ball, Intel corporate vice president and general manager, DCAI Product Management.
Dec 20th, 2024 22:28 EST
