News Posts matching #ML

AMD Introduces GAIA - an Open-Source Project That Runs Local LLMs on Ryzen AI NPUs

AMD has launched a new open-source project called GAIA (pronounced /ˈɡaɪ.ə/), an application that leverages the power of the Ryzen AI Neural Processing Unit (NPU) to run private, local large language models (LLMs). In this blog, we'll dive into the features and benefits of GAIA, and show how you can adopt GAIA's open-source code into your own applications.

Introduction to GAIA
GAIA is a generative AI application designed to run local, private LLMs on Windows PCs, optimized for AMD Ryzen AI hardware (AMD Ryzen AI 300 Series processors). This integration allows for faster, more efficient processing (i.e., lower power) while keeping your data local and secure. On Ryzen AI PCs, GAIA interacts with the NPU and iGPU to run models seamlessly, using the open-source Lemonade (LLM-Aid) SDK from ONNX TurnkeyML for LLM inference. GAIA supports a variety of local LLMs optimized to run on Ryzen AI PCs. Popular models like Llama and Phi derivatives can be tailored for different use cases, such as Q&A, summarization, and complex reasoning tasks.

Google Making Vulkan the Official Graphics API on Android

We're stepping up our multiplatform gaming offering with exciting news dropping at this year's Game Developers Conference (GDC). We're bringing users more games, more ways to play your games across devices, and improved gameplay. You can read all about the updates for users from The Keyword. At GDC, we'll be diving into all of the latest games coming to Play, plus new developer tools that'll help improve gameplay across the Android ecosystem.

We're sharing a closer look at what's new from Android. We're making Vulkan the official graphics API on Android, enabling you to build immersive visuals, and we're enhancing the Android Dynamic Performance Framework (ADPF) to help you deliver longer, more stable gameplay sessions. Check out our video, or keep reading below.

ASUS Introduces New "AI Cache Boost" BIOS Feature - R&D Team Claims Performance Uplift

Large language models (LLMs) love large quantities of memory—so much so, in fact, that AI enthusiasts are turning to multi-GPU setups to make even more VRAM available for their AI apps. But since many current LLMs are extremely large, even this approach has its limits. When a model doesn't fit entirely in VRAM, part of the workload is offloaded to the CPU, and when that happens, the performance of your CPU cache and DRAM comes into play. All this means that when it comes to the performance of AI applications, it's not just the GPU that matters, but the entire pathway that connects the GPU to the CPU to the I/O die to the DRAM modules. It stands to reason, then, that there are opportunities to boost AI performance by optimizing these elements.
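
The offloading tradeoff described above comes down to simple arithmetic: any weights that don't fit in VRAM must be served over the CPU-to-DRAM pathway. A minimal sketch, where the model size, quantization width, and VRAM capacity are illustrative assumptions rather than figures from the article:

```python
# Hedged sketch: estimate how much of an LLM's weights spill out of VRAM.
# Model size, quantization width, and GPU capacity below are illustrative
# assumptions, not figures from the article.

def spillover_gb(params_billions: float, bytes_per_param: float, vram_gb: float) -> float:
    """Return how many GB of weights won't fit in VRAM (0 if the model fits)."""
    weights_gb = params_billions * bytes_per_param  # 1e9 params * bytes, expressed in GB
    return max(0.0, weights_gb - vram_gb)

# A 70B-parameter model quantized to 4 bits (0.5 bytes/param) on a 24 GB GPU:
print(spillover_gb(70, 0.5, 24))  # 11.0 GB must be served from CPU cache/DRAM

# An 8B-parameter model at the same quantization fits entirely:
print(spillover_gb(8, 0.5, 24))   # 0.0
```

Everything past the spillover threshold runs at DRAM speed rather than VRAM speed, which is why cache and memory tuning can move the needle for large models.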

That's exactly what we've found as we've spent time in our R&D labs with the latest AMD Ryzen CPUs. AMD just launched two new Ryzen CPUs with AMD 3D V-Cache Technology, the AMD Ryzen 9 9950X3D and Ryzen 9 9900X3D, pushing the series into new performance territory. After testing a wide range of optimizations in a variety of workloads, we uncovered a range of settings that offer tangible benefits for AI enthusiasts. Now, we're ready to share these optimizations with you through a new BIOS feature: AI Cache Boost. Available through an ASUS AMD 800 Series motherboard and our most recent firmware update, AI Cache Boost can accelerate performance up to 12.75% when you're working with massive LLMs.

AMD Recommends EPYC Processors for Everyday AI Server Tasks

Ask a typical IT professional today whether they're leveraging AI, and there's a good chance they'll say yes; after all, they have reputations to protect! Kidding aside, many will report that their teams may use web-based tools like ChatGPT, or even have internal chatbots serving their employees on the intranet, but that beyond this, not much AI is really being implemented at the infrastructure level. As it turns out, the true answer is a bit different. AI tools and techniques have embedded themselves firmly into standard enterprise workloads and are a more common, everyday phenomenon than even many IT people realize. Assembly line operations now include computer vision-powered inspections. Supply chains use AI for demand forecasting to make business move faster. And of course, AI note-taking and meeting summaries are embedded in virtually every variant of collaboration and meeting software.

Increasingly, critical enterprise software tools incorporate built-in recommendation systems, virtual agents, or some other form of AI-enabled assistance. AI is truly becoming a pervasive, complementary tool for everyday business. At the same time, today's enterprises are navigating a hybrid landscape where traditional, mission-critical workloads coexist with innovative AI-driven tasks. This "mixed enterprise and AI" workload environment calls for infrastructure that can handle both types of processing seamlessly. Robust, general-purpose CPUs like the AMD EPYC processors are designed to be powerful, secure, and flexible to address this need. They handle everyday tasks—running databases, web servers, ERP systems—and offer strong security features crucial for enterprise operations augmented with AI workloads. In essence, modern enterprise infrastructure is about creating a balanced ecosystem. AMD EPYC CPUs play a pivotal role in creating this balance, delivering high performance, efficiency, and security features that underpin both traditional enterprise workloads and advanced AI operations.

EA Details How ML & AI Bolstered Development of Latest Madden & College Football Titles

On June 1, 1988, the very first Madden video game was released to the world. Players needed to load up a Commodore 64/Commodore 128, an Apple II, or an MS-DOS PC to launch the game. When they did, they were greeted with 8-bit animations of the NFL's most popular teams and found themselves controlling their favorite players to try and win themselves a Super Bowl. And at that time, it was amazing. Thirty-seven years later, EA SPORTS hasn't stopped advancing Madden and our American Football games.

Most recently, we launched EA SPORTS Madden NFL 25 and College Football 25, which are tentpoles of our beloved American Football Ecosystem. Yet our football games are no longer blocky pixels and four-directional controls. They're among the most realistic sports simulation titles on the planet. We even celebrated the recent Super Bowl weekend with these titles and our very own Madden Bowl, featuring championship games and incredible music all in the heart of New Orleans. This is in no small part due to the incredible teams and their mission to make our games better every single year. And technology plays a critical role in making this happen.

Arm Intros Cortex-A320 Armv9 CPU for IoT and Edge AI Applications

Arm's new Cortex-A320 represents its first ultra-efficient CPU using the advanced Armv9 architecture dedicated to the needs of IoT and AI applications. The processor achieves over 50% higher efficiency compared to the Cortex-A520 through several microarchitecture optimizations, together with a narrow fetch and decode data path, densely banked L1 caches, and a reduced-port integer register file. It also delivers 30% improved scalar performance compared with its predecessor, the Cortex-A35, via efficient branch predictors, pre-fetchers, and memory system improvements.

The Cortex-A320 is a single-issue, in-order CPU with a 32-bit instruction fetch and 8-stage pipeline. The processor offers scalability by supporting single-core to quad-core configurations. It features DSU-120T, a streamlined DynamIQ Shared Unit (DSU) which enables Cortex-A320-only clusters. Cortex-A320 supports up to 64 KB L1 caches and up to 512 KB L2, with a 256-bit AMBA5 AXI interface to external memory. The L2 cache and the L2 TLB can be shared between the Cortex-A320 CPUs. The vector processing unit, which implements the NEON and SVE2 SIMD (Single Instruction, Multiple Data) technologies, can be either private in a single core complex or shared between cores in dual-core or quad-core implementations.

SanDisk Develops HBM Killer: High-Bandwidth Flash (HBF) Allows 4 TB of VRAM for AI GPUs

During its first post-Western Digital spinoff investor day, SanDisk showed something it has been working on to tackle the AI sector. High-bandwidth flash (HBF) is a new memory architecture that combines 3D NAND flash storage with bandwidth capabilities comparable to high-bandwidth memory (HBM). The HBF design stacks 16 3D NAND BiCS8 dies using through-silicon vias, with a logic layer enabling parallel access to memory sub-arrays. This configuration achieves 8 to 16 times greater capacity per stack than current HBM implementations. A system using eight HBF stacks can provide 4 TB of VRAM to store large AI models like GPT-4 directly on GPU hardware. The architecture breaks from conventional NAND design by implementing independently accessible memory sub-arrays, moving beyond traditional multi-plane approaches. While HBF surpasses HBM's capacity specifications, it maintains higher latency than DRAM, limiting its application to specific workloads.
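
A quick sketch checking the capacity arithmetic quoted above. The per-stack and per-die figures are derived directly from the article's numbers; the HBM per-stack capacities used for the 8x-16x comparison (32 GB and 64 GB) are back-derived assumptions, not figures SanDisk stated:

```python
# Verify the HBF capacity math: eight stacks of 16 NAND dies providing 4 TB.

total_vram_tb = 4        # eight HBF stacks provide 4 TB of VRAM (from the article)
stacks = 8
dies_per_stack = 16      # 16 BiCS8 3D NAND dies per stack (from the article)

gb_per_stack = total_vram_tb * 1024 / stacks    # capacity of one HBF stack
gb_per_die = gb_per_stack / dies_per_stack      # capacity of one NAND die
print(gb_per_stack, gb_per_die)                 # 512.0 GB per stack, 32.0 GB per die

# For "8 to 16 times greater capacity per stack than current HBM" to hold,
# the implied HBM comparison points are 64 GB and 32 GB per stack (assumption):
print(gb_per_stack / 64, gb_per_stack / 32)     # 8.0 16.0
```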

SanDisk has not disclosed its solution for NAND's inherent write endurance limitations, though using pSLC NAND makes it possible to balance durability and cost. The bandwidth of HBF is also unknown, as the company hasn't released details yet. SanDisk Memory Technology Chief Alper Ilkbahar confirmed the technology targets read-intensive AI inference tasks rather than latency-sensitive applications. The company is developing HBF as an open standard, incorporating mechanical and electrical interfaces similar to HBM to simplify integration. Some challenges remain, including NAND's block-level addressing limitations and write endurance constraints. While these factors make HBF unsuitable for gaming applications, the technology's high capacity and throughput characteristics align with AI model storage and inference requirements. SanDisk has announced plans for three generations of HBF development, indicating a long-term commitment to the technology.

OnLogic Reveals the Axial AX300 Edge Server

OnLogic, a leading provider of edge computing solutions, has launched the Axial AX300, a highly customizable and powerful edge server. The AX300 is engineered to help businesses of any size better leverage their on-site data and unlock the potential of AI by placing powerful computing capabilities on-site.

The Axial AX300 empowers organizations to seamlessly move computing resources closer to the data source, providing significant advantages in performance, latency, operational efficiency, and total cost of ownership over cloud-based data management. With its robust design, flexible configuration options, and advanced security features, the Axial AX300 is the ideal platform for a wide range of highly-impactful edge computing applications, including:
  • AI/ML inference and training: Leveraging the power of AI/ML at the edge for real-time insights, predictive maintenance, and improved decision-making.
  • Data analytics: Processing and analyzing data generated by IoT devices and sensors in real-time to improve operational efficiency.
  • Virtualization: Consolidating multiple workloads onto a single server, optimizing resource utilization and simplifying deployment and management.

Supermicro Empowers AI-driven Capabilities for Enterprise, Retail, and Edge Server Solutions

Supermicro, Inc. (SMCI), a Total IT Solution Provider for AI/ML, HPC, Cloud, Storage, and 5G/Edge, is showcasing the latest solutions for the retail industry in collaboration with NVIDIA at the National Retail Federation (NRF) annual show. As generative AI (GenAI) grows in capability and becomes more easily accessible, retailers are leveraging NVIDIA NIM microservices, part of the NVIDIA AI Enterprise software platform, for a broad spectrum of applications.

"Supermicro's innovative server, storage, and edge computing solutions improve retail operations, store security, and operational efficiency," said Charles Liang, president and CEO of Supermicro. "At NRF, Supermicro is excited to introduce retailers to AI's transformative potential and to revolutionize the customer's experience. Our systems here will help resolve day-to-day concerns and elevate the overall buying experience."

Graid Technology Unveils SupremeRAID(TM) AE: The AI Edition Designed for GPU-Driven AI Workloads

Graid Technology, the global leader in innovative storage performance solutions, is proud to announce the launch of SupremeRAID AE (AI Edition), the most resilient RAID data protection solution for enterprises leveraging GPU servers and AI workloads. Featuring GPUDirect Storage support and an intelligent data offload engine, SupremeRAID AE redefines how AI applications manage data, delivering unmatched performance, flexibility, and efficiency.

SupremeRAID AE's cutting-edge technology empowers organizations to accelerate AI workflows by reducing data access latency and increasing I/O efficiency, while protecting mission-critical datasets with enterprise-grade reliability. Its seamless scalability enables enterprises to meet future AI demands without overhauling existing infrastructure. Designed for a wide range of users, SupremeRAID AE benefits AI/ML teams by delivering faster training and inference for data-intensive models, enterprises with GPU servers by optimizing GPU performance for critical workloads, and data scientists and researchers by providing seamless access to vast datasets without bottlenecks. IT teams also gain resilient, scalable RAID storage that integrates effortlessly into existing systems without requiring additional hardware.

Emotiv Launches MW20 EEG Active Noise-Cancelling Earphones at CES

Emotiv, a global leader in EEG technology, announces its next-generation EEG Active Noise-Cancelling Earphones. These smart earphones enhance personal wellness by integrating advanced EEG technology to provide insights into cognitive performance and overall well-being—alongside exceptional sound quality.

Building on Emotiv's MN8 earphones launched in 2018 (the world's first EEG-enabled earphones), the MW20 marks the next evolution of wearable technology. Designed with precision, the product merges premium audio with neurotechnology to deliver actionable wellness insights and BCI capabilities in an intuitive form factor. Made of machined aluminium and sapphire glass, the earphones feature an ergonomic design engineered for optimal fit and precise acoustics.

SPEC Delivers Major SPECworkstation 4.0 Benchmark Update, Adds AI/ML Workloads

The Standard Performance Evaluation Corporation (SPEC), the trusted global leader in computing benchmarks, today announced the availability of the SPECworkstation 4.0 benchmark, a major update to SPEC's comprehensive tool designed to measure all key aspects of workstation performance. This significant upgrade from version 3.1 incorporates cutting-edge features to keep pace with the latest workstation hardware and the evolving demands of professional applications, including the increasing reliance on data analytics, AI and machine learning (ML).

The new SPECworkstation 4.0 benchmark provides a robust, real-world measure of CPU, graphics, accelerator, and disk performance, ensuring professionals have the data they need to make informed decisions about their hardware investments. The benchmark caters to the diverse needs of engineers, scientists, and developers who rely on workstation hardware for daily tasks. It includes real-world applications like Blender, Handbrake, LLVM, and more, providing a comprehensive performance measure across seven different industry verticals, each focusing on specific use cases and subsystems critical to workstation users. The SPECworkstation 4.0 benchmark marks a significant milestone for measuring workstation AI performance, providing an unbiased, real-world, application-driven tool for measuring how workstations handle AI/ML workloads.

Amazon AWS Announces General Availability of Trainium2 Instances, Reveals Details of Next Gen Trainium3 Chip

At AWS re:Invent, Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company, today announced the general availability of AWS Trainium2-powered Amazon Elastic Compute Cloud (Amazon EC2) instances and introduced new Trn2 UltraServers, enabling customers to train and deploy today's latest AI models, as well as future large language models (LLMs) and foundation models (FMs), with exceptional levels of performance and cost efficiency. The company also unveiled next-generation Trainium3 chips.

"Trainium2 is purpose built to support the largest, most cutting-edge generative AI workloads, for both training and inference, and to deliver the best price performance on AWS," said David Brown, vice president of Compute and Networking at AWS. "With models approaching trillions of parameters, we understand customers also need a novel approach to train and run these massive workloads. New Trn2 UltraServers offer the fastest training and inference performance on AWS and help organizations of all sizes to train and deploy the world's largest models faster and at a lower cost."

US to Implement Semiconductor Restrictions on Chinese Equipment Makers

The Biden administration is set to announce new, targeted restrictions on China's semiconductor industry, focusing primarily on emerging chip manufacturing equipment companies rather than broad industry-wide limitations. According to Bloomberg, these new restrictions are expected to take effect on Monday. The new rules will specifically target two manufacturing facilities owned by Semiconductor Manufacturing International Corp. (SMIC) and will add select companies to the US Entity List, restricting their access to American technology. However, most of Huawei's suppliers can continue their operations, suggesting a milder strategy. The restrictions will focus on over 100 emerging Chinese semiconductor equipment manufacturers, many of which receive government funding. These companies are developing tools intended to replace those currently supplied by industry leaders such as ASML, Applied Materials, and Tokyo Electron.

The moderated approach comes after significant lobbying efforts from American semiconductor companies, who argued that stricter restrictions could disadvantage them against international competitors. Major firms like Applied Materials, KLA, and Lam Research voiced concerns about losing market share to companies in Japan and the Netherlands, where similar but less stringent export controls are in place. Notably, Japanese companies like SUMCO are already seeing revenue impacts from China's push for semiconductor independence. Lastly, the restrictions will have a limited effect on China's memory chip sector. The new measures will not directly affect ChangXin Memory Technologies (CXMT), a significant Chinese DRAM manufacturer capable of producing high-bandwidth memory for AI applications.

Interview with RISC-V International: High-Performance Chips, AI, Ecosystem Fragmentation, and The Future

RISC-V is an industry-standard instruction set architecture (ISA) born at UC Berkeley. RISC-V is the fifth iteration in the lineage of historic RISC processors. The core value of the RISC-V ISA is the freedom of usage it offers. Any organization can leverage the ISA to design the best possible core for their specific needs, with no regional restrictions or licensing costs. This freedom has attracted a massive ecosystem of developers and companies building systems using the RISC-V ISA. To support these efforts and grow the ecosystem, the brains behind RISC-V formed RISC-V International, a non-profit foundation that governs the ISA and guides the ecosystem.

We had the privilege of talking with Andrea Gallo, Vice President of Technology at RISC-V International. Andrea oversees the technological advancement of RISC-V, collaborating with vendors and institutions to overcome challenges and expand its global presence. Andrea's career in technology spans several influential roles at major companies. Before joining RISC-V International, he worked at Linaro, where he pioneered Arm data center engineering initiatives, later overseeing diverse technological sectors as Vice President of Segment Groups, and ultimately managing crucial business development activities as executive Vice President. During his earlier tenure as a Fellow at ST-Ericsson, he focused on smartphone and application processor technology, and at STMicroelectronics he optimized hardware-software architectures and established international development teams.

Emteq Labs Unveils World's First Emotion-Sensing Eyewear

Emteq Labs, the market leader in emotion-recognition wearable technology, today announced the forthcoming introduction of Sense, the world's first emotion-sensing eyewear. Alongside the unveiling of Sense, the company is pleased to announce the appointment of Steen Strand, former head of the hardware division of Snap Inc., as its new Chief Executive Officer.

Over the past decade, Emteq Labs - led by renowned surgeon and facial musculature expert, Dr. Charles Nduka - has been at the forefront of engineering advanced technologies for sensing facial movements and emotions. This data has significant implications on health and well-being, but has never been available outside of a laboratory, healthcare facility, or other controlled setting. Now, Emteq Labs has developed Sense: a patented, AI-powered eyewear platform that provides lab-quality insights in real life and in real time. This includes comprehensive measurement and analysis of the wearer's facial expressions, dietary habits, mood, posture, attention levels, physical activity, and additional health-related metrics.

Western Digital Enterprise SSDs Certified to Support NVIDIA GB200 NVL72 System for Compute-Intensive AI Environments

Western Digital Corp. today announced that its PCIe Gen 5 DC SN861 E.1S enterprise-class NVMe SSDs have been certified to support the NVIDIA GB200 NVL72 rack-scale system.

The rapid rise of AI, ML, and large language models (LLMs) presents companies with two opposing forces: data generation and consumption are accelerating, while organizations face pressure to quickly derive value from this data. Performance, scalability, and efficiency are essential for AI technology stacks as storage demands rise. Certified to be compatible with the GB200 NVL72 system, Western Digital's enterprise SSD addresses the growing needs of the AI market for high-speed accelerated computing combined with low latency to serve compute-intensive AI environments.

AMD Launches New Slim Form Factor Alveo UL3422 Accelerator Card

AMD today announced the AMD Alveo UL3422 accelerator card, the latest addition to its record-breaking family of accelerators designed for ultra-low latency electronic trading applications. AMD Alveo UL3422 provides trading firms, market makers, and financial institutions with a slim form factor accelerator optimized for rack space and cost, and designed for a fast path to deployment in a wide range of servers. The Alveo UL3422 accelerator is powered by an AMD Virtex UltraScale+ FPGA that features a novel transceiver architecture with hardened, optimized network connectivity cores, custom built for high-speed trading. It enables ultra-low latency trade execution, achieving less than 3 ns FPGA transceiver latency and breakthrough 'tick-to-trade' performance not achievable with standard off-the-shelf FPGAs.

"Speed is the ultimate advantage in the increasingly competitive world of high-speed trading," said Yousef Khalilollahi, corporate vice president & general manager, Adaptive Computing Group, AMD. "The Alveo UL3422 card provides a lower-cost entry point while still delivering cutting-edge latency performance, making it accessible to firms of all sizes that want to stay competitive in the ultra-low latency trading space."

Lenovo Accelerates Business Transformation with New ThinkSystem Servers Engineered for Optimal AI and Powered by AMD

Today, Lenovo announced its industry-leading ThinkSystem infrastructure solutions powered by AMD EPYC 9005 Series processors, as well as AMD Instinct MI325X accelerators. Backed by 225 of AMD's world-record performance benchmarks, the Lenovo ThinkSystem servers deliver an unparalleled combination of AMD technology-based performance and efficiency to tackle today's most demanding edge-to-cloud workloads, including AI training, inferencing and modeling.

"Lenovo is helping organizations of all sizes and across various industries achieve AI-powered business transformations," said Vlad Rozanovich, Senior Vice President, Lenovo Infrastructure Solutions Group. "Not only do we deliver unmatched performance, we offer the right mix of solutions to change the economics of AI and give customers faster time-to-value and improved total value of ownership."

Supermicro Currently Shipping Over 100,000 GPUs Per Quarter in its Complete Rack Scale Liquid Cooled Servers

Supermicro, Inc., a Total IT Solution Provider for Cloud, AI/ML, Storage, and 5G/Edge, is announcing a complete liquid cooling solution that includes powerful Coolant Distribution Units (CDUs), cold plates, Coolant Distribution Manifolds (CDMs), cooling towers, and end-to-end management software. This complete solution reduces ongoing power costs as well as Day 0 hardware acquisition and data center cooling infrastructure costs. The entire end-to-end data center scale liquid cooling solution is available directly from Supermicro.

"Supermicro continues to innovate, delivering full data center plug-and-play rack scale liquid cooling solutions," said Charles Liang, CEO and president of Supermicro. "Our complete liquid cooling solutions, including SuperCloud Composer for the entire life-cycle management of all components, are now cooling massive, state-of-the-art AI factories, reducing costs and improving performance. The combination of Supermicro deployment experience and delivering innovative technology is resulting in data center operators coming to Supermicro to meet their technical and financial goals for both the construction of greenfield sites and the modernization of existing data centers. Since Supermicro supplies all the components, the time to deployment and online are measured in weeks, not months."

Apple Introduces the iPhone 16 and iPhone 16 Plus

Apple today announced iPhone 16 and iPhone 16 Plus, built for Apple Intelligence, the easy-to-use personal intelligence system that understands personal context to deliver intelligence that is helpful and relevant while protecting user privacy. The iPhone 16 lineup also introduces Camera Control, which brings new ways to capture memories, and will help users quickly access visual intelligence to learn about objects or places around them faster than ever before. The powerful camera system features a 48MP Fusion camera with a 2x Telephoto option, giving users two cameras in one, while a new Ultra Wide camera enables macro photography. Next-generation Photographic Styles help users personalize their images, and spatial photo and video capture allows users to relive life's precious memories with remarkable depth on Apple Vision Pro. The new A18 chip delivers a huge leap in performance and efficiency, enabling demanding AAA games, as well as a big boost in battery life.

iPhone 16 and iPhone 16 Plus will be available in five bold colors: black, white, pink, teal, and ultramarine. Pre-orders begin Friday, September 13, with availability beginning Friday, September 20.

Apple Debuts the iPhone 16 Pro and iPhone 16 Pro Max - Now with a Camera Button

Apple today introduced iPhone 16 Pro and iPhone 16 Pro Max, featuring Apple Intelligence, larger display sizes, new creative capabilities with innovative pro camera features, stunning graphics for immersive gaming, and more—all powered by the A18 Pro chip. With Apple Intelligence, powerful Apple-built generative models come to iPhone in the easy-to-use personal intelligence system that understands personal context to deliver intelligence that is helpful and relevant while protecting user privacy. Camera Control unlocks a fast, intuitive way to tap into visual intelligence and easily interact with the advanced camera system. Featuring a new 48MP Fusion camera with a faster quad-pixel sensor that enables 4K120 FPS video recording in Dolby Vision, these new Pro models achieve the highest resolution and frame-rate combination ever available on iPhone. Additional advancements include a new 48MP Ultra Wide camera for higher-resolution photography, including macro; a 5x Telephoto camera on both Pro models; and studio-quality mics to record more true-to-life audio. The durable titanium design is strong yet lightweight, with larger display sizes, the thinnest borders on any Apple product, and a huge leap in battery life—with iPhone 16 Pro Max offering the best battery life on iPhone ever.

iPhone 16 Pro and iPhone 16 Pro Max will be available in four stunning finishes: black titanium, natural titanium, white titanium, and desert titanium. Pre-orders begin Friday, September 13, with availability beginning Friday, September 20.

Efficient Teams Up with GlobalFoundries to Develop Ultra-Low Power MRAM Processors

Today, Efficient announced a strategic partnership with GlobalFoundries (GF) to bring to market a new high-performance computer processor that is up to 166x more energy-efficient than industry-standard embedded CPUs. Efficient is already working with select customers for early access and customer sampling by summer 2025. The official introduction of the category-creating processor will mark a new era in computing, free from restrictive energy limitations.

The partnership will combine Efficient's novel architecture and technology with GF's U.S.-based manufacturing, global reach, and market expertise to enable a quantum leap in edge device capabilities and battery lifetime. Through this partnership, Efficient will provide the computing power for smarter, longer-lasting devices and applications across the Internet of Things, wearable and implantable health devices, space systems, and security and defense.

NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Benchmark

As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category - including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token. MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they're capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They're also more efficient since they only activate a few experts per inference - meaning they deliver results much faster than dense models of a similar size.
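
The efficiency claim follows directly from the parameter counts quoted above: per-token compute scales with the active parameters, not the total. A quick sketch using the article's Mixtral 8x7B figures:

```python
# Why MoE inference is cheaper than a dense model of the same total size:
# only the routed experts' parameters are active for each token.

total_params_b = 46.7    # Mixtral 8x7B total parameters, in billions (from the article)
active_params_b = 12.9   # parameters active per token, in billions (from the article)

active_fraction = active_params_b / total_params_b
print(round(active_fraction, 3))  # ~0.276: roughly 28% of the weights touched per token
```

That is, each token only exercises about a quarter of the model, which is why an MoE can answer like a 46.7B-parameter model while paying compute closer to a 12.9B dense model.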

Cerebras Launches the World's Fastest AI Inference

Today, Cerebras Systems, the pioneer in high performance AI compute, announced Cerebras Inference, the fastest AI inference solution in the world. Delivering 1,800 tokens per second for Llama3.1 8B and 450 tokens per second for Llama3.1 70B, Cerebras Inference is 20 times faster than NVIDIA GPU-based solutions in hyperscale clouds. Starting at just 10c per million tokens, Cerebras Inference is priced at a fraction of GPU solutions, providing 100x higher price-performance for AI workloads.

Unlike alternative approaches that compromise accuracy for performance, Cerebras offers the fastest performance while maintaining state of the art accuracy by staying in the 16-bit domain for the entire inference run. Cerebras Inference is priced at a fraction of GPU-based competitors, with pay-as-you-go pricing of 10 cents per million tokens for Llama 3.1 8B and 60 cents per million tokens for Llama 3.1 70B.
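
To put the quoted figures in concrete terms, a small sketch combining the throughput and pricing numbers above (the one-million-token corpus size is an arbitrary example, not from the announcement):

```python
# Combine Cerebras' quoted throughput and pricing for Llama 3.1 8B:
# 1,800 tokens/second at $0.10 per million tokens.

tokens_per_second = 1800          # Llama 3.1 8B throughput (from the article)
price_per_million_usd = 0.10      # 10 cents per million tokens (from the article)

corpus_tokens = 1_000_000         # example workload: one million tokens
seconds = corpus_tokens / tokens_per_second
cost_usd = corpus_tokens / 1_000_000 * price_per_million_usd

print(round(seconds, 1))  # ~555.6 seconds (just over 9 minutes)
print(cost_usd)           # $0.10 for the whole million-token run
```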