News Posts matching #Generative AI

EdgeCortix SAKURA-II Enables GenAI on Raspberry Pi 5 and Arm Systems

EdgeCortix Inc., a leading fabless semiconductor company specializing in energy-efficient Artificial Intelligence (AI) processing at the edge, today announced that its industry-leading SAKURA-II AI accelerator M.2 module is now available for Arm-based platforms, including the Raspberry Pi 5 and Aetina's Rockchip RK3588 platform, delivering unprecedented performance and efficiency for edge AI computing applications.

This powerful integration marks a major leap in democratizing real-time Generative AI capabilities at the edge. Designed with a focus on low power consumption and high AI throughput, the EdgeCortix SAKURA-II M.2 module enables developers to run advanced deep learning models directly on compact, affordable platforms like the Raspberry Pi 5—without relying on cloud infrastructure.

AnythingLLM App Best Experienced on NVIDIA RTX AI PCs

Large language models (LLMs), trained on datasets with billions of tokens, can generate high-quality content. They're the backbone for many of the most popular AI applications, including chatbots, assistants, code generators and much more. One of today's most accessible ways to work with LLMs is with AnythingLLM, a desktop app built for enthusiasts who want an all-in-one, privacy-focused AI assistant directly on their PC. With new support for NVIDIA NIM microservices on NVIDIA GeForce RTX and NVIDIA RTX PRO GPUs, AnythingLLM users can now get even faster performance for more responsive local AI workflows.

What Is AnythingLLM?
AnythingLLM is an all-in-one AI application that lets users run local LLMs, retrieval-augmented generation (RAG) systems and agentic tools. It acts as a bridge between a user's preferred LLMs and their data, and enables access to tools (called skills), making it easier and more efficient to use LLMs for specific tasks.
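The local RAG loop described above can be sketched in a few lines: embed documents, retrieve the closest match for a query, and fold it into the prompt. The toy bag-of-words embedding and all function names below are illustrative assumptions, not AnythingLLM's actual internals:

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern:
# embed documents, retrieve the closest ones for a query, and prepend
# them to the prompt. The bag-of-words "embedding" is a toy stand-in;
# real systems use neural embedding models and vector databases.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word-frequency vector of the lowercased text.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Fold the retrieved context into the prompt sent to the local LLM.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "The GPU driver version is 552.22.",
    "The office coffee machine is on floor 3.",
]
print(build_prompt("Which GPU driver version is installed?", docs))
```

In a real assistant the retrieved context lets the model answer from the user's own data rather than from its training set, which is what keeps the workflow private and local.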

NVIDIA & Microsoft Accelerate Agentic AI Innovation - From Cloud to PC

Agentic AI is redefining scientific discovery and unlocking research breakthroughs and innovations across industries. Through deepened collaboration, NVIDIA and Microsoft are delivering advancements that accelerate agentic AI-powered applications from the cloud to the PC. At Microsoft Build, Microsoft unveiled Microsoft Discovery, an extensible platform built to empower researchers to transform the entire discovery process with agentic AI. This will help research and development departments across various industries accelerate the time to market for new products, as well as speed and expand the end-to-end discovery process for all scientists.

Microsoft Discovery will integrate the NVIDIA ALCHEMI NIM microservice, which optimizes AI inference for chemical simulations, to accelerate materials science research with property prediction and candidate recommendation. The platform will also integrate NVIDIA BioNeMo NIM microservices, tapping into pretrained AI workflows to speed up AI model development for drug discovery. These integrations equip researchers with accelerated performance for faster scientific discoveries. In testing, researchers at Microsoft used Microsoft Discovery to detect a novel coolant prototype with promising properties for immersion cooling in data centers in under 200 hours, rather than months or years with traditional methods.

NVIDIA and Microsoft Advance Development on RTX AI PCs

Generative AI is transforming PC software into breakthrough experiences - from digital humans to writing assistants, intelligent agents and creative tools. NVIDIA RTX AI PCs are powering this transformation with technology that makes it simpler to get started experimenting with generative AI and unlock greater performance on Windows 11. NVIDIA TensorRT has been reimagined for RTX AI PCs, combining industry-leading TensorRT performance with just-in-time, on-device engine building and an 8x smaller package size for seamless AI deployment to more than 100 million RTX AI PCs.

Announced at Microsoft Build, TensorRT for RTX is natively supported by Windows ML - a new inference stack that provides app developers with both broad hardware compatibility and state-of-the-art performance. For developers looking for AI features ready to integrate, NVIDIA software development kits (SDKs) offer a wide array of options, from NVIDIA DLSS to multimedia enhancements like NVIDIA RTX Video. This month, top software applications from Autodesk, Bilibili, Chaos, LM Studio and Topaz Labs are releasing updates to unlock RTX AI features and acceleration.

IBM & Oracle Expand Partnership - Aim to Advance Agentic AI and Hybrid Cloud

IBM is working with Oracle to bring the power of watsonx, IBM's flagship portfolio of AI products, to Oracle Cloud Infrastructure (OCI). Leveraging OCI's native AI services, the latest milestone in IBM's technology partnership with Oracle is designed to fuel a new era of multi-agentic, AI-driven productivity and efficiency across the enterprise. Organizations today are deploying AI throughout their operations, looking to take advantage of the extraordinary advancements in generative AI models, tools, and agents. AI agents that can provide a single, easy-to-use interface to complete tasks are emerging as key tools to help simplify the deployment and use of AI across enterprise operations and functions. "AI delivers the most impactful value when it works seamlessly across an entire business," said Greg Pavlik, executive vice president, AI and Data Management Services, Oracle Cloud Infrastructure. "IBM and Oracle have been collaborating to drive customer success for decades, and our expanded partnership will provide customers new ways to help transform their businesses with AI."

Watsonx Orchestrate to support multi-agent workflows
To give customers a consistent way to build and manage agents across multi-agent, multi-system business processes, spanning both Oracle and non-Oracle applications and data sources, IBM is making its watsonx Orchestrate AI agent offerings available on OCI in July. This multi-agent approach using watsonx Orchestrate is designed to work with the expansive AI agent offerings embedded within the Oracle AI Agent Studio for Fusion Applications, as well as OCI Generative AI Agents, and OCI's other AI services. It extends the ecosystem around Oracle Fusion Applications to enable further functionality across third-party and custom applications and data sources. The first use cases being addressed are in human resources. The watsonx Orchestrate agents will perform AI inferencing on OCI, which many customers use to host their data, AI, and other applications. IBM agents run in watsonx Orchestrate on Red Hat OpenShift on OCI, including in public, sovereign, government, and Oracle Alloy regions, to enable customers to address specific regulatory and privacy requirements. The agents can also be hosted on-premises or in multicloud environments for true hybrid cloud capabilities.

NVIDIA AI Blueprint for 3D-Guided Generative AI Allows Controlled Composition

AI-powered image generation has progressed at a remarkable pace—from early examples of models creating images of humans with too many fingers to now producing strikingly photorealistic visuals. Even with such leaps, one challenge remains: achieving creative control. Creating scenes using text has gotten easier, no longer requiring complex descriptions—and models have improved alignment to prompts. But describing finer details like composition, camera angles and object placement with text alone is hard, and making adjustments is even more complex.

Advanced workflows using ControlNets—tools that enhance image generation by providing greater control over the output—offer solutions, but their setup complexity limits broader accessibility. To help overcome these challenges and fast-track access to advanced AI capabilities, NVIDIA at the CES trade show earlier this year announced the NVIDIA AI Blueprint for 3D-guided generative AI for RTX PCs. This sample workflow includes everything needed to start generating images with full composition control. Users can download the new Blueprint today.

NVIDIA's Project G-Assist Plug-In Builder Explained: Anyone Can Customize AI on GeForce RTX AI PCs

AI is rapidly reshaping what's possible on a PC—whether for real-time image generation or voice-controlled workflows. As AI capabilities grow, so does their complexity. Tapping into the power of AI can entail navigating a maze of system settings, software and hardware configurations. Enabling users to explore how on-device AI can simplify and enhance the PC experience, Project G-Assist—an AI assistant that helps tune, control and optimize GeForce RTX systems—is now available as an experimental feature in the NVIDIA app. Developers can try out AI-powered voice and text commands for tasks like monitoring performance, adjusting settings and interacting with supporting peripherals. Users can even summon other AIs powered by GeForce RTX AI PCs.

And it doesn't stop there. For those looking to expand Project G-Assist capabilities in creative ways, the AI supports custom plug-ins. With the new ChatGPT-based G-Assist Plug-In Builder, developers and enthusiasts can create and customize G-Assist's functionality, adding new commands, connecting external tools and building AI workflows tailored to specific needs. With the plug-in builder, users can generate properly formatted code with AI, then integrate the code into G-Assist—enabling quick, AI-assisted functionality that responds to text and voice commands.

Qualcomm Announces Acquisition of VinAI Division, Aims to Expand GenAI Capabilities

Qualcomm today announced the acquisition of MovianAI Artificial Intelligence (AI) Application and Research JSC (MovianAI), the former generative AI division of VinAI Application and Research JSC (VinAI) and a part of the Vingroup ecosystem. As a leading AI research company, VinAI is renowned for its expertise in generative AI, machine learning, computer vision, and natural language processing. Combining VinAI's advanced generative AI research and development (R&D) capabilities with Qualcomm's decades of extensive R&D will expand Qualcomm's ability to drive extraordinary inventions.

For more than 20 years, Qualcomm has been working closely with the Vietnamese technology ecosystem to create and deliver innovative solutions. Qualcomm's innovations in the areas of 5G, AI, IoT and automotive have helped to fuel the extraordinary growth and success of Vietnam's information and communication technology (ICT) industry and assisted the entry of Vietnamese companies into the global marketplace.

NVIDIA NIM Microservices Now Available to Streamline Agentic Workflows on RTX AI PCs and Workstations

Generative AI is unlocking new capabilities for PCs and workstations, including game assistants, enhanced content-creation and productivity tools and more. NVIDIA NIM microservices, available now, and AI Blueprints, in the coming weeks, accelerate AI development and improve its accessibility. Announced at the CES trade show in January, NVIDIA NIM provides prepackaged, state-of-the-art AI models optimized for the NVIDIA RTX platform, including the NVIDIA GeForce RTX 50 Series and, now, the new NVIDIA Blackwell RTX PRO GPUs. The microservices are easy to download and run. They span the top modalities for PC development and are compatible with top ecosystem applications and tools.

The experimental System Assistant feature of Project G-Assist was also released today. Project G-Assist showcases how AI assistants can enhance apps and games. The System Assistant allows users to run real-time diagnostics, get recommendations on performance optimizations, or control system software and peripherals - all via simple voice or text commands. Developers and enthusiasts can extend its capabilities with a simple plug-in architecture and new plug-in builder.

Qualcomm and IBM Scale Enterprise-grade Generative AI from Edge to Cloud

Ahead of Mobile World Congress 2025, Qualcomm Technologies, Inc. and IBM (NYSE: IBM) announced an expanded collaboration to drive enterprise-grade generative artificial intelligence (AI) solutions across edge and cloud devices designed to enable increased immediacy, privacy, reliability, personalization, and reduced cost and energy consumption. Through this collaboration, the companies plan to integrate watsonx.governance for generative AI solutions powered by Qualcomm Technologies' platforms, and enable support for IBM's Granite models through the Qualcomm AI Inference Suite and Qualcomm AI Hub.

"At Qualcomm Technologies, we are excited to join forces with IBM to deliver cutting-edge, enterprise-grade generative AI solutions for devices across the edge and cloud," said Durga Malladi, senior vice president and general manager, technology planning and edge solutions, Qualcomm Technologies, Inc. "This collaboration enables businesses to deploy AI solutions that are not only fast and personalized but also come with robust governance, monitoring, and decision-making capabilities, with the ability to enhance the overall reliability of AI from edge to cloud."

NVIDIA & Partners Will Discuss Supercharging of AI Development at GTC 2025

Generative AI is redefining computing, unlocking new ways to build, train and optimize AI models on PCs and workstations. From content creation and large and small language models to software development, AI-powered PCs and workstations are transforming workflows and enhancing productivity. At GTC 2025, running March 17-21 in the San Jose Convention Center, experts from across the AI ecosystem will share insights on deploying AI locally, optimizing models and harnessing cutting-edge hardware and software to enhance AI workloads—highlighting key advancements in RTX AI PCs and workstations.

Develop and Deploy on RTX
RTX GPUs are built with specialized AI hardware called Tensor Cores that provide the compute performance needed to run the latest and most demanding AI models. These high-performance GPUs can help build digital humans, chatbots, AI-generated podcasts and more. With more than 100 million GeForce RTX and NVIDIA RTX GPU users, developers have a large audience to target when new AI apps and features are deployed. In the session "Build Digital Humans, Chatbots, and AI-Generated Podcasts for RTX PCs and Workstations," Annamalai Chockalingam, senior product manager at NVIDIA, will showcase the end-to-end suite of tools developers can use to streamline development and deploy incredibly fast AI-enabled applications.

NVIDIA Recommends GeForce RTX 5070 Ti GPU to AI Content Creators

The NVIDIA GeForce RTX 5070 Ti graphics cards—built on the NVIDIA Blackwell architecture—are out now, ready to power generative AI content creation and accelerate creative performance. GeForce RTX 5070 Ti GPUs feature fifth-generation Tensor Cores with support for FP4, doubling performance and reducing VRAM requirements to run generative AI models.

In addition, the GPU comes equipped with two ninth-generation encoders and a sixth-generation decoder that add support for the 4:2:2 pro-grade color format and increase encoding quality for HEVC and AV1. This combo accelerates video editing workflows, reducing export times by 8x compared with single-encoder GPUs without 4:2:2 support like the GeForce RTX 3090. The GeForce RTX 5070 Ti GPU also includes 16 GB of fast GDDR7 memory and 896 GB/sec of total memory bandwidth—a 78% increase over the GeForce RTX 4070 Ti GPU.
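The VRAM savings from lower-precision formats like FP4 follow from simple arithmetic: weight memory is roughly parameter count times bytes per weight. A quick sketch, where the 7B-parameter model size is an assumed example rather than a figure from the article:

```python
# Back-of-the-envelope check of why lower-precision weights shrink VRAM
# needs: weight memory ≈ parameter count × bytes per weight. The
# 7B-parameter model is an illustrative assumption.
def weight_gb(params: float, bits_per_weight: int) -> float:
    return params * bits_per_weight / 8 / 1e9

params = 7e9  # a 7B-parameter model, for illustration
for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: {weight_gb(params, bits):.1f} GB")
# FP4 needs half the weight memory of FP8 and a quarter of FP16.
```

Activations, KV caches and framework overhead add to the total in practice, so these figures are a floor on real VRAM usage rather than a full accounting.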

Xbox Introduces Muse: a Generative AI Model for Gameplay

In nearly every corner of our lives, the buzz about AI is impossible to ignore. It's destined to revolutionize how we work, learn, and play. For those of us immersed in the world of gaming—whether as players or creators—the question isn't just how AI will change the game, but how it will ignite new possibilities.

At Xbox, we're all about using AI to make things better (and more fun!) for players and game creators. We want to bring more games to more people around the world and always stay true to the creative vision and artistry of game developers. We believe generative AI can boost this creativity and open up new possibilities. We're excited to announce a generative AI breakthrough, published today in the journal Nature and announced by Microsoft Research, that shows this potential to open up new possibilities—including the opportunity to make older games accessible to future generations of players across new devices and in new ways.

NVIDIA's Latest "State of AI in Telecommunications" Survey Highlights Increased Integration

The telecom industry's efforts to drive efficiencies with AI are beginning to show fruit. An increasing focus on deploying AI into radio access networks (RANs) was among the key findings of NVIDIA's third annual "State of AI in Telecommunications" survey, as more than a third of respondents indicated they're investing or planning to invest in AI-RAN.

The survey polled more than 450 telecommunications professionals worldwide, revealing continued momentum for AI adoption—including growth in generative AI use cases—and how the technology is helping optimize customer experiences and increase employee productivity. Of the telecommunications professionals surveyed, almost all stated that their company is actively deploying or assessing AI projects.

IBM & Lenovo Expand Strategic AI Technology Partnership in Saudi Arabia

IBM and Lenovo today announced at LEAP 2025 a planned expansion of their strategic technology partnership designed to help scale the impact of generative AI for clients in the Kingdom of Saudi Arabia. IDC expects annual worldwide spending on AI-centric systems to surpass $300 billion by 2026, with many leading organizations in Saudi Arabia exploring and investing in generative AI use cases as they prepare for the emergence of an "AI everywhere" world.

Building upon their 20-year partnership, IBM and Lenovo will collaborate to deliver AI solutions combining technology from the IBM watsonx portfolio of AI products, including the Saudi Data and Artificial Intelligence Authority (SDAIA) open-source Arabic Large Language Model (ALLaM), with Lenovo infrastructure. These solutions are expected to help government and business clients in the Kingdom to accelerate their use of AI to improve public services and make data-driven decisions in areas such as fraud detection, public safety, customer service, code modernization, and IT operations.

KIOXIA Releases AiSAQ as Open-Source Software to Reduce DRAM Needs in AI Systems

Kioxia Corporation, a world leader in memory solutions, today announced the open-source release of its new All-in-Storage ANNS with Product Quantization (AiSAQ) technology. A novel "approximate nearest neighbor" search (ANNS) algorithm optimized for SSDs, KIOXIA AiSAQ software delivers scalable performance for retrieval-augmented generation (RAG) without placing index data in DRAM - and instead searching directly on SSDs.

Generative AI systems demand significant compute, memory and storage resources. While they have the potential to drive transformative breakthroughs across various industries, their deployment often comes with high costs. RAG is a critical phase of AI that refines large language models (LLMs) with data specific to the company or application.
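The core idea behind product quantization, which lets an AiSAQ-style index live on SSD instead of DRAM, can be sketched briefly: split each vector into sub-vectors and store only the ID of each sub-vector's nearest centroid, shrinking the index dramatically. The hand-picked centroids below stand in for the k-means-learned codebooks a real system would use; this is a conceptual sketch, not KIOXIA's implementation:

```python
# Toy sketch of product quantization (PQ): each vector is split into
# sub-vectors, and each sub-vector is replaced by the ID of its nearest
# centroid, e.g. compressing a 4-float vector to two one-byte codes.
# Real PQ learns the codebooks with k-means over training vectors.
import math

def nearest(sub, centroids):
    # Index of the centroid closest to sub-vector `sub` (Euclidean).
    dists = [math.dist(sub, c) for c in centroids]
    return dists.index(min(dists))

def pq_encode(vec, codebooks):
    # Split `vec` into len(codebooks) equal sub-vectors; emit one code each.
    m = len(codebooks)
    d = len(vec) // m
    return [nearest(vec[i * d:(i + 1) * d], codebooks[i]) for i in range(m)]

def pq_decode(codes, codebooks):
    # Approximate reconstruction: concatenate the chosen centroids.
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out

# Two codebooks (one per 2-dim sub-vector), four centroids each.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)],
    [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)],
]
vec = [0.9, 1.1, 0.1, 0.05]
codes = pq_encode(vec, codebooks)      # compact code: two small ints
approx = pq_decode(codes, codebooks)   # lossy reconstruction of `vec`
print(codes, approx)
```

Because the compressed codes are tiny, the full index can be streamed from SSD during search, which is what removes the DRAM requirement at the cost of approximate distances.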

ADLINK Launches the DLAP Supreme Series

ADLINK Technology Inc., a global leader in edge computing, unveiled its new "DLAP Supreme Series", an edge generative AI platform. By integrating Phison's innovative aiDAPTIV+ AI solution, this series overcomes memory limitations in edge generative AI applications, significantly enhancing AI computing capabilities on edge devices. Without incurring high additional hardware costs, the DLAP Supreme series achieves notable AI performance improvements, helping enterprises reduce the cost barriers of AI deployment and accelerating the adoption of generative AI across various industries, especially in edge computing.

Lower AI Computing Costs and Significantly Improved Performance
As generative AI continues to penetrate various industries, many edge devices encounter performance bottlenecks due to insufficient DRAM capacity when executing large language models, affecting model operations and even causing issues such as inadequate token length. The DLAP Supreme series, leveraging aiDAPTIV+ technology, effectively overcomes these limitations and significantly enhances computing performance. Additionally, it supports edge devices in conducting generative language model training, enabling them with AI model training capabilities and improving their autonomous learning and adaptability.

MAINGEAR Launches Desktops and Laptops with NVIDIA GeForce RTX 50 Series GPUs Based on Blackwell Architecture

MAINGEAR, the leader in premium-quality, high-performance gaming PCs, today announced its lineup of desktops and laptops equipped with NVIDIA GeForce RTX 50 Series GPUs. Powered by the NVIDIA Blackwell architecture, GeForce RTX 50 Series GPUs bring groundbreaking capabilities to gamers and creators. Equipped with a massive level of AI horsepower, the GeForce RTX 50 Series enables new experiences and next-level graphics fidelity. Users can multiply performance with NVIDIA DLSS 4, generate images at unprecedented speed, and unleash creativity with the NVIDIA Studio platform.

Plus, NVIDIA NIM microservices - state-of-the-art AI models that let enthusiasts and developers build AI assistants, agents, and workflows - are available with peak performance on NIM-ready systems.

NVIDIA NIM Microservices and AI Blueprints Usher in New Era of Local AI

Over the past year, generative AI has transformed the way people live, work and play, enhancing everything from writing and content creation to gaming, learning and productivity. PC enthusiasts and developers are leading the charge in pushing the boundaries of this groundbreaking technology. Countless times, industry-defining technological breakthroughs have been invented in one place—a garage. This week marks the start of the RTX AI Garage series, which will offer regular content for developers and enthusiasts looking to learn more about NVIDIA NIM microservices and AI Blueprints, and how to build AI agents, creative workflows, digital humans, productivity apps and more on AI PCs. Welcome to the RTX AI Garage.

This first installment spotlights announcements made earlier this week at CES, including new AI foundation models available on NVIDIA RTX AI PCs that take digital humans, content creation, productivity and development to the next level. These models—offered as NVIDIA NIM microservices—are powered by new GeForce RTX 50 Series GPUs. Built on the NVIDIA Blackwell architecture, RTX 50 Series GPUs deliver up to 3,352 trillion AI operations per second, offer 32 GB of VRAM and feature FP4 compute, doubling AI inference performance and enabling generative AI to run locally with a smaller memory footprint.

CyberLink Brings On-Device Generative AI and Creative Editing to Next-Gen AI PCs at CES 2025

CyberLink Corp., a leading provider of digital creative editing software and artificial intelligence (AI) solutions, is showcasing the cutting-edge AI capabilities and NPU (Neural Processing Unit) optimizations of their Generative AI digital marketing design software, Promeo, and award-winning video and photo editing software, PowerDirector and PhotoDirector, this week during CES 2025 in Las Vegas.

With PC makers rapidly adopting Intel's latest Lunar Lake platform and its AI capabilities, CyberLink's close partnership with Intel ensures that PC makers releasing Lunar Lake-enabled hardware, from laptops to mini-PCs, will be able to take advantage of the AI functionality in CyberLink's creative editing software.

NVIDIA Unveils New Jetson Orin Nano Super Developer Kit

NVIDIA is taking the wraps off a new compact generative AI supercomputer, offering increased performance at a lower price with a software upgrade. The new NVIDIA Jetson Orin Nano Super Developer Kit, which fits in the palm of a hand, gives everyone from commercial AI developers to hobbyists and students gains in generative AI capabilities and performance. And the price is now $249, down from $499.

Available today, it delivers as much as a 1.7x leap in generative AI inference performance, a 70% increase in performance to 67 INT8 TOPS, and a 50% increase in memory bandwidth to 102 GB/s compared with its predecessor. Whether creating LLM chatbots based on retrieval-augmented generation, building a visual AI agent, or deploying AI-based robots, the Jetson Orin Nano Super is an ideal platform.

Google Genie 2 Promises AI-Generated Interactive Worlds With Realistic Physics and AI-Powered NPCs

For better or worse, generative AI has been a disruptive force in many industries, although its reception in video games has been lukewarm at best, with attempts at integrating AI-powered NPCs into games failing to impress most gamers. Now, Google DeepMind has a new model called Genie 2, which can supposedly be used to generate "action-controllable, playable, 3D environments for training and evaluating embodied agents." All the environments generated by Genie 2 can supposedly be interacted with, whether by a human piloting a character with a mouse and keyboard or an AI-controlled NPC, although it's unclear what the behind-the-scenes code and optimizations look like, both aspects of which will be key to any real-world applications of the tech. Google says worlds created by Genie 2 can simulate consequences of actions in addition to the world itself, all in real-time. This means that when a player interacts with a world generated by Genie 2, the AI will respond with what its model suggests is the result of that action (like stepping on a leaf resulting in the destruction of said leaf). This extends to things like lighting, reflections, and physics, with Google showing off some impressively accurate water, volumetric effects, and accurate gravity.

In a demo video, Google showed a number of different AI-generated worlds, each with their own interactive characters, from a spaceship interior being explored by an astronaut to a robot taking a stroll in a futuristic cyberpunk urban environment, and even a sailboat sailing over water and a cowboy riding through some grassy plains on horseback. What's perhaps most interesting about Genie 2's generated environments is that Genie has apparently given each world a different perspective and camera control scheme. Some of the examples shown are first-person, while others are third-person with the camera either locked to the character or free-floating around the character. Of course, being generative AI, there is some weirdness, and Google clearly chose its demo clips carefully to avoid graphical anomalies from taking center stage. What's more, at least a few clips seem to very strongly resemble worlds from popular video games: Assassin's Creed, Red Dead Redemption, Sony's Horizon franchise, and what appears to be a mix of various sci-fi games, including Warframe, Destiny, Mass Effect, and Subnautica. This isn't surprising, since the worlds Google used to showcase the AI are all generated with an image and text prompt as inputs, and, given what Google says it used as training data, it seems likely that gaming clips from those games made it into the AI model's training set.

Aetina Debuts at SC24 With NVIDIA MGX Server for Enterprise Edge AI

Aetina, a subsidiary of the Innodisk Group and an expert in edge AI solutions, is pleased to announce its debut at Supercomputing (SC24) in Atlanta, Georgia, showcasing the innovative SuperEdge NVIDIA MGX short-depth edge AI server, AEX-2UA1. By integrating an enterprise-class on-premises large language model (LLM) with the advanced retrieval-augmented generation (RAG) technique, Aetina NVIDIA MGX short-depth server demonstrates exceptional enterprise edge AI performance, setting a new benchmark in Edge AI innovation. The server is powered by the latest Intel Xeon 6 processor and dual high-end double-width NVIDIA GPUs, delivering ultimate AI computing power in a compact 2U form factor, accelerating Gen AI at the edge.

The SuperEdge NVIDIA MGX server expands Aetina's product portfolio from specialized edge devices to comprehensive AI server solutions, propelling a key milestone in Innodisk Group's AI roadmap, from sensors and storage to AI software, computing platforms, and now AI edge servers.

ASUS Unveils ProArt Displays, Laptops and PC Solutions at IBC 2024

ASUS today announced its participation in the upcoming IBC 2024, showcasing the theme A Glimpse into Tomorrow's Tech. Visitors to the ASUS booth (Hall 2 Booth #A29 RAI Amsterdam) will be able to enjoy the ProArt Masters' Talks featuring industry experts from renowned companies Adobe, NVIDIA and Scan Computers, as well as professional filmmaker Bas Goossens, professional senior trainer Leon Barnard, and co-founder and CEO of Redshark Media, Matt Gregory.

As well, through the full run of IBC from September 13-16, 2024, ASUS will highlight a range of cutting-edge technology ideal for professionals, including ProArt Display PA32KCX, the world's first 8K Mini LED professional monitor; ProArt Display OLED PA32UCDM, which brings 4K QD-OLED to creators; ProArt Display 5K PA27JCV, featuring a stunning 5120 x 2880 resolution for unparalleled clarity; and ProArt Display PA32UCE and PA27UCGE, the latest 4K monitors with built-in calibration. The latest ASUS AI-powered laptops and workstations will also be on show.

NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Benchmark

As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another. In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf's biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.

The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category - including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token. MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they're capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They're also more efficient since they only activate a few experts per inference - meaning they deliver results much faster than dense models of a similar size.
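The sparse-activation trick that makes MoE inference cheaper than a dense model of similar size can be sketched with a toy router: score every expert for each token, but run only the top-k. The linear experts and router weights below are illustrative stand-ins (for reference, Mixtral 8x7B routes each token to 2 of 8 experts):

```python
# Minimal sketch of mixture-of-experts (MoE) routing: a router scores
# all experts per token, but only the top-k are actually executed, and
# their outputs are mixed by softmax-normalized gate weights.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(token, router_weights, experts, k=2):
    # Score every expert for this token, keep the top-k...
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])
    # ...and run only those k experts, mixing their outputs by gate weight.
    out = [0.0] * len(token)
    for gate, i in zip(gates, top):
        y = experts[i](token)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, top  # `top` shows which experts were active

# Four toy "experts" (simple scalings); only 2 run per token.
experts = [lambda t, s=s: [s * v for v in t] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1, 0], [0, 1], [1, 1], [-1, 0]]
out, active = moe_forward([0.5, 0.2], router_weights, experts, k=2)
print(active, out)
```

Since only k of the experts' weights contribute compute per token, the per-token cost scales with the active parameters (12.9 billion for Mixtral 8x7B) rather than the full parameter count, which is the efficiency the article describes.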