News Posts matching #inference


IBM Power11 Raises the Bar for Enterprise IT

Today, IBM revealed IBM Power11, the next generation of IBM Power servers. Redesigned with innovations across its processor, hardware architecture, and virtualization software stack, Power11 is designed to deliver the availability, resiliency, performance, and scalability enterprises demand, for seamless hybrid deployment on-premises or in IBM Cloud.

Organizations across industries have long run their most mission-critical, data-intensive workloads on IBM Power, most notably those within the banking, healthcare, retail, and government spaces. Now, enterprises face an onslaught of new technologies and solutions as they transition into the age of AI. IDC found that one billion new logical applications are expected by 2028, and the proliferation of these systems poses new complexities for companies. IBM built Power11 to deliver simplified, always-on operations with hybrid cloud flexibility for enterprises to maintain competitiveness in the AI era.

Intel "Diamond Rapids" Xeon CPU to Feature up to 192 P-Cores and 500 W TDP

Intel's next-generation "Oak Stream" platform is being prepared to accommodate the upcoming "Diamond Rapids" Xeon CPU generation, and more details about the top-end configurations Intel will offer are emerging. According to the HEPiX TechWatch working group, the Diamond Rapids Intel Xeon 7 will feature up to 192 P-cores in its top-end SKU, split across four 48-core tiles. Intel will separate SKUs primarily by memory configuration: some models will use eight-channel DDR5 memory, while the top SKUs will arrive with 16-channel DDR5. MRDIMM Gen 2 memory will enable Intel to push transfer rates to 12,800 MT/s per DIMM, providing massive aggregate bandwidth across 16 channels and keeping the "Panther Cove" cores fed with data. Intel plans for the SoC to reach up to 500 W in a single socket.

As one of the first mass-produced products on the 18A node, Diamond Rapids will be the first to support Intel's APX, and it also brings numerous efficiency improvements to AMX. Intel plans to embed native support for more floating-point formats as well, such as NVIDIA's TF32 and lower-precision FP8. Since much of the world's inference runs well enough on a CPU, Intel aims to accelerate basic inference operations for smaller models, enabling power users to run advanced workloads on CPUs alone. Available in 1S, 2S, and 4S LGA 9324 configurations, Diamond Rapids will offer up to 768 cores in a single server at a combined power draw of only 2,000 W. Support for external accelerators will be provided via PCIe Gen 6. Scheduled to arrive in 2026, Intel could time the launch to coincide with its upcoming "Jaguar Shores" AI accelerators, making the two a natural pairing for a complete AI system.
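
As a back-of-the-envelope check on these figures, a minimal sketch using only the numbers quoted above (assuming one 64-bit data path per channel, linear scaling, and no protocol overhead):

```python
# Rough arithmetic from the figures quoted above (assumptions:
# one 64-bit (8-byte) data path per channel, linear scaling, no overhead).

mt_per_s = 12_800          # MRDIMM Gen 2 transfer rate per DIMM (MT/s)
bytes_per_transfer = 8     # 64-bit DDR5 channel
channels = 16              # top-end SKU memory channels

bandwidth_gbs = mt_per_s * bytes_per_transfer * channels / 1000
print(f"Peak aggregate bandwidth: ~{bandwidth_gbs:.0f} GB/s")   # ~1638 GB/s

sockets, cores_per_socket, tdp_w = 4, 192, 500   # 4S top-end configuration
print(f"4S server: {sockets * cores_per_socket} cores, {sockets * tdp_w} W")
# 768 cores at 2,000 W, matching the quoted figures
```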

Micron HBM Designed into Leading AMD AI Platform

Micron Technology, Inc. today announced the integration of its HBM3E 36 GB 12-high offering into the upcoming AMD Instinct MI350 Series solutions. This collaboration highlights the critical role of power efficiency and performance in training large AI models, delivering high-throughput inference and handling complex HPC workloads such as data processing and computational modeling. Furthermore, it represents another significant milestone in HBM industry leadership for Micron, showcasing its robust execution and the value of its strong customer relationships.

Micron's HBM3E 36 GB 12-high solution brings industry-leading memory technology to AMD Instinct MI350 Series GPU platforms, providing outstanding bandwidth and lower power consumption. The AMD Instinct MI350 Series GPU platforms, built on the advanced AMD CDNA 4 architecture, integrate 288 GB of high-bandwidth HBM3E memory capacity, delivering up to 8 TB/s of bandwidth for exceptional throughput. This immense memory capacity allows Instinct MI350 Series GPUs to efficiently support AI models with up to 520 billion parameters on a single GPU. In a full platform configuration, Instinct MI350 Series GPUs offer up to 2.3 TB of HBM3E memory and achieve peak theoretical performance of up to 161 PFLOPS at FP4 precision, with leadership energy efficiency and scalability for high-density AI workloads. This tightly integrated architecture, combined with Micron's power-efficient HBM3E, enables exceptional throughput for large language model training, inference, and scientific simulation tasks, empowering data centers to scale seamlessly while maximizing compute performance per watt. This joint effort between Micron and AMD has enabled faster time to market for AI solutions.
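
The 520-billion-parameter figure is consistent with the 288 GB capacity at low precision. A minimal sketch of the arithmetic, assuming 4-bit FP4 weights and ignoring activation and KV-cache overhead:

```python
# Weights-only memory check (assumptions: FP4 weights at 0.5 bytes each;
# activations and KV cache are ignored).

params = 520e9              # quoted maximum model size
bytes_per_param = 0.5       # FP4: 4 bits per weight
hbm_per_gpu_gb = 288        # HBM3E capacity per MI350 Series GPU

weights_gb = params * bytes_per_param / 1e9
print(f"FP4 weights: {weights_gb:.0f} GB vs. {hbm_per_gpu_gb} GB of HBM3E")
# 260 GB of weights fits within a single GPU's 288 GB
```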

Compal Optimizes AI Workloads with AMD Instinct MI355X at AMD Advancing AI 2025 and International Supercomputing Conference 2025

As AI computing accelerates toward higher density and greater energy efficiency, Compal Electronics (Compal; Stock Ticker: 2324.TW), a global leader in IT and computing solutions, unveiled its latest high-performance server platform, the SG720-2A/OG720-2A, at both AMD Advancing AI 2025 in the U.S. and the International Supercomputing Conference (ISC) 2025 in Europe. It features AMD Instinct MI355X GPUs and offers both single-phase and two-phase liquid cooling configurations, showcasing Compal's leadership in thermal innovation and system integration. Tailored for next-generation generative AI and large language model (LLM) training, the SG720-2A/OG720-2A delivers exceptional flexibility and scalability for modern data center operations, drawing significant attention across the industry.

With generative AI and LLMs driving increasingly intensive compute demands, enterprises are placing greater emphasis on infrastructure that offers both performance and adaptability. The SG720-2A/OG720-2A emerges as a robust solution, combining high-density GPU integration and flexible liquid cooling options, positioning itself as an ideal platform for next-generation AI training and inference workloads.

AMD Unveils Vision for an Open AI Ecosystem, Detailing New Silicon, Software and Systems at Advancing AI 2025

AMD delivered its comprehensive, end-to-end integrated AI platform vision and introduced its open, scalable rack-scale AI infrastructure built on industry standards at its 2025 Advancing AI event.

AMD and its partners showcased:
  • How they are building the open AI ecosystem with the new AMD Instinct MI350 Series accelerators
  • The continued growth of the AMD ROCm ecosystem
  • The company's powerful, new, open rack-scale designs and roadmap that bring leadership rack-scale AI performance beyond 2027

AMD Instinct MI355X Draws up to 1,400 Watts in OAM Form Factor

Tomorrow evening, AMD will host its "Advancing AI" livestream to introduce the Instinct MI350 series, a new line of GPU accelerators designed for large-scale AI training and inference. First shown in prototype form at ISC 2025 in Hamburg just a day ago, each MI350 card features 288 GB of HBM3E memory, delivering up to 8 TB/s of sustained bandwidth. Customers can choose between the single-card MI350X and the higher-clocked MI355X, or opt for a full eight-GPU platform that aggregates over 2.3 TB of memory. Both chips are built on the CDNA 4 architecture, which now supports four different precision formats: FP16, FP8, FP6, and FP4. The addition of FP6 and FP4 is designed to boost throughput in modern AI workloads, where tomorrow's models, with tens of trillions of parameters, are expected to be trained on FP6 and FP4.

In half-precision tests, the MI350X achieves 4.6 PetaFLOPS on its own and 36.8 PetaFLOPS in an eight-GPU platform, while the MI355X surpasses those numbers, reaching 5.03 PetaFLOPS and just over 40 PetaFLOPS, respectively. AMD is also aiming to improve energy efficiency by a factor of thirty compared with the previous generation. The MI350X runs within a 1,000 W power envelope and relies on air cooling, whereas the MI355X steps up to 1,400 W and is intended for direct-liquid-cooling setups. That 400 W increase puts it on par with NVIDIA's upcoming GB300 "Grace Blackwell Ultra" superchip, which is also a 1,400 W design. With memory capacity, raw compute, and power efficiency all pushed to new heights, the question remains whether real-world benchmarks will match these ambitious specifications. The one thing AMD still lacks is platform scaling beyond eight GPUs, which the Instinct MI400 series will address.
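
The platform numbers are simply eight times the single-GPU figures; a minimal check, assuming linear scaling across the platform:

```python
# Platform throughput is the single-GPU figure times eight GPUs
# (assumption: linear scaling, which the quoted numbers reflect).

for name, pflops in (("MI350X", 4.6), ("MI355X", 5.03)):
    print(f"{name}: {pflops} PF x 8 = {pflops * 8:.1f} PF per platform")
# MI350X: 36.8 PF; MI355X: 40.2 PF -- matching the quoted figures
```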

NVIDIA Partners With Europe Model Builders and Cloud Providers to Accelerate Region's Leap Into AI

NVIDIA GTC Paris at VivaTech -- NVIDIA today announced that it is teaming with model builders and cloud providers across Europe and the Middle East to optimize sovereign large language models (LLMs), providing a springboard to accelerate enterprise AI adoption for the region's industries.

Model builders and AI consortiums Barcelona Supercomputing Center (BSC), Bielik.AI, Dicta, H Company, Domyn, LightOn, the National Academic Infrastructure for Supercomputing in Sweden (NAISS) together with KBLab at the National Library of Sweden, the Slovak Republic, the Technology Innovation Institute (TII), University College London, the University of Ljubljana and UTTER are teaming with NVIDIA to optimize their models with NVIDIA Nemotron techniques to maximize cost efficiency and accuracy for enterprise AI workloads, including agentic AI.

ASUS Announces Key Milestone with Nebius and Showcases NVIDIA GB300 NVL72 System at GTC Paris 2025

ASUS today joined GTC Paris at VivaTech 2025 as a Gold Sponsor, highlighting its latest portfolio of AI infrastructure solutions and reinforcing its commitment to advancing the AI Factory vision with a full range of NVIDIA Blackwell Ultra solutions, delivering breakthrough performance from large-scale data centers to the personal desktop.

ASUS is also excited to announce a transformative milestone in its partnership with Nebius. Together, the two companies are enabling a new era of AI innovation built on NVIDIA's advanced platforms. Building on the successful deployment of the NVIDIA GB200 NVL72 platform, ASUS and Nebius are now moving forward with strategic collaborations featuring the next-generation NVIDIA GB300 NVL72 platform. This ongoing initiative underscores ASUS's role as a key enabler of AI infrastructure, committed to delivering scalable, high-performance solutions that help enterprises accelerate AI adoption and innovation.

NVIDIA Blackwell Delivers Breakthrough Performance in Latest MLPerf Training Results

NVIDIA is working with companies worldwide to build out AI factories—speeding the training and deployment of next-generation AI applications that use the latest advancements in training and inference. The NVIDIA Blackwell architecture is built to meet the heightened performance requirements of these new applications. In the latest round of MLPerf Training—the 12th since the benchmark's introduction in 2018—the NVIDIA AI platform delivered the highest performance at scale on every benchmark and powered every result submitted on the benchmark's toughest large language model (LLM)-focused test: Llama 3.1 405B pretraining.

The NVIDIA platform was the only one that submitted results on every MLPerf Training v5.0 benchmark—underscoring its exceptional performance and versatility across a wide array of AI workloads, spanning LLMs, recommendation systems, multimodal LLMs, object detection and graph neural networks. The at-scale submissions used two AI supercomputers powered by the NVIDIA Blackwell platform: Tyche, built using NVIDIA GB200 NVL72 rack-scale systems, and Nyx, based on NVIDIA DGX B200 systems. In addition, NVIDIA collaborated with CoreWeave and IBM to submit GB200 NVL72 results using a total of 2,496 Blackwell GPUs and 1,248 NVIDIA Grace CPUs.

Kioxia Unveils Plans for Ultra-High Performance 10+ Million IOPS SSD

Kioxia announced its medium to long-term growth strategy and as a part of its growth strategy, the company is betting on advanced SSD technology to capture more market share. The company's most ambitious project is a breakthrough SSD that combines their XL-FLASH memory with a brand-new controller design. "We're taking our ultra-fast XL-Flash memory chips, which use single-level cells, and pairing them with a completely new controller," a company representative explained. "This combination should give us unprecedented performance for small-scale data operations. We're targeting over 10 million IOPS, and we plan to have samples ready by the second half of 2026." The company is also working closely with major GPU manufacturers to optimize performance for AI and graphics-intensive applications.

Meanwhile, Kioxia is rolling out its current generation of SSDs built on 8th generation BiCS FLASH technology. The CM9 series targets AI systems that need both blazing speed and rock-solid reliability to get the most out of expensive GPU hardware. On the other end, the LC9 series focuses on massive storage capacity, hitting 122 terabytes per drive for applications like large-scale databases that power AI inference systems.

AMD's Open AI Software Ecosystem Strengthened Again, Following Acquisition of Brium

At AMD, we're committed to building a high-performance, open AI software ecosystem that empowers developers and drives innovation. Today, we're excited to take another step forward with the acquisition of Brium, a team of world-class compiler and AI software experts with deep expertise in machine learning, AI inference, and performance optimization. Brium brings advanced software capabilities that strengthen our ability to deliver highly optimized AI solutions across the entire stack. Their work in compiler technology, model execution frameworks, and end-to-end AI inference optimization will play a key role in enhancing the efficiency and flexibility of our AI platform.

This acquisition strengthens our foundation for long-term innovation. It reflects our strategic commitment to AI, particularly to the developers who are building the future of intelligent applications. It is also the latest in a series of targeted investments, following the acquisitions of Silo AI, Nod.ai, and Mipsology, that together advance our ability to support the open-source software ecosystem and deliver optimized performance on AMD hardware.

AMD Celebrates Four Decades of FPGA Innovation - From Invention to AI Acceleration

This year marks the 40th anniversary of the first commercially available field-programmable gate array (FPGA), introducing the idea of reprogrammable hardware. By creating "hardware as flexible as software," FPGA reprogrammable logic changed the face of semiconductor design. For the first time, developers could design a chip, and if specs or requirements changed mid-stream, or even after manufacturing, they could redefine its functionality to perform a different task. This flexibility enabled more rapid development of new chip designs, accelerating time to market for new products and providing an alternative to ASICs.

The impact on the market has been phenomenal. FPGAs launched a $10+ billion industry and over the past four decades we have shipped more than 3 billion FPGAs and adaptive SoCs (devices combining FPGA fabric with a system-on-chip and other processing engines) to more than 7,000 customers across diverse market segments. In fact, we've been the programmable logic market share leader for the past 25 consecutive years, and we believe we are well positioned for continued market leadership based on the strength of our product portfolio and roadmap.

NVIDIA on AI Factories: The More You Buy, the More You Make

How NVIDIA's AI factory platform balances maximum performance and minimum latency, optimizing AI inference to power the next industrial revolution. When we prompt generative AI to answer a question or create an image, large language models generate tokens of intelligence that combine to provide the result. One prompt. One set of tokens for the answer. This is called AI inference. Agentic AI uses reasoning to complete tasks. AI agents aren't just providing one-shot answers. They break tasks down into a series of steps, each one a different inference technique. One prompt. Many sets of tokens to complete the job.
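
The one-shot versus agentic distinction can be made concrete with a small sketch (hypothetical helper names; `call_llm` stands in for any inference endpoint and is not a real API):

```python
# Hypothetical sketch contrasting one-shot inference with an agentic loop.
# call_llm() is a placeholder, not a real API; wire it to any endpoint.

def call_llm(prompt: str) -> str:
    """Single inference call: one prompt in, one set of tokens out."""
    raise NotImplementedError("connect this to an inference endpoint")

def one_shot(prompt: str) -> str:
    # One prompt, one set of tokens for the answer.
    return call_llm(prompt)

def agentic(task: str) -> list[str]:
    # One prompt, many sets of tokens: plan the steps, then run an
    # inference pass for each step.
    steps = call_llm(f"Break this task into steps:\n{task}").splitlines()
    return [call_llm(f"Task: {task}\nComplete this step: {step}")
            for step in steps if step.strip()]
```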

The engines of AI inference are called AI factories—massive infrastructures that serve AI to millions of users at once. AI factories generate AI tokens. Their product is intelligence. In the AI era, this intelligence grows revenue and profits. Growing revenue over time depends on how efficient the AI factory can be as it scales. AI factories are the machines of the next industrial revolution.

AnythingLLM App Best Experienced on NVIDIA RTX AI PCs

Large language models (LLMs), trained on datasets with billions of tokens, can generate high-quality content. They're the backbone for many of the most popular AI applications, including chatbots, assistants, code generators and much more. One of today's most accessible ways to work with LLMs is with AnythingLLM, a desktop app built for enthusiasts who want an all-in-one, privacy-focused AI assistant directly on their PC. With new support for NVIDIA NIM microservices on NVIDIA GeForce RTX and NVIDIA RTX PRO GPUs, AnythingLLM users can now get even faster performance for more responsive local AI workflows.

What Is AnythingLLM?
AnythingLLM is an all-in-one AI application that lets users run local LLMs, retrieval-augmented generation (RAG) systems and agentic tools. It acts as a bridge between a user's preferred LLMs and their data, and enables access to tools (called skills), making it easier and more efficient to use LLMs for specific tasks.
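
Local LLM servers such as NIM microservices typically expose an OpenAI-compatible HTTP API; a minimal sketch of querying one (the base URL, port, and model name here are illustrative assumptions):

```python
# Sketch of querying a local OpenAI-compatible endpoint such as a NIM
# microservice (the base_url, port, and model name are assumptions).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")
response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",   # illustrative local model name
    messages=[{"role": "user", "content": "Summarize my meeting notes."}],
)
print(response.choices[0].message.content)
```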

NVIDIA & SAP Partnership Will Bring AI Agents to the Physical World

As robots increasingly make their way to the largest enterprises' manufacturing plants and warehouses, the need for access to critical business and operational data has never been more crucial. At its Sapphire conference, SAP announced it is collaborating with NEURA Robotics and NVIDIA to enable its SAP Joule agents to connect enterprise data and processes with NEURA's advanced cognitive robots. The integration will enable robots to support tasks including adaptive manufacturing, autonomous replenishment, compliance monitoring and predictive maintenance. Using the Mega NVIDIA Omniverse Blueprint, SAP customers will be able to simulate and validate large robotic fleets in digital twins before deploying them in real-world facilities.

Virtual Assistants Become Physical Helpers
AI agents are traditionally confined to the digital world, which means they're unable to take actionable steps for physical tasks and do real-world work in warehouses, factories and other industrial workplaces. SAP's collaboration with NVIDIA and NEURA Robotics shows how enterprises will be able to use Joule to plan and simulate complex and dynamic scenarios that include physical AI and autonomous humanoid robots to address critical planning, safety and project requirements, streamline operations and embody business intelligence in the physical world.

Red Hat & AMD Strengthen Strategic Collaboration - Leading to More Efficient GenAI

Red Hat, the world's leading provider of open source solutions, and AMD today announced a strategic collaboration to propel AI capabilities and optimize virtualized infrastructure. With this deepened alliance, Red Hat and AMD will expand customer choice across the hybrid cloud, from deploying optimized, efficient AI models to more cost-effectively modernizing traditional virtual machines (VMs). As workload demand and diversity continue to rise with the introduction of AI, organizations must have the capacity and resources to meet these escalating requirements. The average datacenter, however, is dedicated primarily to traditional IT systems, leaving little room to support intensive workloads such as AI. To answer this need, Red Hat and AMD are bringing together the power of Red Hat's industry-leading open source solutions with the comprehensive portfolio of AMD high-performance computing architectures.

AMD and Red Hat: Driving to more efficient generative AI
Red Hat and AMD are combining the power of Red Hat AI with the AMD portfolio of x86-based processors and GPU architectures to support optimized, cost-efficient, and production-ready environments for AI-enabled workloads. AMD Instinct GPUs are now fully enabled on Red Hat OpenShift AI, giving customers the high-performance processing power necessary for AI deployments across the hybrid cloud without extreme resource requirements. In addition, using AMD Instinct MI300X GPUs with Red Hat Enterprise Linux AI, Red Hat and AMD conducted testing on Microsoft Azure ND MI300X v5 to successfully demonstrate AI inferencing that scales both small language models (SLMs) and large language models (LLMs) across multiple GPUs on a single VM, reducing the need to deploy across multiple VMs and lowering performance costs.
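
One common open-source way to spread a model across multiple GPUs in a single VM is tensor parallelism, for example with the vLLM library; a sketch under assumed model name and GPU count (the announcement does not name the serving stack):

```python
# Sketch of multi-GPU serving on a single VM via tensor parallelism with
# the open-source vLLM library (the model name and GPU count are
# assumptions; the announcement does not name the serving stack).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative example
    tensor_parallel_size=8,                     # shard across 8 GPUs in one VM
)
outputs = llm.generate(["What is AI inferencing?"],
                       SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)
```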

NVIDIA & Microsoft Accelerate Agentic AI Innovation - From Cloud to PC

Agentic AI is redefining scientific discovery and unlocking research breakthroughs and innovations across industries. Through deepened collaboration, NVIDIA and Microsoft are delivering advancements that accelerate agentic AI-powered applications from the cloud to the PC. At Microsoft Build, Microsoft unveiled Microsoft Discovery, an extensible platform built to empower researchers to transform the entire discovery process with agentic AI. This will help research and development departments across various industries accelerate the time to market for new products, as well as speed and expand the end-to-end discovery process for all scientists.

Microsoft Discovery will integrate the NVIDIA ALCHEMI NIM microservice, which optimizes AI inference for chemical simulations, to accelerate materials science research with property prediction and candidate recommendation. The platform will also integrate NVIDIA BioNeMo NIM microservices, tapping into pretrained AI workflows to speed up AI model development for drug discovery. These integrations equip researchers with accelerated performance for faster scientific discoveries. In testing, researchers at Microsoft used Microsoft Discovery to detect a novel coolant prototype with promising properties for immersion cooling in data centers in under 200 hours, rather than months or years with traditional methods.

Lenovo Unveils ThinkStation PGX - Offering Big AI Innovation in an SFF Package

Lenovo has announced the ThinkStation PGX, a compact, personal AI developer workstation. The ThinkStation PGX is ideal for AI researchers, developers, data scientists, practitioners, students, and application engineers who need a purpose-built, compact, and powerful AI desktop solution that works immediately out of the box. Built on the NVIDIA GB10 Grace Blackwell Superchip, providing up to 1 PetaFLOP (1,000 TOPS) of AI performance, the ThinkStation PGX can tackle large generative AI models of up to 200 billion parameters. With 128 GB of coherent unified system memory, developers can experiment with, fine-tune, or run inference on the latest generation of reasoning AI models. To double their computing power, developers can connect two ThinkStation PGX systems together to work with even larger AI models of up to 405 billion parameters.
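
The quoted model sizes follow from the memory capacities at 4-bit precision; a minimal sketch of the arithmetic (weights only, an assumption on our part):

```python
# Capacity check for the quoted model sizes (assumption: 4-bit weights,
# ignoring activation and KV-cache overhead).
bytes_per_param = 0.5   # 4-bit quantization

configs = ((200e9, 128, "single PGX"), (405e9, 256, "two linked PGX"))
for params, mem_gb, setup in configs:
    weights_gb = params * bytes_per_param / 1e9
    print(f"{setup}: {weights_gb:.0f} GB of weights vs. {mem_gb} GB of memory")
# 100 GB fits in 128 GB; ~203 GB fits in the combined 256 GB
```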

The ThinkStation PGX comes preconfigured with the NVIDIA DGX OS, and the NVIDIA AI software stack, along with familiar tools and frameworks like PyTorch and Jupyter. Developers can instantly prototype, fine-tune, and inference large AI models from the desktop and seamlessly deploy to the data center or cloud. "By collaborating with NVIDIA to deliver a high-performance, yet compact device, Lenovo is empowering AI developers, researchers, data scientists, and students to accelerate their workloads and adoption of breakthrough innovation in generative AI."—Rob Herman, Vice President, Worldwide Workstation and Client AI Business at Lenovo.

AMD & HUMAIN Reveal Formation of $10 Billion Strategic Collab, Aimed at Advancing Global AI

AMD and HUMAIN, Saudi Arabia's new AI enterprise, today announced a landmark agreement to build the world's most open, scalable, resilient, and cost-efficient AI infrastructure, which will power the future of global intelligence through a network of AMD-based AI computing centers stretching from the Kingdom of Saudi Arabia to the United States. As part of the agreement, the parties will invest up to $10 billion to deploy 500 megawatts of AI compute capacity over the next five years. The AI superstructure built by AMD and HUMAIN will be open by design, accessible at scale, and optimized to power AI workloads across enterprise, start-up, and sovereign markets. HUMAIN will oversee end-to-end delivery, including hyperscale data centers, sustainable power systems, and global fiber interconnects, while AMD will provide the full spectrum of the AMD AI compute portfolio and the AMD ROCm open software ecosystem.

"At AMD, we have a bold vision to enable the future of AI everywhere—bringing open, high-performance computing to every developer, AI start-up and enterprise around the world," said Dr. Lisa Su, Chair and CEO, AMD. "Our investment with HUMAIN is a significant milestone in advancing global AI infrastructure. Together, we are building a globally significant AI platform that delivers performance, openness and reach at unprecedented levels." With initial deployments already underway across key global regions, the collaboration is on track to activate multi-exaflop capacity by early 2026, supported by next-gen AI silicon, modular data center zones, and a developer-enablement focused software platform stack built around open standards and interoperability.

NVIDIA Wins Multiple COMPUTEX Best Choice Awards

NVIDIA today received multiple accolades at COMPUTEX's Best Choice Awards, in recognition of innovation across the company. The NVIDIA GeForce RTX 5090 GPU won the Gaming and Entertainment category award; the NVIDIA Quantum-X Photonics InfiniBand switch system won the Networking and Communication category award; NVIDIA DGX Spark won the Computer and System category award; and the NVIDIA GB200 NVL72 system and NVIDIA Cosmos world foundation model development platform won Golden Awards. The awards recognize the outstanding functionality, innovation and market promise of technologies in each category. Jensen Huang, founder and CEO of NVIDIA, will deliver a keynote at COMPUTEX on Monday, May 19, at 11 a.m. Taiwan time.

GB200 NVL72 and NVIDIA Cosmos Go Gold
NVIDIA GB200 NVL72 and NVIDIA Cosmos each won Golden Awards. The NVIDIA GB200 NVL72 system connects 36 NVIDIA Grace CPUs and 72 NVIDIA Blackwell GPUs in a rack-scale design. It delivers 1.4 exaflops of AI performance and 30 terabytes of fast memory, as well as 30x faster real-time trillion-parameter large language model inference with 25x energy efficiency compared with the NVIDIA H100 GPU. By design, the GB200 NVL72 accelerates the most compute-intensive AI and high-performance computing workloads, including AI training and data processing for engineering design and simulation. NVIDIA Cosmos accelerates physical AI development by enabling developers to build and deploy world foundation models with unprecedented speed and scale.

NVIDIA & ServiceNow CEOs Jointly Present "Super Genius" Open-source Apriel Nemotron 15B LLM

ServiceNow is accelerating enterprise AI with a new reasoning model built in partnership with NVIDIA—enabling AI agents that respond in real time, handle complex workflows and scale functions like IT, HR and customer service teams worldwide. Unveiled today at ServiceNow's Knowledge 2025—where NVIDIA CEO and founder Jensen Huang joined ServiceNow chairman and CEO Bill McDermott during his keynote address—Apriel Nemotron 15B is compact, cost-efficient and tuned for action. It's designed to drive the next step forward in enterprise large language models (LLMs).

Apriel Nemotron 15B was developed with NVIDIA NeMo, the open NVIDIA Llama Nemotron Post-Training Dataset and ServiceNow domain-specific data, and was trained on NVIDIA DGX Cloud running on Amazon Web Services (AWS). The news follows the April release of the NVIDIA Llama Nemotron Ultra model, which harnesses the NVIDIA open dataset that ServiceNow used to build its Apriel Nemotron 15B model. Ultra is among the strongest open-source models at reasoning, including scientific reasoning, coding, advanced math and other agentic AI tasks.

Semidynamics Announces Cervell All-in-One RISC-V NPU

Semidynamics, the only provider of fully customizable RISC-V processor IP, announces Cervell, a scalable and fully programmable Neural Processing Unit (NPU) built on RISC-V. Cervell combines CPU, vector, and tensor capabilities in a single, unified all-in-one architecture, unlocking zero-latency AI compute across applications from edge AI to datacenter-scale LLMs.

Delivering up to 256 TOPS (Tera Operations Per Second) at 2 GHz, Cervell scales from C8 to C64 configurations, allowing designers to tune performance to application needs—from 8 TOPS INT8 at 1 GHz in compact edge deployments to 256 TOPS INT4 in high-end AI inference.
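
The two quoted endpoints are consistent with linear scaling across configuration size, clock, and precision; a minimal sketch (the linear-scaling assumption is ours):

```python
# Scaling check from C8 @ 1 GHz INT8 to C64 @ 2 GHz INT4 (assumption:
# throughput scales linearly with configuration, clock, and precision).
base_tops = 8             # C8 configuration, INT8, 1 GHz

config_scale = 64 / 8     # C8 -> C64
clock_scale = 2.0 / 1.0   # 1 GHz -> 2 GHz
precision_scale = 2       # INT4 packs twice the ops of INT8

top_tops = base_tops * config_scale * clock_scale * precision_scale
print(f"C64 @ 2 GHz, INT4: {top_tops:.0f} TOPS")   # 256 TOPS, as quoted
```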

NVIDIA AI Blueprint for 3D-Guided Generative AI Allows Controlled Composition

AI-powered image generation has progressed at a remarkable pace—from early examples of models creating images of humans with too many fingers to now producing strikingly photorealistic visuals. Even with such leaps, one challenge remains: achieving creative control. Creating scenes using text has gotten easier, no longer requiring complex descriptions—and models have improved alignment to prompts. But describing finer details like composition, camera angles and object placement with text alone is hard, and making adjustments is even more complex.

Advanced workflows using ControlNets—tools that enhance image generation by providing greater control over the output—offer solutions, but their setup complexity limits broader accessibility. To help overcome these challenges and fast-track access to advanced AI capabilities, NVIDIA at the CES trade show earlier this year announced the NVIDIA AI Blueprint for 3D-guided generative AI for RTX PCs. This sample workflow includes everything needed to start generating images with full composition control. Users can download the new Blueprint today.
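
As a concrete illustration of the ControlNet idea, a sketch with the open-source diffusers library (the checkpoints here are illustrative assumptions, not the Blueprint's actual stack):

```python
# Sketch of ControlNet-guided image generation with the open-source
# diffusers library (the checkpoints are illustrative assumptions,
# not the Blueprint's actual stack).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# A depth map rendered from a 3D scene pins down composition and camera.
depth_map = load_image("scene_depth_render.png")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

image = pipe("a cozy cabin interior, photorealistic", image=depth_map).images[0]
image.save("guided_render.png")
```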

VSORA Raises $46 Million to Produce its Jotunn8 AI Chip in 2025

VSORA, a French innovator and the only European provider of ultra-high-performance artificial intelligence (AI) inference chips, today announced that it has successfully raised $46 million in a new funding round.

The investment was led by Otium and a French family office with additional participation from Omnes Capital, Adélie Capital and co-financing from the European Innovation Council (EIC) Fund.

Oracle Cloud Infrastructure Bolstered by Thousands of NVIDIA Blackwell GPUs

Oracle has stood up and optimized its first wave of liquid-cooled NVIDIA GB200 NVL72 racks in its data centers. Thousands of NVIDIA Blackwell GPUs are now being deployed and ready for customer use on NVIDIA DGX Cloud and Oracle Cloud Infrastructure (OCI) to develop and run next-generation reasoning models and AI agents. Oracle's state-of-the-art GB200 deployment includes high-speed NVIDIA Quantum-2 InfiniBand and NVIDIA Spectrum-X Ethernet networking to enable scalable, low-latency performance, as well as a full stack of software and database integrations from NVIDIA and OCI.

OCI, one of the world's largest and fastest-growing cloud service providers, is among the first to deploy NVIDIA GB200 NVL72 systems. The company has ambitious plans to build one of the world's largest Blackwell clusters. OCI Superclusters will scale beyond 100,000 NVIDIA Blackwell GPUs to meet the world's skyrocketing need for inference tokens and accelerated computing. The torrid pace of AI innovation continues as several companies including OpenAI have released new reasoning models in the past few weeks.