News Posts matching #inference

Return to Keyword Browsing

Etched Introduces AI-Powered Games Without GPUs, Displays Minecraft Replica

The gaming industry is about to get massively disrupted. Instead of using game engines to power games, we are now witnessing an entirely new and crazy concept. A startup specializing in designing ASICs specifically for Transformer architecture, the foundation behind generative AI models like GPT/Claude/Stable Diffusion, has showcased a demo in partnership with Decart of a Minecraft clone being entirely generated and operated by AI instead of the traditional game engine. While we use AI to create images and videos based on specific descriptions and output pretty realistic content, having an AI model spit out an entire playable game is something different. Oasis is the first playable, real-time, real-time, open-world AI model that takes users' input and generates real-time gameplay, including physics, game rules, and graphics.

An interesting thing to point out is the hardware that powers this setup. Using a single NVIDIA H100 GPU, this 500-million parameter Oasis model can run at 720p resolution at 20 generated frames per second. Due to limitations of accelerators like NVIDIA's H100/B200, gameplay at 4K is almost impossible. However, Etched has its own accelerator called Sohu, which is specialized in accelerating transformer architectures. Eight NVIDIA H100 GPUs can power five Oasis models to five users, while the eight Sohu cards are capable of serving 65 Oasis runs to 65 users. This is more than a 10x increase in inference capability compared to NVIDIA's hardware on a single-use case alone. The accelerator is designed to run much larger models like future 100 billion-parameter generative AI video game models that can output 4K 30 FPS, all thanks to 144 GB of HBM3E memory, yielding 1,152 GB in eight-accelerator server configuration.

Cisco Unveils Plug-and-Play AI Solutions Powered by NVIDIA H100 and H200 Tensor Core GPUs

Today, Cisco announced new additions to its data center infrastructure portfolio: an AI server family purpose-built for GPU-intensive AI workloads with NVIDIA accelerated computing, and AI PODs to simplify and de-risk AI infrastructure investment. They give organizations an adaptable and scalable path to AI, supported by Cisco's industry-leading networking capabilities.

"Enterprise customers are under pressure to deploy AI workloads, especially as we move toward agentic workflows and AI begins solving problems on its own," said Jeetu Patel, Chief Product Officer, Cisco. "Cisco innovations like AI PODs and the GPU server strengthen the security, compliance, and processing power of those workloads as customers navigate their AI journeys from inferencing to training."

Meta Shows Open-Architecture NVIDIA "Blackwell" GB200 System for Data Center

During the Open Compute Project (OCP) Summit 2024, Meta, one of the prime members of the OCP project, showed its NVIDIA "Blackwell" GB200 systems for its massive data centers. We previously covered Microsoft's Azure server rack with GB200 GPUs featuring one-third of the rack space for computing and two-thirds for cooling. A few days later, Google showed off its smaller GB200 system, and today, Meta is showing off its GB200 system—the smallest of the bunch. To train a dense transformer large language model with 405B parameters and a context window of up to 128k tokens, like the Llama 3.1 405B, Meta must redesign its data center infrastructure to run a distributed training job on two 24,000 GPU clusters. That is 48,000 GPUs used for training a single AI model.

Called "Catalina," it is built on the NVIDIA Blackwell platform, emphasizing modularity and adaptability while incorporating the latest NVIDIA GB200 Grace Blackwell Superchip. To address the escalating power requirements of GPUs, Catalina introduces the Orv3, a high-power rack capable of delivering up to 140kW. The comprehensive liquid-cooled setup encompasses a power shelf supporting various components, including a compute tray, switch tray, the Orv3 HPR, Wedge 400 fabric switch with 12.8 Tbps switching capacity, management switch, battery backup, and a rack management controller. Interestingly, Meta also upgraded its "Grand Teton" system for internal usage, such as deep learning recommendation models (DLRMs) and content understanding with AMD Instinct MI300X. Those are used to inference internal models, and MI300X appears to provide the best performance per Dollar for inference. According to Meta, the computational demand stemming from AI will continue to increase exponentially, so more NVIDIA and AMD GPUs is needed, and we can't wait to see what the company builds.

Intel Won't Compete Against NVIDIA's High-End AI Dominance Soon, Starts Laying Off Over 2,200 Workers Across US

Intel's taking a different path with its Gaudi 3 accelerator chips. It's staying away from the high-demand market for training big AI models, which has made NVIDIA so successful. Instead, Intel wants to help businesses that need cheaper AI solutions to train and run smaller specific models and open-source options. At a recent event, Intel talked up Gaudi 3's "price performance advantage" over NVIDIA's H100 GPU for inference tasks. Intel says Gaudi 3 is faster and more cost-effective than the H100 when running Llama 3 and Llama 2 models of different sizes.

Intel also claims that Gaudi 3 is as power-efficient as the H100 for large language model (LLM) inference with small token outputs and does even better with larger outputs. The company even suggests Gaudi 3 beats NVIDIA's newer H200 in LLM inference throughput for large token outputs. However, Gaudi 3 doesn't match up to the H100 in overall floating-point operation throughput for 16-bit and 8-bit formats. For bfloat16 and 8-bit floating-point precision matrix math, Gaudi 3 hits 1,835 TFLOPS in each format, while the H100 reaches 1,979 TFLOPS for BF16 and 3,958 TFLOPS for FP8.

Supermicro Introduces New Servers and GPU Accelerated Systems with AMD EPYC 9005 Series CPUs and AMD Instinct MI325X GPUs

Supermicro, Inc., a Total IT Solution Provider for AI, Cloud, Storage, and 5G/Edge, announces the launch of a new series of servers, GPU-accelerated systems, and storage servers featuring the AMD EPYC 9005 Series processors and AMD Instinct MI325X GPUs. The new H14 product line represents one of the most extensive server families in the industry, including Supermicro's Hyper systems, the Twin multi-node servers, and AI inferencing GPU systems, all available with air or liquid cooling options. The new "Zen 5" processor core architecture implements full data path AVX-512 vector instructions for CPU-based AI inference and provides 17% better instructions per cycle (IPC) than the previous 4th generation EPYC processor, enabling more performance per core.

Supermicro's new H14 family uses the latest 5th Gen AMD EPYC processors which enable up to 192 cores per CPU with up to 500 W TDP (thermal design power). Supermicro has designed new H14 systems including the Hyper and FlexTwin systems which can accommodate the higher thermal requirements. The H14 family also includes three systems for AI training and inference workloads supporting up to 10 GPUs which feature the AMD EPYC 9005 Series CPU as the host processor and two which support the AMD Instinct MI325X GPU.

Astera Labs Introduces New Portfolio of Fabric Switches Purpose-Built for AI Infrastructure at Cloud-Scale

Astera Labs, Inc, a global leader in semiconductor-based connectivity solutions for AI and cloud infrastructure, today announced a new portfolio of fabric switches, including the industry's first PCIe 6 switch, built from the ground up for demanding AI workloads in accelerated computing platforms deployed at cloud-scale. The Scorpio Smart Fabric Switch portfolio is optimized for AI dataflows to deliver maximum predictable performance per watt, high reliability, easy cloud-scale deployment, reduced time-to-market, and lower total cost of ownership.

The Scorpio Smart Fabric Switch portfolio features two application-specific product lines with a multi-generational roadmap:
  • Scorpio P-Series for GPU-to-CPU/NIC/SSD PCIe 6 connectivity- architected to support mixed traffic head-node connectivity across a diverse ecosystem of PCIe hosts and endpoints.
  • Scorpio X-Series for back-end GPU clustering-architected to deliver the highest back-end GPU-to-GPU bandwidth with platform-specific customization.

Thundercomm Launches RUBIK Pi on Qualcomm Platforms

At an industry event in Austin today, Thundercomm announces RUBIK Pi, the first Pi built on Qualcomm SoC platforms for developers. RUBIK Pi is an innovative tool that aims to lower the barriers application development with AI inference, allowing developers to access high-performance, easy-to-deploy AI R&D tools.

The Pi product is a must-have for electronics enthusiasts and developers. It can be seen as a microcomputer, integrating a processor, memory, storage, and various interfaces on a credit card-sized board. Thundercomm, a world-leading IoT product and solution provider, building on its expertise in ICT technologies and developer workflows, launched RUBIK Pi, aiming to create the most user-friendly AI R&D tools.

AMD Introduces the Radeon PRO V710 to Microsoft Azure

AMD today introduced the Radeon PRO V710, the newest member of AMD's family of visual cloud GPUs. Available today in private preview on Microsoft Azure, the Radeon PRO V710 brings new capabilities to the public cloud. The AMD Radeon PRO V710's 54 Compute Units, along with 28 GB of VRAM, 448 GB/s memory transfer rate, and 54 MB of L3 AMD Infinity Cache technology, support small to medium ML inference workloads and small model training using open-source AMD ROCm software.

With support for hardware virtualization implemented in compliance with the PCI Express SR-IOV standard, instances based on the Radeon PRO V710 can provide robust isolation between multiple virtual machines running on the same physical GPU and between the host and guest environments. The efficient RDNA 3 architecture provides excellent performance per watt, enabling a single slot, passively cooled form factor compliant with the PCIe CEM spec.

Foxconn to Build Taiwan's Fastest AI Supercomputer With NVIDIA Blackwell

NVIDIA and Foxconn are building Taiwan's largest supercomputer, marking a milestone in the island's AI advancement. The project, Hon Hai Kaohsiung Super Computing Center, revealed Tuesday at Hon Hai Tech Day, will be built around NVIDIA's groundbreaking Blackwell architecture and feature the GB200 NVL72 platform, which includes a total of 64 racks and 4,608 Tensor Core GPUs. With an expected performance of over 90 exaflops of AI performance, the machine would easily be considered the fastest in Taiwan.

Foxconn plans to use the supercomputer, once operational, to power breakthroughs in cancer research, large language model development and smart city innovations, positioning Taiwan as a global leader in AI-driven industries. Foxconn's "three-platform strategy" focuses on smart manufacturing, smart cities and electric vehicles. The new supercomputer will play a pivotal role in supporting Foxconn's ongoing efforts in digital twins, robotic automation and smart urban infrastructure, bringing AI-assisted services to urban areas like Kaohsiung.

NVIDIA Cancels Dual-Rack NVL36x2 in Favor of Single-Rack NVL72 Compute Monster

NVIDIA has reportedly discontinued its dual-rack GB200 NVL36x2 GPU model, opting to focus on the single-rack GB200 NVL72 and NVL36 models. This shift, revealed by industry analyst Ming-Chi Kuo, aims to simplify NVIDIA's offerings in the AI and HPC markets. The decision was influenced by major clients like Microsoft, who prefer the NVL72's improved space efficiency and potential for enhanced inference performance. While both models perform similarly in AI large language model (LLM) training, the NVL72 is expected to excel in non-parallelizable inference tasks. As a reminder, the NVL72 features 36 Grace CPUs, delivering 2,592 Arm Neoverse V2 cores with 17 TB LPDDR5X memory with 18.4 TB/s aggregate bandwidth. Additionally, it includes 72 Blackwell GB200 SXM GPUs that have a massive 13.5 TB of HBM3e combined, running at 576 TB/s aggregate bandwidth.

However, this shift presents significant challenges. The NVL72's power consumption of around 120kW far exceeds typical data center capabilities, potentially limiting its immediate widespread adoption. The discontinuation of the NVL36x2 has also sparked concerns about NVIDIA's execution capabilities and may disrupt the supply chain for assembly and cooling solutions. Despite these hurdles, industry experts view this as a pragmatic approach to product planning in the dynamic AI landscape. While some customers may be disappointed by the dual-rack model's cancellation, NVIDIA's long-term outlook in the AI technology market remains strong. The company continues to work with clients and listen to their needs, to position itself as a leader in high-performance computing solutions.

Huawei Starts Shipping "Ascend 910C" AI Accelerator Samples to Large NVIDIA Customers

Huawei has reportedly started shipping its Ascend 910C accelerator—the company's domestic alternative to NVIDIA's H100 accelerator for AI training and inference. As the report from China South Morning Post notes, Huawei is shipping samples of its accelerator to large NVIDIA customers. This includes companies like Alibaba, Baidu, and Tencent, which have ordered massive amounts of NVIDIA accelerators. However, Huawei is on track to deliver 70,000 chips, potentially worth $2 billion. With NVIDIA working on a B20 accelerator SKU that complies with US government export regulations, the Huawei Ascend 910C accelerator could potentially outperform NVIDIA's B20 processor, per some analyst expectations.

If the Ascend 910C receives positive results from Chinese tech giants, it could be the start of Huawei's expansion into data center accelerators, once hindered by the company's ability to manufacture advanced chips. Now, with foundries like SMIC printing 7 nm designs and possibly 5 nm coming soon, Huawei will leverage this technology to satisfy the domestic demand for more AI processing power. Competing on a global scale, though, remains a challenge. Companies like NVIDIA, AMD, and Intel have access to advanced nodes, which gives their AI accelerators more efficiency and performance.

Cerebras Launches the World's Fastest AI Inference

Today, Cerebras Systems, the pioneer in high performance AI compute, announced Cerebras Inference, the fastest AI inference solution in the world. Delivering 1,800 tokens per second for Llama3.1 8B and 450 tokens per second for Llama3.1 70B, Cerebras Inference is 20 times faster than NVIDIA GPU-based solutions in hyperscale clouds. Starting at just 10c per million tokens, Cerebras Inference is priced at a fraction of GPU solutions, providing 100x higher price-performance for AI workloads.

Unlike alternative approaches that compromise accuracy for performance, Cerebras offers the fastest performance while maintaining state of the art accuracy by staying in the 16-bit domain for the entire inference run. Cerebras Inference is priced at a fraction of GPU-based competitors, with pay-as-you-go pricing of 10 cents per million tokens for Llama 3.1 8B and 60 cents per million tokens for Llama 3.1 70B.

AI SSD Procurement Capacity Estimated to Exceed 45 EB in 2024; NAND Flash Suppliers Accelerate Process Upgrades

TrendForce's latest report on enterprise SSDs reveals that a surge in demand for AI has led AI server customers to significantly increase their orders for enterprise SSDs over the past two quarters. Upstream suppliers have been accelerating process upgrades and planning for 2YY products—slated to enter mass production in 2025—in order to meet the growing demand for SSDs in AI applications.

TrendForce observes that increased orders for enterprise SSDs from AI server customers have resulted in contract prices for this category rising by over 80% from 4Q23 to 3Q24. SSDs play a crucial role in AI development. In AI model training, SSDs primarily store model parameters, including evolving weights and deviations.

NVIDIA Accelerates Humanoid Robotics Development

To accelerate humanoid development on a global scale, NVIDIA today announced it is providing the world's leading robot manufacturers, AI model developers and software makers with a suite of services, models and computing platforms to develop, train and build the next generation of humanoid robotics.

Among the offerings are new NVIDIA NIM microservices and frameworks for robot simulation and learning, the NVIDIA OSMO orchestration service for running multi-stage robotics workloads, and an AI- and simulation-enabled teleoperation workflow that allows developers to train robots using small amounts of human demonstration data.

Intel Submits Gaudi 2 Results on MLCommons' Newest Benchmark

Today, MLCommons published results of its industry AI performance benchmark, MLPerf Training v4.0. Intel's results demonstrate the choice that Intel Gaudi 2 AI accelerators give enterprises and customers. Community-based software simplifies generative AI (GenAI) development and industry-standard Ethernet networking enables flexible scaling of AI systems. For the first time on the MLPerf benchmark, Intel submitted results on a large Gaudi 2 system (1,024 Gaudi 2 accelerators) trained in Intel Tiber Developer Cloud to demonstrate Gaudi 2 performance and scalability and Intel's cloud capacity for training MLPerf's GPT-3 175B1 parameter benchmark model.

"The industry has a clear need: address the gaps in today's generative AI enterprise offerings with high-performance, high-efficiency compute options. The latest MLPerf results published by MLCommons illustrate the unique value Intel Gaudi brings to market as enterprises and customers seek more cost-efficient, scalable systems with standard networking and open software, making GenAI more accessible to more customers," said Zane Ball, Intel corporate vice president and general manager, DCAI Product Management.

Arm Also Announces Three New GPUs for Consumer Devices

In addition to its two new CPU cores, Arm has announced three new GPU cores, namely the Immortalis-G925, Mali-G725 and Mali-G625. Starting from the top, the Immortalis-G925 is said to bring up to 37 percent better performance at 30 percent lower power usage compared to last year's Immortalis-G720 GPU core, whilst having two additional GPU cores in the test scenario. It's also said to bring up to 52 percent better ray tracing performance and up to 36 percent improved inference in AI/ML workloads. It's also been given a big overhaul when it comes to ray tracing—due to it being aimed towards gaming phones—and Arm claims that it can either offer up to 52 percent increased performance by reducing the accuracy in scenes with intricate objects, or 27 percent more performance with maintained accuracy.

The Immortalis-G925 supports 50 percent more shader cores and it supports configurations of up to 24 cores, compared to 16 cores for the Immortalis-G720. The Mali-G725 will be available with between six and nine cores, whereas the Mali-G625 will sport between one and five cores. The Mali-G625 is intended for smartwatches and entry-level mobile devices where a more complex GPU might not be suitable due to power draw. The Mali-G725 on the other hand is targeting upper mid-range devices and the Immortalis-G925 is aimed towards flagship devices or gaming phones as mentioned above. In related news, Arm said it's working with Epic Games to get its Unreal Engine 5 desktop renderer up and running on Android, which could lead to more complex games on mobile devices.

Apple Introduces the M4 Chip

Apple today announced M4, the latest chip delivering phenomenal performance to the all-new iPad Pro. Built using second-generation 3-nanometer technology, M4 is a system on a chip (SoC) that advances the industry-leading power efficiency of Apple silicon and enables the incredibly thin design of iPad Pro. It also features an entirely new display engine to drive the stunning precision, color, and brightness of the breakthrough Ultra Retina XDR display on iPad Pro. A new CPU has up to 10 cores, while the new 10-core GPU builds on the next-generation GPU architecture introduced in M3, and brings Dynamic Caching, hardware-accelerated ray tracing, and hardware-accelerated mesh shading to iPad for the first time. M4 has Apple's fastest Neural Engine ever, capable of up to 38 trillion operations per second, which is faster than the neural processing unit of any AI PC today. Combined with faster memory bandwidth, along with next-generation machine learning (ML) accelerators in the CPU, and a high-performance GPU, M4 makes the new iPad Pro an outrageously powerful device for artificial intelligence.

"The new iPad Pro with M4 is a great example of how building best-in-class custom silicon enables breakthrough products," said Johny Srouji, Apple's senior vice president of Hardware Technologies. "The power-efficient performance of M4, along with its new display engine, makes the thin design and game-changing display of iPad Pro possible, while fundamental improvements to the CPU, GPU, Neural Engine, and memory system make M4 extremely well suited for the latest applications leveraging AI. Altogether, this new chip makes iPad Pro the most powerful device of its kind."

Apple Reportedly Developing Custom Data Center Processors with Focus on AI Inference

Apple is reportedly working on creating in-house chips designed explicitly for its data centers. This news comes from a recent report by the Wall Street Journal, which highlights the company's efforts to enhance its data processing capabilities and reduce dependency on third parties to supply the infrastructure. In the internal project called Apple Chips in Data Center (ACDC), which started in 2018, Apple wanted to design data center processors to handle the massive user base and increase the company's service offerings. The most recent advancement in AI means that Apple will probably serve an LLM processed in Apple's data center. The chip will most likely focus on inference of AI models rather than training.

The AI chips are expected to play a crucial role in improving the efficiency and speed of Apple's data centers, which handle vast amounts of data generated by the company's various services and products. By developing these custom chips, Apple aims to optimize its data processing and storage capabilities, ultimately leading to better user experiences across its ecosystem. The move by Apple to develop AI-enhanced chips for data centers is seen as a strategic step in the company's efforts to stay ahead in the competitive tech landscape. Almost all major tech companies, famously called the big seven, have products that use AI in silicon and in software processing. However, Apple is the one that seemingly lacked that. Now, the company is integrating AI across the entire vertical, from the upcoming iPhone integration to M4 chips for Mac devices and ACDC chips for data centers.

More than 500 AI Models Run Optimized on Intel Core Ultra Processors

Today, Intel announced it surpassed 500 AI models running optimized on new Intel Core Ultra processors - the industry's premier AI PC processor available in the market today, featuring new AI experiences, immersive graphics and optimal battery life. This significant milestone is a result of Intel's investment in client AI, the AI PC transformation, framework optimizations and AI tools including OpenVINO toolkit. The 500 models, which can be deployed across the central processing unit (CPU), graphics processing unit (GPU) and neural processing unit (NPU), are available across popular industry sources, including OpenVINO Model Zoo, Hugging Face, ONNX Model Zoo and PyTorch. The models draw from categories of local AI inferencing, including large language, diffusion, super resolution, object detection, image classification/segmentation, computer vision and others.

"Intel has a rich history of working with the ecosystem to bring AI applications to client devices, and today we celebrate another strong chapter in the heritage of client AI by surpassing 500 pre-trained AI models running optimized on Intel Core Ultra processors. This unmatched selection reflects our commitment to building not only the PC industry's most robust toolchain for AI developers, but a rock-solid foundation AI software users can implicitly trust."
-Robert Hallock, Intel vice president and general manager of AI and technical marketing in the Client Computing Group

ASUS IoT Announces PE8000G

ASUS IoT, the global AIoT solution provider, today announced PE8000G at Embedded World 2024, a powerful edge AI computer that supports multiple GPU cards for high performance—and is expertly engineered to handle rugged conditions with resistance to extreme temperatures, vibration and variable voltage. PE8000G is powered by formidable Intel Core processors (13th and 12th gen) and the Intel R680E chipset to deliver high-octane processing power and efficiency.

With its advanced architecture, PE8000G excels at running multiple neural network modules simultaneously in real-time—and represents a significant leap forward in edge AI computing. With its robust design, exceptional performance and wide range of features, PE8000G series is poised to revolutionize AI-driven applications across multiple industries, elevating edge AI computing to new heights and enabling organizations to tackle mission-critical tasks with confidence and to achieve unprecedented levels of productivity and innovation.

AMD Extends Leadership Adaptive SoC Portfolio with New Versal Series Gen 2 Devices Delivering End-to-End Acceleration for AI-Driven Embedded Systems

AMD today announced the expansion of the AMD Versal adaptive system on chip (SoC) portfolio with the new Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 adaptive SoCs, which bring preprocessing, AI inference, and postprocessing together in a single device for end-to-end acceleration of AI-driven embedded systems.

These initial devices in the Versal Series Gen 2 portfolio build on the first generation with powerful new AI Engines expected to deliver up to 3x higher TOPs-per-watt than first generation Versal AI Edge Series devicesi, while new high-performance integrated Arm CPUs are expected to offer up to 10x more scalar compute than first gen Versal AI Edge and Prime series devicesii.

US Government Wants Nuclear Plants to Offload AI Data Center Expansion

The expansion of AI technology affects not only the production and demand for graphics cards but also the electricity grid that powers them. Data centers hosting thousands of GPUs are becoming more common, and the industry has been building new facilities for GPU-enhanced servers to serve the need for more AI. However, these powerful GPUs often consume over 500 Watts per single card, and NVIDIA's latest Blackwell B200 GPU has a TGP of 1000 Watts or a single kilowatt. These kilowatt GPUs will be present in data centers with 10s of thousands of cards, resulting in multi-megawatt facilities. To combat the load on the national electricity grid, US President Joe Biden's administration has been discussing with big tech to re-evaluate their power sources, possibly using smaller nuclear plants. According to an Axios interview with Energy Secretary Jennifer Granholm, she has noted that "AI itself isn't a problem because AI could help to solve the problem." However, the problem is the load-bearing of the national electricity grid, which can't sustain the rapid expansion of the AI data centers.

The Department of Energy (DOE) has been reportedly talking with firms, most notably hyperscalers like Microsoft, Google, and Amazon, to start considering nuclear fusion and fission power plants to satisfy the need for AI expansion. We have already discussed the plan by Microsoft to embed a nuclear reactor near its data center facility and help manage the load of thousands of GPUs running AI training/inference. However, this time, it is not just Microsoft. Other tech giants are reportedly thinking about nuclear as well. They all need to offload their AI expansion from the US national power grid and develop a nuclear solution. Nuclear power is a mere 20% of the US power sourcing, and DOE is currently financing a Holtec Palisades 800-MW electric nuclear generating station with $1.52 billion in funds for restoration and resumption of service. Microsoft is investing in a Small Modular Reactors (SMRs) microreactor energy strategy, which could be an example for other big tech companies to follow.

NVIDIA Hopper Leaps Ahead in Generative AI at MLPerf

It's official: NVIDIA delivered the world's fastest platform in industry-standard tests for inference on generative AI. In the latest MLPerf benchmarks, NVIDIA TensorRT-LLM—software that speeds and simplifies the complex job of inference on large language models—boosted the performance of NVIDIA Hopper architecture GPUs on the GPT-J LLM nearly 3x over their results just six months ago. The dramatic speedup demonstrates the power of NVIDIA's full-stack platform of chips, systems and software to handle the demanding requirements of running generative AI. Leading companies are using TensorRT-LLM to optimize their models. And NVIDIA NIM—a set of inference microservices that includes inferencing engines like TensorRT-LLM—makes it easier than ever for businesses to deploy NVIDIA's inference platform.

Raising the Bar in Generative AI
TensorRT-LLM running on NVIDIA H200 Tensor Core GPUs—the latest, memory-enhanced Hopper GPUs—delivered the fastest performance running inference in MLPerf's biggest test of generative AI to date. The new benchmark uses the largest version of Llama 2, a state-of-the-art large language model packing 70 billion parameters. The model is more than 10x larger than the GPT-J LLM first used in the September benchmarks. The memory-enhanced H200 GPUs, in their MLPerf debut, used TensorRT-LLM to produce up to 31,000 tokens/second, a record on MLPerf's Llama 2 benchmark. The H200 GPU results include up to 14% gains from a custom thermal solution. It's one example of innovations beyond standard air cooling that systems builders are applying to their NVIDIA MGX designs to take the performance of Hopper GPUs to new heights.

NVIDIA Modulus & Omniverse Drive Physics-informed Models and Simulations

A manufacturing plant near Hsinchu, Taiwan's Silicon Valley, is among facilities worldwide boosting energy efficiency with AI-enabled digital twins. A virtual model can help streamline operations, maximizing throughput for its physical counterpart, say engineers at Wistron, a global designer and manufacturer of computers and electronics systems. In the first of several use cases, the company built a digital copy of a room where NVIDIA DGX systems undergo thermal stress tests (pictured above). Early results were impressive.

Making Smart Simulations
Using NVIDIA Modulus, a framework for building AI models that understand the laws of physics, Wistron created digital twins that let them accurately predict the airflow and temperature in test facilities that must remain between 27 and 32 degrees C. A simulation that would've taken nearly 15 hours with traditional methods on a CPU took just 3.3 seconds on an NVIDIA GPU running inference with an AI model developed using Modulus, a whopping 15,000x speedup. The results were fed into tools and applications built by Wistron developers with NVIDIA Omniverse, a platform for creating 3D workflows and applications based on OpenUSD.

Samsung Prepares Mach-1 Chip to Rival NVIDIA in AI Inference

During its 55th annual shareholders' meeting, Samsung Electronics announced its entry into the AI processor market with the upcoming launch of its Mach-1 AI accelerator chips in early 2025. The South Korean tech giant revealed its plans to compete with established players like NVIDIA in the rapidly growing AI hardware sector. The Mach-1 generation of chips is an application-specific integrated circuit (ASIC) design equipped with LPDDR memory that is envisioned to excel in edge computing applications. While Samsung does not aim to directly rival NVIDIA's ultra-high-end AI solutions like the H100, B100, or B200, the company's strategy focuses on carving out a niche in the market by offering unique features and performance enhancements at the edge, where low power and efficient computing is what matters the most.

According to SeDaily, the Mach-1 chips boast a groundbreaking feature that significantly reduces memory bandwidth requirements for inference to approximately 0.125x compared to existing designs, which is an 87.5% reduction. This innovation could give Samsung a competitive edge in terms of efficiency and cost-effectiveness. As the demand for AI-powered devices and services continues to soar, Samsung's foray into the AI chip market is expected to intensify competition and drive innovation in the industry. While NVIDIA currently holds a dominant position, Samsung's cutting-edge technology and access to advanced semiconductor manufacturing nodes could make it a formidable contender. The Mach-1 has been field-verified on an FPGA, while the final design is currently going through a physical design for SoC, which includes placement, routing, and other layout optimizations.
Return to Keyword Browsing
Nov 18th, 2024 21:27 EST change timezone

New Forum Posts

Popular Reviews

Controversial News Posts