News Posts matching #Hopper

Return to Keyword Browsing

NVIDIA Ada Lovelace Successor Set for 2025

According to the NVIDIA roadmap that was spotted in the recently published MLCommons training results, the Ada Lovelace successor is set to come in 2025. The roadmap also reveals the schedule for Hopper Next and Grace Next GPUs, as well as the BlueField-4 DPU.

While the roadmap does not provide a lot of details, it does give us a general idea of when to expect NVIDIA's next GeForce architecture. Since NVIDIA usually launches a new GeForce architecture every two years or so, the latest schedule might sound like a small delay, at least if it plans to launch the Ada Lovelace Next in early 2025 and not later. NVIDIA Pascal was launched in May 2016, Turing in September 2018, Ampere in May 2020, and Ada Lovelace in October 2022.

NVIDIA H100 GPUs Set Standard for Generative AI in Debut MLPerf Benchmark

In a new industry-standard benchmark, a cluster of 3,584 H100 GPUs at cloud service provider CoreWeave trained a massive GPT-3-based model in just 11 minutes. Leading users and industry-standard benchmarks agree: NVIDIA H100 Tensor Core GPUs deliver the best AI performance, especially on the large language models (LLMs) powering generative AI.

H100 GPUs set new records on all eight tests in the latest MLPerf training benchmarks released today, excelling on a new MLPerf test for generative AI. That excellence is delivered both per-accelerator and at-scale in massive servers. For example, on a commercially available cluster of 3,584 H100 GPUs co-developed by startup Inflection AI and operated by CoreWeave, a cloud service provider specializing in GPU-accelerated workloads, the system completed the massive GPT-3-based training benchmark in less than eleven minutes.

NVIDIA H100 Hopper GPU Tested for Gaming, Slower Than Integrated GPU

NVIDIA's H100 Hopper GPU is a device designed for pure AI and other compute workloads, with the least amount of consideration for gaming workloads that involve graphics processing. However, it is still interesting to see how this 30,000 USD GPU fairs in comparison to other gaming GPUs and whether it is even possible to run games on it. It turns out that it is technically feasible but not making much sense, as the Chinese YouTube channel Geekerwan notes. Based on the GH100 GPU SKU with 14,592 CUDA, the H100 PCIe version tested here can achieve 204.9 TeraFLOPS at FP16, 51.22 TeraFLOPS at FP32, and 25.61 TeraFLOPS at FP64, with its natural power laying in accelerating AI workloads.

However, how does it fare in gaming benchmarks? Not very well, as the testing shows. It scored 2681 points in 3DMark Time Spy, which is lower than AMD's integrated Radeon 680M, which managed to score 2710 points. Interestingly, the GH100 has only 24 ROPs (render output units), while the gaming-oriented GA102 (highest-end gaming GPU SKU) has 112 ROPs. This is self-explanatory and provides a clear picture as to why the H100 GPU is used for computing only. Since it doesn't have any display outputs, the system needed another regular GPU to provide the picture, while the computation happened on the H100 GPU.

NVIDIA Grace Drives Wave of New Energy-Efficient Arm Supercomputers

NVIDIA today announced a supercomputer built on the NVIDIA Grace CPU Superchip, adding to a wave of new energy-efficient supercomputers based on the Arm Neoverse platform. The Isambard 3 supercomputer to be based at the Bristol & Bath Science Park, in the U.K., will feature 384 Arm-based NVIDIA Grace CPU Superchips to power medical and scientific research, and is expected to deliver 6x the performance and energy efficiency of Isambard 2, placing it among Europe's most energy-efficient systems.

It will achieve about 2.7 petaflops of FP64 peak performance and consume less than 270 kilowatts of power, ranking it among the world's three greenest non-accelerated supercomputers. The project is being led by the University of Bristol, as part of the research consortium the GW4 Alliance, together with the universities of Bath, Cardiff and Exeter.

TechPowerUp GPU-Z v2.53.0 Released

Today, we are releasing the latest version of TechPowerUp GPU-Z, the popular graphics sub-system information and diagnostic utility. Version 2.53.0 adds support for a large number of new and rare GPUs. Among the NVIDIA GPUs support is added for include the GeForce RTX 4070, RTX 4090, RTX 4080, RTX 4060, and RTX 4050; pro-vis RTX 6000 Ada, RTX 3060 Laptop GPU (based on GA104), RTX 3050 Laptop GPU 6 GB, RTX 2050; and compute accelerators Hopper H100 PCIe AIC, and a rare engineering sample of the RTX 2080 Ti. From the AMD camp, support is added for Radeon RX 7600S, Radeon Pro W6900X, Pro V620, and the iGPU of Ryzen "Mendocino" laptop processors. From the Intel side, we've added support for Intel "Raptor Lake-HX," "Alder Lake-N," "Alder Lake-U," and the UHD P750 iGPU found with certain "Rocket Lake" processors.

DOWNLOAD: TechPowerUp GPU-Z v2.53.0

Chinese GPU Maker Biren Technology Loses its Co-Founder, Only Months After Revealing New GPUs

Golf Jiao, a co-founder and general manager of Biren Technology, has left the company late last month according to insider sources in China. No official statement has been issued by the executive team at Biren Tech, and Jiao has not provided any details regarding his departure from the fabless semiconductor design company. The Shanghai-based firm is a relatively new startup - it was founded in 2019 by several former NVIDIA, Qualcomm and Alibaba veterans. Biren Tech received $726.6 million in funding for its debut range of general-purpose graphics processing units (GPGPUs), also defined as high-performance computing graphics processing units (HPC GPUs).

The company revealed its ambitions to take on NVIDIA's Ampere A100 and Hopper H100 compute platforms, and last August announced two HPC GPUs in the form of the BR100 and BR104. The specifications and performance charts demonstrated impressive figures, but Biren Tech had to roll back its numbers when it was hit by U.S Government enforced sanctions in October 2022. The fabless company had contracted with TSMC to produce its Biren range, and the new set of rules resulted in shipments from the Taiwanese foundry being halted. Biren Tech cut its work force by a third soon after losing its supply chain with TSMC, and the engineering team had to reassess how the BR100 and BR104 would perform on a process node larger than the original 7 nm design. It was decided that a downgrade in transfer rates would appease the legal teams, and get newly redesigned Biren silicon back onto the assembly line.

NVIDIA Hopper GPUs Expand Reach as Demand for AI Grows

NVIDIA and key partners today announced the availability of new products and services featuring the NVIDIA H100 Tensor Core GPU—the world's most powerful GPU for AI—to address rapidly growing demand for generative AI training and inference. Oracle Cloud Infrastructure (OCI) announced the limited availability of new OCI Compute bare-metal GPU instances featuring H100 GPUs. Additionally, Amazon Web Services announced its forthcoming EC2 UltraClusters of Amazon EC2 P5 instances, which can scale in size up to 20,000 interconnected H100 GPUs. This follows Microsoft Azure's private preview announcement last week for its H100 virtual machine, ND H100 v5.

Additionally, Meta has now deployed its H100-powered Grand Teton AI supercomputer internally for its AI production and research teams. NVIDIA founder and CEO Jensen Huang announced during his GTC keynote today that NVIDIA DGX H100 AI supercomputers are in full production and will be coming soon to enterprises worldwide.

NVIDIA, ASML, TSMC and Synopsys Set Foundation for Next-Generation Chip Manufacturing

NVIDIA today announced a breakthrough that brings accelerated computing to the field of computational lithography, enabling semiconductor leaders like ASML, TSMC and Synopsys to accelerate the design and manufacturing of next-generation chips, just as current production processes are nearing the limits of what physics makes possible.

The new NVIDIA cuLitho software library for computational lithography is being integrated by TSMC, the world's leading foundry, as well as electronic design automation leader Synopsys into their software, manufacturing processes and systems for the latest-generation NVIDIA Hopper architecture GPUs. Equipment maker ASML is working closely with NVIDIA on GPUs and cuLitho, and is planning to integrate support for GPUs into all of its computational lithography software products.

NVIDIA Announces New System for Accelerated Quantum-Classical Computing

NVIDIA today announced a new system built with Quantum Machines that provides a revolutionary new architecture for researchers working in high-performance and low-latency quantum-classical computing. The world's first GPU-accelerated quantum computing system, the NVIDIA DGX Quantum brings together the world's most powerful accelerated computing platform - enabled by the NVIDIA Grace Hopper Superchip and CUDA Quantum open-source programming model - with the world's most advanced quantum control platform, OPX, by Quantum Machines.

The combination allows researchers to build extraordinarily powerful applications that combine quantum computing with state-of-the-art classical computing, enabling calibration, control, quantum error correction and hybrid algorithms. "Quantum-accelerated supercomputing has the potential to reshape science and industry with capabilities that can serve humanity in enormous ways," said Tim Costa, director of HPC and quantum at NVIDIA. "NVIDIA DGX Quantum will enable researchers to push the boundaries of quantum-classical computing."

Microsoft Azure Announces New Scalable Generative AI VMs Featuring NVIDIA H100

Microsoft Azure announced their new ND H100 v5 virtual machine which packs Intel's Sapphire Rapids Xeon Scalable processors with NVIDIA's Hopper H100 GPUs, as well as NVIDIA's Quantum-2 CX7 interconnect. Inside each physical machine sits eight H100s—presumably the SXM5 variant packing a whopping 132 SMs and 528 4th generation tensor cores—interconnected by NVLink 4.0 which ties them all together with 3.6 TB/s bisectional bandwidth. Outside each local machine is a network of thousands more H100s connected together with 400 GB/s Quantum-2 CX7 InfiniBand, which Microsoft says allows 3.2 Tb/s per VM for on-demand scaling to accelerate the largest AI training workloads.

Generative AI solutions like ChatGPT have accelerated demand for multi-ExaOP cloud services that can handle the large training sets and utilize the latest development tools. Azure's new ND H100 v5 VMs offer that capability to organizations of any size, whether you're a smaller startup or a larger company looking to implement large-scale AI training deployments. While Microsoft is not making any direct claims for performance, NVIDIA has advertised H100 as running up to 30x faster than the preceding Ampere architecture that is currently offered with the ND A100 v4 VMs.

NVIDIA Could Launch Hopper H100 PCIe GPU with 120 GB Memory

NVIDIA's high-performance computing hardware stack is now equipped with the top-of-the-line Hopper H100 GPU. It features 16896 or 14592 CUDA cores, developing if it comes in SXM5 of PCIe variant, with the former being more powerful. Both variants come with a 5120-bit interface, with the SXM5 version using HBM3 memory running at 3.0 Gbps speed and the PCIe version using HBM2E memory running at 2.0 Gbps. Both versions use the same capacity capped at 80 GBs. However, that could soon change with the latest rumor suggesting that NVIDIA could be preparing a PCIe version of Hopper H100 GPU with 120 GBs of an unknown type of memory installed.

According to the Chinese website "s-ss.cc" the 120 GB variant of the H100 PCIe card will feature an entire GH100 chip with everything unlocked. As the site suggests, this version will improve memory capacity and performance over the regular H100 PCIe SKU. With HPC workloads increasing in size and complexity, more significant memory allocation is needed for better performance. With the recent advances in Large Language Models (LLMs), AI workloads use trillions of parameters for tranining, most of which is done on GPUs like NVIDIA H100.

NVIDIA Ada's 4th Gen Tensor Core, 3rd Gen RT Core, and Latest CUDA Core at a Glance

Yesterday, NVIDIA launched its GeForce RTX 40-series, based on the "Ada" graphics architecture. We're yet to receive a technical briefing about the architecture itself, and the various hardware components that make up the silicon; but NVIDIA on its website gave us a first look at what's in store with the key number-crunching components of "Ada," namely the Ada CUDA core, 4th generation Tensor core, and 3rd generation RT core. Besides generational IPC and clock speed improvements, the latest CUDA core benefits from SER (shader execution reordering), an SM or GPC-level feature that reorders execution waves/threads to optimally load each CUDA core and improve parallelism.

Despite using specialized hardware such as the RT cores, the ray tracing pipeline still relies on CUDA cores and the CPU for a handful tasks, and here NVIDIA claims that SER contributes to a 3X ray tracing performance uplift (the performance contribution of CUDA cores). With traditional raster graphics, SER contributes a meaty 25% performance uplift. With Ada, NVIDIA is introducing its 4th generation of Tensor core (after Volta, Turing, and Ampere). The Tensor cores deployed on Ada are functionally identical to the ones on the Hopper H100 Tensor Core HPC processor, featuring the new FP8 Transformer Engine, which delivers up to 5X the AI inference performance over the previous generation Ampere Tensor Core (which itself delivered a similar leap by leveraging sparsity).

NVIDIA Rush-Orders A100 and H100 AI-GPUs with TSMC Before US Sanctions Hit

Early this month, the US Government banned American companies from exporting AI-acceleration GPUs to China and Russia, but these restrictions don't take effect before March 2023. This gives NVIDIA time to take rush-orders from Chinese companies for its AI-accelerators before the sanctions hit. The company has placed "rush orders" for a large quantity of A100 "Ampere" and H100 "Hopper" chips with TSMC, so they could be delivered to firms in China before March 2023, according to a report by Chinese business news publication UDN. The rush-orders for high-margin products such as AI-GPUs, could come as a shot in the arm for NVIDIA, which is facing a sudden loss in gaming GPU revenues, as those chips are no longer in demand from crypto-currency miners.

NVIDIA Hopper Features "SM-to-SM" Comms Within GPC That Minimize Cache Roundtrips and Boost Multi-Instance Performance

NVIDIA in its HotChips 34 presentation revealed a defining feature of its "Hopper" compute architecture that works to increase parallelism and help the H100 processor better perform in a multi-instance environment. The hardware component hierarchy of "Hopper" is typical of NVIDIA architectures, with GPCs, SMs, and CUDA cores forming a hierarchy. The company is introducing a new component it calls "SM to SM Network." This is a high-bandwidth communications fabric inside the Graphics Processing Cluster (GPC), which facilitates direct communication among the SMs without making round-trips to the cache or memory hierarchy, play a significant role in NVIDIA's overarching claim of "6x throughput gain over the A100."

Direct SM-to-SM communication not just impacts latency, but also unburdens the L2 cache, letting NVIDIA's memory-management free up the cache of "cooler" (infrequently accessed) data. CUDA sees every GPU as a "grid," every GPC as a "Cluster," every SM as a "thread block," and every lane of SIMD units as a "lane." Each lane has a 64 KB of shared memory, which makes up 256 KB of shared local storage per SM as there are four lanes. The GPCs interface with 50 MB of L2 cache, which is the last-level on-die cache before the 80 GB of HBM3 serves as main memory.

NVIDIA Grace CPU Specs Remind Us Why Intel Never Shared x86 with the Green Team

NVIDIA designed the Grace CPU, a processor in the classical sense, to replace the Intel Xeon or AMD EPYC processors it was having to cram into its pre-built HPC compute servers for serial-processing roles, and mainly because those half-a-dozen GPU HPC processors need to be interconnected by a CPU. The company studied the CPU-level limitations and bottlenecks not just with I/O, but also the machine-architecture, and realized its compute servers need a CPU purpose-built for the role, with an architecture that's heavily optimized for NVIDIA's APIs. This, the NVIDIA Grace CPU was born.

This is NVIDIA's first outing with a CPU with a processing footprint rivaling server processors from Intel and AMD. Built on the TSMC N4 (4 nm EUV) silicon fabrication process, it is a monolithic chip that's deployed standalone with an H100 HPC processor on a single board that NVIDIA calls a "Superchip." A board with a Grace and an H100, makes up a "Grace Hopper" Superchip. A board with two Grace CPUs makes a Grace CPU Superchip. Each Grace CPU contains a 900 GB/s switching fabric, a coherent interface, which has seven times the bandwidth of PCI-Express 5.0 x16. This is key to connecting the companion H100 processor, or neighboring Superchips on the node, with coherent memory access.

Intel Claims "Ponte Vecchio" Will Trade Blows with NVIDIA Hopper in Most Compute Workloads

With AMD and NVIDIA launching its next-generation HPC compute architectures, "Hopper" and CDNA2, it began seeming like Intel's ambitious "Ponte Vecchio" accelerator based on the Xe-HP architecture, has missed the time-to-market bus. Intel doesn't think so, and in its Hot Chips 34 presentation, disclosed some of the first detailed performance claims that—at least on paper—put the "Hopper" H100 accelerator's published compute performance numbers to shame. We already had some idea of how Ponte Vecchio would perform this spring, at Intel's ISC'22 presentation, but the company hadn't finalized the product's power and thermal characteristics, which are determined by its clock-speed and boosting behavior. Team blue claims to have gotten over the final development hurdles, and is ready with some big numbers.

Intel claims that in classic FP32 (single-precision) and FP64 (double-precision) floating-point tests, its silicon is highly competitive with the H100 "Hopper," with the company claiming 52 TFLOP/s FP32 for the "Ponte Vecchio," compared to 60 TFLOP/s for the H100; and a significantly higher 52 TFLOP/s FP64 for the "Ponte Vecchio," compared to 30 TFLOP/s for the H100. This has to do with the SIMD units of the Xe-HP architecture all being natively capable of double-precision floating-point operations; whereas NVIDIA's architecture typically relies on FP64-specialized streaming multiprocessors.

NVIDIA PrefixRL Model Designs 25% Smaller Circuits, Making GPUs More Efficient

When designing integrated circuits, engineers aim to produce an efficient design that is easier to manufacture. If they manage to keep the circuit size down, the economics of manufacturing that circuit is also going down. NVIDIA has posted on its technical blog a technique where the company uses an artificial intelligence model called PrefixRL. Using deep reinforcement learning, NVIDIA uses the PrefixRL model to outperform traditional EDA (Electronics Design Automation) tools from major vendors such as Cadence, Synopsys, or Siemens/Mentor. EDA vendors usually implement their in-house AI solution to silicon placement and routing (PnR); however, NVIDIA's PrefixRL solution seems to be doing wonders in the company's workflow.

Creating a deep reinforcement learning model that aims to keep the latency the same as the EDA PnR attempt while achieving a smaller die area is the goal of PrefixRL. According to the technical blog, the latest Hopper H100 GPU architecture uses 13,000 instances of arithmetic circuits that the PrefixRL AI model designed. NVIDIA produced a model that outputs a 25% smaller circuit than comparable EDA output. This is all while achieving similar or better latency. Below, you can compare a 64-bit adder design made by PrefixRL and the same design made by an industry-leading EDA tool.

NVIDIA H100 SXM Hopper GPU Pictured Up Close

ServeTheHome, a tech media outlet focused on everything server/enterprise, posted an exclusive set of photos of NVIDIA's latest H100 "Hopper" accelerator. Being the fastest GPU NVIDIA ever created, H100 is made on TSMC's 4 nm manufacturing process and features over 80 billion transistors on an 814 mm² CoWoS package designed by TSMC. Complementing the massive die, we have 80 GB of HBM3 memory that sits close to the die. Pictured below, we have an SXM5 H100 module packed with VRM and power regulation. Given that the rated TDP for this GPU is 700 Watts, power regulation is a serious concern and NVIDIA managed to keep it in check.

On the back of the card, we see one short and one longer mezzanine connector that acts as a power delivery connector, different from the previous A100 GPU layout. This board model is labeled PG520 and is very close to the official renders that NVIDIA supplied us with on launch day.

NVIDIA Hopper Whitepaper Reveals Key Specs of Monstrous Compute Processor

The NVIDIA GH100 silicon powering the next-generation NVIDIA H100 compute processor is a monstrosity on paper, with an NVIDIA whitepaper published over the weekend revealing its key specifications. NVIDIA is tapping into the most advanced silicon fabrication node currently available from TSMC to build the compute die, which is TSMC N4 (4 nm-class EUV). The H100 features a monolithic silicon surrounded by up to six on-package HBM3 stacks.

The GH100 compute die is built on the 4 nm EUV process, and has a monstrous transistor-count of 80 billion, a nearly 50% increase over the GA100. Interestingly though, at 814 mm², the die-area of the GH100 is less than that of the GA100, with its 826 mm² die built on the 7 nm DUV (TSMC N7) node, all thanks to the transistor-density gains of the 4 nm node over the 7 nm one.

NVIDIA Allegedly Testing a 900 Watt TGP Ada Lovelace AD102 GPU

With the release of Hopper, NVIDIA's cycle of new architecture releases is not yet over. Later this year, we expect to see next-generation gaming architecture codenamed Ada Lovelace. According to a well-known hardware leaker for NVIDIA products, @kopite7kimi, on Twitter, the green team is reportedly testing a potent variant of the upcoming AD102 SKU. As the leak indicates, we could see an Ada Lovelace AD102 SKU with a Total Graphics Power (TGP) of 900 Watts. While we don't know where this SKU is supposed to sit in the Ada Lovelace family, it could be the most powerful, Titan-like design making a comeback. Alternatively, this could be a GeForce RTX 4090 Ti SKU. It carries 48 GB of GDDR6X memory running at 24 Gbps speeds alongside monstrous TGP. Feeding the card are two 16-pin connectors.

Another confirmation from the leaker is that the upcoming RTX 4080 GPU uses the AD103 SKU variant, while the RTX 4090 uses AD102. For further information, we have to wait a few more months and see what NVIDIA decides to launch in the upcoming generation of gaming-oriented graphics cards.

NVIDIA GeForce RTX 4090/4080 to Feature up to 24 GB of GDDR6X Memory and 600 Watt Board Power

After the data center-oriented Hopper architecture launch, NVIDIA is slowly preparing to transition the consumer section to new, gaming-focused designs codenamed Ada Lovelace. For starters, the source claims that NVIDIA is using the upcoming GeForce RTX 3090 Ti GPU as a test run for the next-generation Ada Lovelace AD102 GPU. Thanks to the authorities over at Igor's Lab, we have some additional information about the upcoming lineup. We have a sneak peek of a few features regarding the top-end GeForce RTX 4080 and RTX 4090 GPU SKUs. According to Igor's claims, NVIDIA is testing the PCIe Gen5 power connector and wants to see how it fares with the biggest GA102 SKU - GeForce RTX 3090 Ti.

Additionally, we find that the AD102 GPU is supposed to be pin-compatible with GA102. This means that the number of pins located on GA102 is the same as what we are going to see on AD102. There are 12 places for memory modules on the AD102 reference design board, resulting in up to 24 GB of GDDR6X memory. As much as 24 voltage converters surround the GPU, NVIDIA will likely implement uP9512 SKU. It can drive eight phases, resulting in three voltage converters per phase, ensuring proper power delivery. The total board power (TBP) is likely rated at up to 600 Watts, meaning that the GPU, memory, and power delivery combined output 600 Watts of heat. Igor notes that board partners will bundle 12+4 (12VHPWR) to four 8-pin (PCIe old) converters to enable PSU compatibility.

NVIDIA Announces Hopper Architecture, the Next Generation of Accelerated Computing

GTC—To power the next wave of AI data centers, NVIDIA today announced its next-generation accelerated computing platform with NVIDIA Hopper architecture, delivering an order of magnitude performance leap over its predecessor. Named for Grace Hopper, a pioneering U.S. computer scientist, the new architecture succeeds the NVIDIA Ampere architecture, launched two years ago.

The company also announced its first Hopper-based GPU, the NVIDIA H100, packed with 80 billion transistors. The world's largest and most powerful accelerator, the H100 has groundbreaking features such as a revolutionary Transformer Engine and a highly scalable NVIDIA NVLink interconnect for advancing gigantic AI language models, deep recommender systems, genomics and complex digital twins.

NVIDIA GTC 2022 Keynote Liveblog: NVIDIA Hopper Architecture Unveil

NVIDIA today kicked off the 2022 Graphics Technology Conference, its annual gathering of compute and gaming developers discovering the very next in AI, data-science, HPC, graphics, autonomous machines, edge computing, and networking. At the 2022 show premiering now, NVIDIA is expected to unveil its next-generation "Hopper" architecture, which could make its debut as an AI/HPC product, much like "Ampere." Stay tuned for our live blog!

15:00 UTC: The show gets underway with a thank-you to the sponsors.

NVIDIA "Hopper" Might Have Huge 1000 mm² Die, Monolithic Design

Renowned hardware leaker kopike7kimi on Twitter revealed some purported details on NVIDIA's next-generation architecture for HPC (High Performance Computing), Hopper. According to the leaker, Hopper is still sporting a classic monolithic die design despite previous rumors, and it appears that NVIDIA's performance targets have led to the creation of a monstrous, ~1000 mm² die package for the GH100 chip, which usually maxes out the complexity and performance that can be achieved on a particular manufacturing process. This is despite the fact that Hopper is also rumored to be manufactured under TSMC's 5 nm technology, thus achieving higher transistor density and power efficiency compared to the 8 nm Samsung process that NVIDIA is currently contracting. At the very least, it means that the final die will be bigger than the already enormous 826 mm² of NVIDIA's GA100.

If this is indeed the case and NVIDIA isn't deploying a MCM (Multi-Chip Module) design on Hopper, which is designed for a market with increased profit margins, it likely means that less profitable consumer-oriented products from NVIDIA won't be featuring the technology either. MCM designs also make more sense in NVIDIA's HPC products, as they would enable higher theoretical performance when scaling - exactly what that market demands. Of course, NVIDIA could be looking to develop an MCM version of the GH100 still; but if that were to happen, the company could be looking to pair two of these chips together as another HPC product (rumored GH-102). ~2,000 mm² in a single GPU package, paired with increased density and architectural improvements might actually be what NVIDIA requires to achieve the 3x performance jump from the Ampere-based A100 the company is reportedly targeting.

AMD Readies MI250X Compute Accelerator with 110 CUs and 128 GB HBM2E

AMD is preparing an update to its compute accelerator lineup with the new MI250X. Based on the CDNA2 architecture, and built on existing 7 nm node, the MI250X will be accompanied by a more affordable variant, the MI250. According to leaks put out by ExecutableFix, the MI250X packs a whopping 110 compute units (7,040 stream processors), running at 1.70 GHz. The package features 128 GB of HBM2E memory, and a package TDP of 500 W. As for speculative performance numbers, it is expected to offer double-precision (FP64) throughput of 47.9 TFLOP/s, ditto full-precision (FP32), and 383 TFLOP/s half-precision (FP16 and BFLOAT16). AMD's MI200 "Aldebaran" family of compute accelerators are expected to square off against Intel's "Ponte Vecchio" Xe-HPC, and NVIDIA Hopper H100 accelerators in 2022.
Return to Keyword Browsing
Dec 19th, 2024 18:07 EST change timezone

New Forum Posts

Popular Reviews

Controversial News Posts