News Posts matching #Infinity Fabric

ASRock Rack Unveils GPU Servers, Offers AI GPU Choices from All Three Brands

ASRock Rack sells the entire stack of servers a data center could possibly want, and at Computex 2024, the company showed us its servers meant for AI GPUs. The 6U8M-GENOA2, as its name suggests, is a 6U server based on 2P AMD EPYC 9004 series "Genoa" processors in the SP5 package. You can even configure it with the variants of "Genoa" that come with 3D V-Cache, for superior compute performance from the large cache. Each of the two SP5 sockets is wired to 12 DDR5 RDIMM slots, for a total of 24 memory channels. The server supports eight AMD Instinct MI300X or MI325X AI GPUs, each wired out over Infinity Fabric links and a PCIe Gen 5 x16 interface. A 3 kW 80 Plus Titanium PSU keeps the server fed. There are vacant Gen 5 x16 slots left even after connecting the GPUs, so you could give it a DPU-based 40 GbE NIC.

The 6U8X-EGS2 B100 is a 6U AI GPU server modeled after the 6U8M-GENOA2, with a couple of big changes. To begin with, the EPYC "Genoa" chips make way for a 2P Intel Socket E (LGA4677) setup, for two 5th Gen Xeon Scalable "Emerald Rapids" processors. Each socket is wired to 16 DDR5 DIMM slots (the processor itself has 8-channel DDR5, but this is a 2 DIMM-per-channel setup). The server integrates an NVIDIA NVSwitch that wires out NVLinks to eight NVIDIA B100 "Blackwell" AI GPUs. The server features eight HHHL and five FHHL PCIe Gen 5 x16 slots. There are vacant x16 slots for your DPU/NIC; you can even use an AIC NVIDIA BlueField card. The same 3 kW PSU as in the "Genoa" system features here, too.

Supermicro Extends AI and GPU Rack Scale Solutions with Support for AMD Instinct MI300 Series Accelerators

Supermicro, Inc., a Total IT Solution Manufacturer for AI, Cloud, Storage, and 5G/Edge, is announcing three new additions to its AMD-based H13 generation of GPU Servers, optimized to deliver leading-edge performance and efficiency, powered by the new AMD Instinct MI300 Series accelerators. Supermicro's powerful rack scale solutions with 8-GPU servers with the AMD Instinct MI300X OAM configuration are ideal for large model training.

The new 2U liquid-cooled and 4U air-cooled servers featuring AMD Instinct MI300A accelerated processing units (APUs) are available now, improving data-center efficiency and powering the fast-growing, complex demands of AI, LLM, and HPC workloads. The new systems contain quad APUs for scalable applications. Supermicro can deliver complete liquid-cooled racks for large-scale environments with up to 1,728 TFlops of FP64 performance per rack. Supermicro's worldwide manufacturing facilities streamline the delivery of these new servers for AI and HPC convergence.

AMD Announces Appointment of New Corporate Fellows

AMD today announced the appointment of five technical leaders to the role of AMD Corporate Fellow. These appointments recognize each leader's significant impact on semiconductor innovation across various areas, from graphics architecture to advanced packaging. "David, Nathan, Suresh, Ben and Ralph - whose engineering contributions have already left an indelible mark on our industry - represent the best of our innovation culture," said Mark Papermaster, chief technology officer and executive vice president of Technology and Engineering at AMD. "Their appointments to Corporate Fellow will enable AMD to innovate in new dimensions as we work to deliver the most significant breakthroughs in high-performance computing in the decade ahead."

Appointment to AMD Corporate Fellow is an honor bestowed on the most accomplished AMD innovators. AMD Corporate Fellows are appointed after a rigorous review process that assesses not only specific technical contributions to the company, but also involvement in the industry, mentoring of others and improving the long-term strategic position of the company. Currently, only 13 engineers at AMD hold the title of Corporate Fellow.

AMD Explains the Economics Behind Chiplets for GPUs

AMD, in its technical presentation for the new Radeon RX 7900 series "Navi 31" GPU, gave us an elaborate explanation of why it had to take the chiplets route for high-end GPUs, devices that are far more complex than CPUs. The company also enlightened us on what sets chiplet-based packages apart from classic multi-chip modules (MCMs). An MCM is a package that consists of multiple independent devices sharing a fiberglass substrate.

An example of an MCM would be a mobile Intel Core processor, in which the CPU die and the PCH die share a substrate. Here, the CPU and the PCH are independent pieces of silicon that can otherwise exist on their own packages (as they do on the desktop platform), but have been paired together on a single substrate to minimize PCB footprint, which is precious on a mobile platform. A chiplet-based device, by contrast, is one where multiple dies on a substrate cannot otherwise exist independently on their own packages without an impact on inter-die bandwidth or latency. They are essentially what would have been components on a monolithic die, disaggregated into separate dies built on different semiconductor foundry nodes, with a purely cost-driven motive.

AMD EPYC "Genoa" Zen 4 Product Stack Leaked

With its recent announcement of the Ryzen 7000 desktop processors, the action now shifts to the server, with AMD preparing a wide launch of its EPYC "Genoa" and "Bergamo" processors this year. Powered by the "Zen 4" microarchitecture, and contemporary I/O that includes PCI-Express Gen 5, CXL, and DDR5, these processors dial the CPU core-counts per socket up to 96 in the case of "Genoa," and up to 128 in the case of "Bergamo." The EPYC "Genoa" series represents the main trunk of the company's server processor lineup, with various internal configurations targeting specific use-cases.

The 96 cores are spread across twelve 5 nm 8-core CCDs, each with a high-bandwidth Infinity Fabric path to the sIOD (server I/O die), which is very likely built on the 6 nm node. Lower core-count models can be built either by lowering the CCD count (keeping more cores per CCD), or by reducing the number of cores per CCD while keeping the CCD count constant, to yield more bandwidth per core. The leaked product-stack table below shows several of these sub-classes of "Genoa" and "Bergamo," classified by use-cases. The leaked slide also details the nomenclature AMD is using with its new processors. The leaked roadmap also mentions the upcoming "Genoa-X" processor for HPC and cloud-compute uses, which features the 3D Vertical Cache technology.
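
To illustrate the trade-off, here is a minimal sketch of how the same 48-core SKU could be carved out of the package in two ways. The per-CCD fabric bandwidth figure is purely hypothetical, used only to show the arithmetic:

```python
# Illustrative sketch of the two ways to derive a lower core-count "Genoa" SKU.
# The per-CCD bandwidth figure is a placeholder, not an AMD specification.
IFOP_BW_PER_CCD_GBPS = 36  # hypothetical Infinity Fabric bandwidth per CCD

def describe(ccds: int, cores_per_ccd: int) -> None:
    cores = ccds * cores_per_ccd
    bw_per_core = ccds * IFOP_BW_PER_CCD_GBPS / cores
    print(f"{ccds} CCDs x {cores_per_ccd} cores = {cores} cores, "
          f"{bw_per_core:.1f} GB/s fabric bandwidth per core")

describe(12, 8)  # full configuration: 96 cores
describe(6, 8)   # fewer CCDs, full cores/CCD: 48 cores, 4.5 GB/s per core
describe(12, 4)  # all CCDs, half the cores:   48 cores, 9.0 GB/s per core
```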

AMD Releases its CDNA2 MI250X "Aldebaran" HPC GPU Block Diagram

AMD in its Hot Chips 2022 presentation released a block diagram of its biggest AI-HPC processor, the Instinct MI250X. Based on the CDNA2 compute architecture, at the heart of the MI250X is the "Aldebaran" MCM (multi-chip module). This MCM contains two logic dies (GPU dies) and eight HBM2E stacks, four per GPU die. The two GPU dies are connected by a 400 GB/s Infinity Fabric link. They each have up to 500 GB/s of external Infinity Fabric bandwidth for inter-socket communications, and PCI-Express 4.0 x16 as the host system bus for AIC form-factors. The two GPU dies together make up 58 billion transistors, and are fabricated on the TSMC N6 (6 nm) node.

The component hierarchy of each GPU die sees eight Shader Engines share a last-level L2 cache. The eight Shader Engines total 112 Compute Units, or 14 CU per engine. The CDNA2 compute unit contains 64 stream processors making up the Shader Core, and four Matrix Core Units, which are specialized hardware for matrix/tensor math operations. There are hence 7,168 stream processors per GPU die, and 14,336 per package. AMD claims a 100% increase in double-precision compute performance over CDNA (MI100). AMD attributes this to increases in frequencies, efficient data paths, extensive operand reuse and forwarding, and power optimization enabling those higher clocks. The MI200 is already powering the Frontier supercomputer, and AMD is working toward more design wins in the HPC space. The company also dropped a major hint that the MI300, based on CDNA3, will be an APU. It will incorporate GPU dies, core-logic, and CPU CCDs onto a single package, as a rival solution to the NVIDIA Grace Hopper Superchip.
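
The shader arithmetic quoted above multiplies out as stated; a trivial sketch to verify it:

```python
# Verifying the CDNA2 shader math from the paragraph above.
shader_engines = 8          # per GPU die
cus_per_engine = 14
sps_per_cu = 64             # stream processors per compute unit
dies = 2                    # per "Aldebaran" package

cus_per_die = shader_engines * cus_per_engine   # 112
sps_per_die = cus_per_die * sps_per_cu          # 7,168
print(cus_per_die, sps_per_die, sps_per_die * dies)  # 112 7168 14336
```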

Ryzen 7000 Said to Have a DDR5-6000 Memory "Sweet Spot"

If you remember, there were quite a lot of discussions about memory speed "sweet spots" for both the Ryzen 3000- and Ryzen 5000-series, with the user experience not always meeting AMD's sweet spot for memory clocks. Now details of the Ryzen 7000-series memory sweet spot have arrived courtesy of Wccftech, and the speed is said to be DDR5-6000. This is 400 MHz higher than the apparent official maximum memory clock speed of DDR5-5600, but as we know, the manufacturer's max memory clock is rarely the actual max. In AMD's case, things obviously work a bit differently, as the Infinity Fabric clock should ideally run at a 1:1 ratio with the memory, as on the AM4 platform, to deliver the best possible system performance and memory latencies.

That said, as we're using DDR memory, the actual clocks are only half of the memory speeds, so the IF clock is operating at no more than 2000 MHz if the memory is DDR4-4000. However, if the same applies to the Ryzen 7000-series, it appears that AMD has managed to bump the IF clocks by a not insignificant 1000 MHz, as the fabric would now be operating at up to 3000 MHz. This could see the Ryzen 7000-series offering better memory latencies than Intel's Alder Lake and upcoming Raptor Lake CPUs, as Intel runs DDR5 memory at a 2:1 or 4:1 ratio. AMD is said to still offer a 2:1 ratio as well, but as with the AM4 CPUs, this delivers worse overall performance.
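
The clock arithmetic above fits in a few lines. A minimal sketch (the function name and divider handling are ours, not AMD's BIOS terminology):

```python
def fabric_clock_mhz(ddr_rating: int, ratio: int = 1) -> float:
    """DDR transfers twice per clock, so mclk is half the DDR rating;
    with a 1:ratio divider, the Infinity Fabric runs at mclk / ratio."""
    return (ddr_rating / 2) / ratio

print(fabric_clock_mhz(4000))     # DDR4-4000 at 1:1 -> 2000.0 MHz (AM4)
print(fabric_clock_mhz(6000))     # DDR5-6000 at 1:1 -> 3000.0 MHz (rumored AM5 max)
print(fabric_clock_mhz(6000, 2))  # DDR5-6000 at 2:1 -> 1500.0 MHz
```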

Update 11:49 UTC: Yuri Bubliy, aka @1usmus, has confirmed on Twitter that the max IF frequency is 3000 MHz, and it seems AMD has added a range of new memory- and bus-related features to the AM5 platform, going by the additional features he posted.

AMD's Second Socket AM5 Ryzen Processor will be "Granite Ridge," Company Announces "Phoenix Point"

AMD in its 2022 Financial Analyst Day presentation announced the codename for the second generation of Ryzen desktop processors for Socket AM5, which is "Granite Ridge." A successor to the Ryzen 7000 "Raphael," the next-generation "Granite Ridge" processor will incorporate the "Zen 5" CPU microarchitecture, with its CPU complex dies (CCDs) built on the 4 nm silicon fabrication node. "Zen 5" will feature several core-level design changes as detailed in our older article, including a redesigned front-end with greater parallelism, which should indicate a much larger execution stage. The architecture could also incorporate AI/ML performance enhancements as AMD taps into Xilinx IP to add more fixed-function hardware backing the AI/ML capabilities of its processors.

The "Zen 5" microarchitecture makes its client debut with Ryzen "Granite Ridge," and server debut with EPYC "Turin." It's being speculated that AMD could give "Turin" a round of CPU core-count increases, while retaining the same SP5 infrastructure; which means we could see either smaller CCDs, or higher core-count per CCD with "Zen 5." Much like "Raphael," the next-gen "Granite Ridge" will be a series of high core-count desktop processors that will feature a functional iGPU that's good enough for desktop/productivity, though not gaming. AMD confirmed that it doesn't see "Raphael" as an APU, and that its definition of an "APU" is a processor with a large iGPU that's capable of gaming. The company's next such APU will be "Phoenix Point."

AMD CDNA3 Architecture Sees the Inevitable Fusion of Compute Units and x86 CPU at Massive Scale

AMD in its 2022 Financial Analyst Day presentation unveiled its next-generation CDNA3 compute architecture, which will see something we've been expecting for a while—a compute accelerator that combines a large number of compute units for scalar processing with a large number of x86-64 CPU cores, based on some future "Zen" microarchitecture, on a single package. The presence of CPU cores on the package would eliminate the need for the system to have an EPYC or Xeon processor at its head, and clusters of Instinct CDNA3 processors could run themselves without the need for a CPU and its system memory.

The Instinct CDNA3 processor will feature an advanced packaging technology that brings various IP blocks together as chiplets, each based on a node most economical to it, without compromising on its function. The package features stacked HBM memory, and this memory is shared not just by the compute units and x86 cores, but also forms part of large shared memory pools accessible across packages. 4th Generation Infinity Fabric ties it all together.

AMD Ryzen 7000 "Phoenix" APUs with RDNA3 Graphics to Rock Large 3D V-Cache

AMD's next-generation Ryzen 7000-series "Phoenix" mobile processors are all the rage these days. Bound for 2023, these chips feature a powerful iGPU based on the RDNA3 graphics architecture, with performance allegedly rivaling that of a GeForce RTX 3060 Laptop GPU—a popular performance-segment discrete GPU. What's more, AMD is also taking a swing at Intel in the CPU core-count game, by giving "Phoenix" a large number of "Zen 4" CPU cores. The secret ingredient pushing this combo, however, is a large cache.

AMD has used large caches to good effect both on its "Zen 3" processors, such as the Ryzen 7 5800X3D, where they're called 3D Vertical Cache (3D V-cache); and on its Radeon RX 6000 discrete GPUs, where they're called Infinity Cache. The only known difference between the two is that the latter is fully on-die, while the former is stacked on top of existing silicon IP. It's now being reported that "Phoenix" will indeed feature a stacked 3D V-cache.

"Navi 31" RDNA3 Sees AMD Double Down on Chiplets: As Many as 7

Way back in January 2021, we heard a spectacular rumor about "Navi 31," the next-generation big GPU by AMD, being the company's first logic-MCM GPU (a GPU with more than one logic die). The company has a legacy of MCM GPUs, but those have been a single logic die surrounded by memory stacks. The RDNA3 graphics architecture that "Navi 31" is based on sees AMD fragment the logic die into smaller chiplets, with the goal of ensuring that only those specific components that benefit from the TSMC N5 node (5 nm), such as the number-crunching machinery, are built on it, while ancillary components, such as memory controllers, display controllers, or even media accelerators, are confined to chiplets built on an older node, such as the TSMC N6 (6 nm). AMD has taken this approach with its EPYC and Ryzen processors, where the chiplets with the CPU cores get the better node, and the other logic components get an older one.

Greymon55 predicts an interesting division of labor on the "Navi 31" MCM. Apparently, the number-crunching machinery is spread across two GCDs (Graphics Complex Dies?). These dies pack the Shader Engines with their RDNA3 compute units (CU), Command Processor, Geometry Processor, Asynchronous Compute Engines (ACEs), Rendering Backends, etc. These are things that can benefit from the advanced 5 nm node, enabling AMD to run the CUs at higher engine clocks. There's also sound logic behind building a big GPU with two such GCDs instead of a single large GCD, as smaller GPUs can be made with a single GCD (exactly why two 8-core chiplets make up a 16-core Ryzen processor, and one of these is used to create 8-core and 6-core SKUs). The smaller GCD would result in better yields per wafer, and minimize the need for separate wafer orders for a larger die (as in the case of the Navi 21).

AMD EPYC "Genoa" Zen 4 Processor Multi-Chip Module Pictured

Here is the first picture of a next-generation AMD EPYC "Genoa" processor with its integrated heatspreader (IHS) removed. This is also possibly the first picture of a "Zen 4" CPU Complex Die (CCD). The picture reveals as many as twelve CCDs, and a large sIOD (server I/O die). The "Zen 4" CCDs, built on the TSMC N5 (5 nm EUV) process, look visibly similar in size to the "Zen 3" CCDs built on the N7 (7 nm) process, which means the CCD's transistor count could be significantly higher, given the transistor density gained from the 5 nm node. Besides more number-crunching machinery on the CPU core, we're hearing that AMD will increase cache sizes, particularly the dedicated L2 cache, which is expected to be 1 MB per core, doubled from previous generations of the "Zen" microarchitecture.

Each "Zen 4" CCD is reported to be about 8 mm² smaller in die-area than the "Zen 3" CCD, or about 10% smaller. What's interesting, though, is that the sIOD (server I/O die) is smaller in size, too, estimated to measure 397 mm², compared to the 416 mm² of the "Rome" and "Milan" sIOD. This is good reason to believe that AMD has switched over to a newer foundry process, such as the TSMC N7 (7 nm), to build the sIOD. The current-gen sIOD is built on Global Foundries 12LPP (12 nm). Supporting this theory is the fact that the "Genoa" sIOD has a 50% wider memory I/O (12-channel DDR5), 50% more IFOP ports (Infinity Fabric over package) to interconnect with the CCDs, and the mere fact that PCI-Express 5.0 and DDR5 switching fabric and SerDes (serializer/deserializers), may have higher TDP; which together compel AMD to use a smaller node such as 7 nm, for the sIOD. AMD is expected to debut the EPYC "Genoa" enterprise processors in the second half of 2022.

AMD's Robert Hallock Confirms Lack of Manual CPU Overclocking for Ryzen 7 5800X3D

In a livestream about AMD's mobile CPUs with HotHardware, Robert Hallock shed some light on the rumours about the Ryzen 7 5800X3D lacking manual overclocking. As per earlier rumours, something TechPowerUp confirmed with our own sources, AMD's Ryzen 7 5800X3D lacks support for manual CPU overclocking, and AMD asked its motherboard partners to remove these features in the UEFI. According to the livestream, these CPUs are hard locked, so there's no workaround when it comes to adjusting the CPU multiplier or voltage, but at least AMD has a good reason for it.

It turns out that the 3D V-Cache is voltage-limited to a maximum of 1.3 to 1.35 V, which means that the regular boost voltage of individual Ryzen CPU cores, which can hit 1.45 to 1.5 V, would be too high for the 3D V-Cache to handle. As such, AMD implemented the restrictions for this CPU. However, the Infinity Fabric and memory bus can still be manually overclocked. The lower boost voltage also helps explain why the Ryzen 7 5800X3D has lower boost clocks, as the higher voltages are likely needed to hit the higher frequencies.

AMD Announces Ryzen 7 5800X3D, World's Fastest Gaming Processor

AMD today announced its Spring 2022 update for the company's Ryzen desktop processors, with as many as seven new processor models in the retail channel. The lineup is led by the Ryzen 7 5800X3D 8-core/16-thread processor, which AMD claims is the "world's fastest gaming processor." This processor introduces the 3D Vertical Cache (3DV Cache) to the consumer space.

64 MB of fast SRAM is stacked on top of the region of the CCD (8-core chiplet) that contains the 32 MB of on-die L3 cache, with structural silicon bringing the region over the CPU cores level with it. This SRAM is tied directly to the bi-directional ring-bus that interconnects the CPU cores, L3 cache, and IFOP (Infinity Fabric Over Package) interconnect. The result is 96 MB of seamless L3 cache, with each of the 8 "Zen 3" CPU cores having equal access to all of it.

AMD Details Instinct MI200 Series Compute Accelerator Lineup

AMD today announced the new AMD Instinct MI200 series accelerators, the first exascale-class GPU accelerators. The AMD Instinct MI200 series includes the world's fastest high performance computing (HPC) and artificial intelligence (AI) accelerator, the AMD Instinct MI250X.

Built on AMD CDNA 2 architecture, AMD Instinct MI200 series accelerators deliver leading application performance for a broad set of HPC workloads. The AMD Instinct MI250X accelerator provides up to 4.9X better performance than competitive accelerators for double precision (FP64) HPC applications and surpasses 380 teraflops of peak theoretical half-precision (FP16) for AI workloads to enable disruptive approaches in further accelerating data-driven research.

AMD Instinct MI200: Dual-GPU Chiplet; CDNA2 Architecture; 128 GB HBM2E

AMD today announced the debut of its 6 nm CDNA2 (Compute DNA) architecture in the form of the MI200 family. The new dual-GPU chiplet accelerator aims to lead AMD into a new era of High Performance Computing (HPC) applications, the high-margin territory it needs to compete in for continued, sustainable growth. To that end, AMD has further refined a mature, compute-oriented architecture born with Graphics Core Next (GCN), improving performance while reducing total die size compared to its MI100 family.

New AMD Radeon PRO W6000X Series GPUs Bring Groundbreaking High-Performance AMD RDNA 2 Architecture to Mac Pro

AMD today announced availability of the new AMD Radeon PRO W6000X series GPUs for Mac Pro. The new GPU product line delivers exceptional performance and incredible visual fidelity to power a wide variety of demanding professional applications and workloads, including 3D rendering, 8K video compositing, color correction and more.

Built on groundbreaking AMD RDNA 2 architecture, AMD Infinity Cache and other advanced technologies, the new workstation graphics line-up includes the AMD Radeon PRO W6900X and AMD Radeon PRO W6800X GPUs. Mac Pro users also have the option of choosing the AMD Radeon PRO W6800X Duo graphics card, a dual-GPU configuration that leverages high-speed AMD Infinity Fabric interconnect technology to deliver outstanding levels of compute performance.

AMD Announces 3rd Generation EPYC 7003 Enterprise Processors

AMD today announced its 3rd generation EPYC (7003 series) enterprise processors, codenamed "Milan." These processors combine up to 64 of the company's latest "Zen 3" CPU cores with an updated I/O controller die, and promise significant performance uplifts and new security capabilities over the previous generation EPYC 7002 "Rome." The "Zen 3" CPU cores, AMD claims, introduce an IPC uplift of up to 19% over the previous generation, which, when combined with generational increases in CPU clock speeds, brings about significant single-threaded performance increases. The processor also comes with large multi-threaded performance gains thanks to a redesigned CCD.

The new "Zen 3" CPU complex die (CCD) comes with a radical redesign in the arrangement of CPU cores, putting all eight CPU cores of the CCD in a single CCX, sharing a large 32 MB L3 cache. This the total amount of L3 cache addressable by a CPU core, and significantly reduces latencies for multi-threaded workloads. The "Milan" multi-chip module has up to eight such CCDs talking to a centralized server I/O controller die (sIOD) over the Infinity Fabric interconnect.

AMD Ryzen 5000 Series Features Three Synchronized Memory Clock Domains

A leaked presentation slide by AMD for its Ryzen 5000 series "Zen 3" processors reveals details of the processor's memory interface. Much like the Ryzen 3000 series "Matisse," the Ryzen 5000 series "Vermeer" is a multi-chip module of up to 16 CPU cores spread across two 8-core CPU dies, and a unified I/O die that handles the processor's memory, PCIe, and SoC interfaces. There are three configurable clock domains that ensure the CPU cores are fed with data at the right speed, and that the MCM design doesn't pose bottlenecks to memory performance.

The first domain is fclk or Infinity Fabric clock. Each of the two CCDs (8-core CPU dies) has just one CCX (CPU core complex) with 8 cores, and hence the CCD's internal Infinity Fabric cedes relevance to the IFOP (Infinity Fabric over Package) interconnect that binds the two CCDs and the cIOD (client I/O controller die) together. The next frequency is uclk, or the internal frequency of the dual-channel DDR4 memory controller contained in the cIOD. And lastly, the mclk, or memory clock is the industry-standard DRAM frequency.
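
A minimal model of the three domains named above and the synchronized case; the class and field names are illustrative, not AMD's firmware interface:

```python
from dataclasses import dataclass

@dataclass
class ClockDomains:
    fclk_mhz: float  # Infinity Fabric (IFOP) clock
    uclk_mhz: float  # memory-controller clock inside the cIOD
    mclk_mhz: float  # DRAM clock (half the DDR4 transfer rate)

    def synchronized(self) -> bool:
        # The low-latency case: all three domains run 1:1:1.
        return self.fclk_mhz == self.uclk_mhz == self.mclk_mhz

print(ClockDomains(1800, 1800, 1800).synchronized())  # DDR4-3600, 1:1:1 -> True
print(ClockDomains(1800, 900, 1800).synchronized())   # uclk on a 2:1 divider -> False
```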

AMD 4th Gen Ryzen "Vermeer" Zen 3 Rumored to Include 10-core Parts

Yuri "1usmus" Bubliy, author of DRAM Calculator for Ryzen and the upcoming ClockTuner for Ryzen, revealed three pieces of juicy details on the upcoming 4th Gen AMD Ryzen "Vermeer" performance desktop processors. He predicts AMD turning up CPU core counts with this generation, including the introduction of new 10-core SKUs, possibly to one-up Intel in the multi-threaded performance front. Last we heard, AMD's upcoming "Zen 3" CCDs (chiplets) feature 8 CPU cores sharing a monolithic 32 MB slab of L3 cache. This should, in theory, allow AMD to create 10-core chips with two CCDs, each with 5 cores enabled.

Next up are two features that should interest overclockers, which is Bubliy's main domain. The processors should support a feature called "Curve Optimizer," enabling finer-grained control over the boost algorithm, on a per-core basis. As we understand, the "curve" in question could even be voltage/frequency. It remains to be seen if the feature is leveraged at a CBS level (UEFI setup program), or by Ryzen Master. Lastly, there's mention of new Infinity Fabric dividers that apparently help you raise DCT (memory controller) frequencies "slightly higher" in mixed mode. AMD is expected to debut its 4th Gen Ryzen "Vermeer" desktop processors within 2020.

AMD Confirms CDNA-Based Radeon Instinct MI100 Coming to HPC Workloads in 2H2020

Mark Papermaster, chief technology officer and executive vice president of Technology and Engineering at AMD, today confirmed that CDNA is on track for release in 2H2020 for HPC computing. The confirmation was (fittingly) given during Dell's EMC High-Performance Computing Online event. This confirms that AMD is looking at a busy second half of the year, with the Zen 3, RDNA 2, and CDNA product lines all being pushed to market.

CDNA is AMD's next push into the highly lucrative HPC market, and will see the company splitting its GPU architectures along market lines. CDNA will see raster graphics hardware, display and multimedia engines, and other associated components removed from the chip design, in a bid to recoup die area for both increased processing units and fixed-function tensor compute hardware. The CDNA-based Radeon Instinct MI100 will be fabricated on TSMC's 7 nm node, and will be the first AMD architecture featuring shared memory pools between CPUs and GPUs via the 2nd gen Infinity Fabric, which should bring both throughput and power consumption improvements to the platform.

AMD Announces Radeon Pro VII Graphics Card, Brings Back Multi-GPU Bridge

AMD today announced its Radeon Pro VII professional graphics card targeting 3D artists, engineering professionals, broadcast media professionals, and HPC researchers. The card is based on AMD's "Vega 20" multi-chip module that incorporates a 7 nm (TSMC N7) GPU die, along with a 4096-bit wide HBM2 memory interface, and four memory stacks adding up to 16 GB of video memory. The GPU die is configured with 3,840 stream processors across 60 compute units, 240 TMUs, and 64 ROPs. The card is built in a workstation-optimized add-on card form-factor (rear-facing power connectors and lateral-blower cooling solution).

What separates the Radeon Pro VII from last year's Radeon VII is full double-precision floating point support, at 1:2 FP32 throughput, compared to the Radeon VII, which is locked to 1:4 FP32. Specifically, the Radeon Pro VII offers 6.55 TFLOPs of double-precision floating point performance (vs. 3.36 TFLOPs on the Radeon VII). Another major difference is the physical Infinity Fabric bridge interface, which lets you pair up to two of these cards in a multi-GPU setup to double the memory capacity, to 32 GB. Each GPU has two Infinity Fabric links, running at 1333 MHz, with a per-direction bandwidth of 42 GB/s. This brings the total bidirectional bandwidth to a whopping 168 GB/s—more than twice the PCIe 4.0 x16 limit of 64 GB/s.
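
The bridge-bandwidth figures multiply out as stated; a two-line check:

```python
# Infinity Fabric bridge: 2 links x 42 GB/s per direction x 2 directions.
total_gbps = 2 * 42 * 2
print(total_gbps, total_gbps / 64)  # 168 GB/s, 2.625x PCIe 4.0 x16 (64 GB/s)
```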

AMD Announces the CDNA and CDNA2 Compute GPU Architectures

AMD at its 2020 Financial Analyst Day event unveiled its upcoming CDNA GPU-based compute accelerator architecture. CDNA will complement the company's graphics-oriented RDNA architecture. While RDNA powers the company's Radeon Pro and Radeon RX client- and enterprise graphics products, CDNA will power compute accelerators such as Radeon Instinct, etc. AMD is having to fork its graphics IP to RDNA and CDNA due to what it described as market-based product differentiation.

Data centers and HPCs using Radeon Instinct accelerators have no use for the GPU's actual graphics rendering capabilities. And so, at a silicon level, AMD is removing the raster graphics hardware, the display and multimedia engines, and other associated components that otherwise take up significant amounts of die area. In their place, AMD is adding fixed-function tensor compute hardware, similar to the tensor cores on certain NVIDIA GPUs.

AMD Financial Analyst Day 2020 Live Blog

AMD Financial Analyst Day presents an opportunity for AMD to talk straight with the finance industry about the company's current financial health, and a taste of what's to come. Guidance and product teasers made during this time are usually very accurate due to the nature of the audience. In this live blog, we will post information from the Financial Analyst Day 2020 as it unfolds.
20:59 UTC: The event has started as of 1 PM PST. CEO Dr. Lisa Su takes the stage.

AMD Scores Another EPYC Win in Exascale Computing With DOE's "El Capitan" Two-Exaflop Supercomputer

AMD has been on a roll in consumer, professional, and exascale computing environments alike, and it has just snagged itself another hugely important contract. The US Department of Energy (DOE) has just announced the winners for its next-gen exascale supercomputer that aims to be the world's fastest. Dubbed "El Capitan," the new supercomputer will be powered by AMD's next-gen EPYC "Genoa" processors (Zen 4 architecture) and Radeon Instinct GPUs. This is the first such exascale contract where AMD is the sole purveyor of both CPUs and GPUs, with AMD's other EPYC design win, in the Cray Shasta, being paired with NVIDIA graphics cards.

El Capitan will be a $600 million investment, to be deployed in late 2022 and operational in 2023. Undoubtedly, next-gen proposals from AMD, Intel, and NVIDIA were all presented, with AMD winning the shootout in a big way. While the DOE initially projected El Capitan to provide some 1.5 exaflops of computing power, it has now revised its performance goal to a full 2 exaflops. El Capitan will thus be ten times faster than the current leader of the supercomputing world, Summit.