News Posts matching #Infinity Fabric


Social Media Imagines AMD "Navi 48" RDNA 4 to be a Dual-Chiplet GPU

A user of the Chinese tech forum ChipHell who goes by zcjzcj11111 posted a fascinating take on what the next-generation AMD "Navi 48" GPU could be, and illustrated it with a render. Apparently, the "Navi 48," which powers AMD's series-topping performance-segment graphics card, is a dual chiplet-based design, similar to the company's latest Instinct MI300 series AI GPUs. This wouldn't be a disaggregated GPU such as the "Navi 31" and "Navi 32," but rather a scale-out multi-chip module of two GPU dies that could otherwise run on their own in single-die packages. You want to call this a multi-GPU-on-a-stick? Go ahead, but there are a couple of key differences.

On AMD's Instinct AI GPUs, the chiplets have full cache coherence with each other, and can address memory controlled by each other. This cache coherence makes the chiplets work like one giant chip. In a multi-GPU-on-a-stick, there would be no cache coherence, the two dies would be mapped by the host machine as two separate devices, and then you'd be at the mercy of implicit or explicit multi-GPU technologies for performance to scale. This isn't what's happening on AI GPUs—despite multiple chiplets, the GPU is seen by the host as a single PCI device with all its cache and memory visible to software as a contiguously addressable block.

AMD Granite Ridge "Zen 5" Processor Annotated

High-resolution die shots of the AMD "Zen 5" 8-core CCD were released and annotated by Nemez, Fritzchens Fritz, and HighYieldYT. These provide a detailed view of how the silicon and its various components appear, particularly the new "Zen 5" CPU core with its 512-bit FPU. The "Granite Ridge" package looks similar to "Raphael," with up to two 8-core CPU complex dies (CCDs) depending on the processor model, and a centrally located client I/O die (cIOD). This cIOD is carried over from "Raphael," which minimizes product development costs for AMD, at least for the uncore portion of the processor. The "Zen 5" CCD is built on the TSMC N4P (4 nm) foundry node.

The "Granite Ridge" package sees the up to two "Zen 5" CCDs placed closer to each other than the "Zen 4" CCDs on "Raphael." In the picture above, you can see the pad of the absent CCD behind the solder mask of the fiberglass substrate, close to the present CCD. The CCD contains 8 full-sized "Zen 5" CPU cores, each with 1 MB of L2 cache, and a centrally located 32 MB L3 cache that's shared among all eight cores. The only other components are an SMU (system management unit), and the Infinity Fabric over Package (IFoP) PHYs, which connect the CCD to the cIOD.

AMD Strix Point Silicon Pictured and Annotated

The first die shot of AMD's new 4 nm "Strix Point" mobile processor surfaced, thanks to an enthusiast on Chinese social media. "Strix Point" is a significantly larger die than "Phoenix." It measures 12.06 mm x 18.71 mm (L x W), compared to the 9.06 mm x 15.01 mm of "Phoenix." Much of this die size increase comes from the larger CPU, iGPU, and NPU. The process has been improved from TSMC N4 on "Phoenix" and its derivative "Hawk Point," to the newer TSMC N4P node.

Nemez (GPUsAreMagic) annotated the die shot in great detail. The CPU now has 12 cores spread across two CCXs, one of which contains four "Zen 5" cores sharing a 16 MB L3 cache; and the other eight "Zen 5c" cores sharing an 8 MB L3 cache. The two CCXs connect to the rest of the chip over Infinity Fabric. The rather large iGPU takes up the central region of the die. It is based on the RDNA 3.5 graphics architecture, and features 8 workgroup processors (WGPs), or 16 compute units (CUs), amounting to 1,024 stream processors. Other key components include four render backends amounting to 16 ROPs, and control logic. The iGPU has its own 2 MB of L2 cache that cushions transfers to the Infinity Fabric.
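As a quick sanity check of the numbers above (simple arithmetic on RDNA's usual ratios of two CUs per WGP and 64 stream processors per CU, not figures from the annotation itself):

```python
# Verify the "Strix Point" iGPU shader arithmetic quoted above,
# assuming RDNA's fixed ratios: 2 CUs per WGP, 64 stream processors per CU.
WGPS = 8
CUS_PER_WGP = 2
SPS_PER_CU = 64

cus = WGPS * CUS_PER_WGP   # 16 compute units
sps = cus * SPS_PER_CU     # 1,024 stream processors
print(cus, sps)            # -> 16 1024
```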

AMD Strix Point SoC Reintroduces Dual-CCX CPU, Other Interesting Silicon Details Revealed

Since its reveal last week, we got a slightly more technical deep-dive from AMD on its two upcoming processors—the "Strix Point" silicon powering its Ryzen AI 300 series mobile processors; and the "Granite Ridge" chiplet MCM powering its Ryzen 9000 desktop processors. We present a closer look into the "Strix Point" SoC in this article. It turns out that "Strix Point" takes a significantly different approach to heterogeneous multicore than "Phoenix 2." AMD gave us a close look at how this works. AMD built the "Strix Point" monolithic silicon on the TSMC N4P foundry node, with a die-area of around 232 mm².

The "Strix Point" silicon sees the company's Infinity Fabric interconnect as its omnipresent ether. This is a point-to-point interconnect, unlike the ring bus on some Intel processors. The main compute machinery on the "Strix Point" SoC are its two CPU compute complexes (CCXs), each with a 32 B/cycle (read) and 16 B/cycle (write) data-path to the fabric. The concept of the CCX makes a comeback with "Strix Point" after nearly two generations of "Zen." The first CCX contains the chip's four full-sized "Zen 5" CPU cores, which share a 16 MB L3 cache among themselves. The second CCX contains the chip's eight "Zen 5c" cores, which share a smaller 8 MB L3 cache. Each of the 12 cores has a 1 MB dedicated L2 cache.

Ryzen 9000 Chip Layout: New Details Announced

AMD "Granite Ridge" is the codename for the four new Ryzen 9000 series desktop processors the company plans to launch on July 31, 2024. The processor comes in the Socket AM5 package, and is backwards compatible with AMD 600-series chipset motherboards, in addition to the new 800-series chipset ones that will launch alongside it. "Granite Ridge" is a chiplet-based processor, much like the Ryzen 7000 "Raphael," Ryzen 5000 "Vermeer," and Ryzen 3000 "Matisse." AMD is carrying over the 6 nm client I/O die from "Raphael" in an effort to minimize development costs, much in the same way it carried over the 12 nm cIOD for "Vermeer" from "Matisse."

The SoC I/O features of "Granite Ridge" are contemporary, with a 28-lane PCI-Express Gen 5 root complex that allows for a PCI-Express 5.0 x16 slot, two CPU-attached M.2 Gen 5 slots, and a Gen 5 x4 chipset bus. There's also a basic integrated graphics solution based on the older RDNA 2 graphics architecture, which should make these processors fit for all use-cases that don't need discrete graphics. The iGPU even packs multimedia accelerators, an audio coprocessor, and a display controller; USB 3.2 interfaces are put out by the processor itself.
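The 28-lane budget quoted above adds up once you tally the slots it serves; a minimal sketch, assuming the standard x16 + two x4 M.2 + x4 chipset split:

```python
# Tally "Granite Ridge"'s 28 PCIe Gen 5 lanes from the connectivity
# described above (assumed split; AMD doesn't publish it this way).
lanes = {
    "PEG slot (x16)": 16,
    "M.2 slot 1 (x4)": 4,
    "M.2 slot 2 (x4)": 4,
    "chipset link (x4)": 4,
}
total = sum(lanes.values())
print(total)  # -> 28
```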

ASRock Rack Unveils GPU Servers, Offers AI GPU Choices from All Three Brands

ASRock Rack sells the entire stack of servers a data-center could possibly want, and at Computex 2024, the company showed us its servers meant for AI GPUs. The 6U8M-GENOA2, as its name suggests, is a 6U server based on 2P AMD EPYC 9004 series "Genoa" processors in the SP5 package. You can configure it even with the variants of "Genoa" that come with 3D V-Cache, for superior compute performance from the large cache. Each of the two SP5 sockets is wired to 12 DDR5 RDIMM slots, for a total of 24 memory channels. The server supports eight AMD Instinct MI300X or MI325X AI GPUs, which it wires out individually using Infinity Fabric links and PCIe Gen 5 x16. A 3 kW 80 Plus Titanium PSU keeps the server fed. There are vacant Gen 5 x16 slots left even after connecting the GPUs, so you could give it a DPU-based 40 GbE NIC.

The 6U8X-EGS2 B100 is a 6U AI GPU server modeled along the 6U8M-GENOA2, with a couple of big changes. To begin with, the EPYC "Genoa" chips make way for a 2P Intel Xeon Socket E (LGA4677) setup, for two 5th Gen Xeon Scalable "Emerald Rapids" processors. Each socket is wired to 16 DDR5 DIMM slots (the processor itself has 8-channel DDR5, but this is a 2 DIMM-per-channel setup). The server integrates an NVIDIA NVSwitch that wires out NVLinks to eight NVIDIA B100 "Blackwell" AI GPUs. The server features eight HHHL PCIe Gen 5 x16, and five FHHL PCIe Gen 5 x16 connectors. There are vacant x16 slots for your DPU/NIC; you can even use an AIC NVIDIA BlueField card. The same 3 kW PSU as the "Genoa" system is also featured here.

Supermicro Extends AI and GPU Rack Scale Solutions with Support for AMD Instinct MI300 Series Accelerators

Supermicro, Inc., a Total IT Solution Manufacturer for AI, Cloud, Storage, and 5G/Edge, is announcing three new additions to its AMD-based H13 generation of GPU Servers, optimized to deliver leading-edge performance and efficiency, powered by the new AMD Instinct MI300 Series accelerators. Supermicro's powerful rack scale solutions with 8-GPU servers with the AMD Instinct MI300X OAM configuration are ideal for large model training.

The new 2U liquid-cooled and 4U air-cooled servers with AMD Instinct MI300A Accelerated Processing Units (APUs) are available, improve data center efficiencies, and power the fast-growing, complex demands in AI, LLM, and HPC. The new systems contain quad APUs for scalable applications. Supermicro can deliver complete liquid-cooled racks for large-scale environments with up to 1,728 TFlops of FP64 performance per rack. Supermicro worldwide manufacturing facilities streamline the delivery of these new servers for AI and HPC convergence.

AMD Announces Appointment of New Corporate Fellows

AMD today announced the appointment of five technical leaders to the role of AMD Corporate Fellow. These appointments recognize each leader's significant impact on semiconductor innovation across various areas, from graphics architecture to advanced packaging. "David, Nathan, Suresh, Ben and Ralph - whose engineering contributions have already left an indelible mark on our industry - represent the best of our innovation culture," said Mark Papermaster, chief technology officer and executive vice president of Technology and Engineering at AMD. "Their appointments to Corporate Fellow will enable AMD to innovate in new dimensions as we work to deliver the most significant breakthroughs in high-performance computing in the decade ahead."

Appointment to AMD Corporate Fellow is an honor bestowed on the most accomplished AMD innovators. AMD Corporate Fellows are appointed after a rigorous review process that assesses not only specific technical contributions to the company, but also involvement in the industry, mentoring of others and improving the long-term strategic position of the company. Currently, only 13 engineers at AMD hold the title of Corporate Fellow.

AMD Explains the Economics Behind Chiplets for GPUs

AMD, in its technical presentation for the new Radeon RX 7900 series "Navi 31" GPU, gave us an elaborate explanation on why it had to take the chiplets route for high-end GPUs, devices that are far more complex than CPUs. The company also enlightened us on what sets chiplet-based packages apart from classic multi-chip modules (MCMs). An MCM is a package that consists of multiple independent devices sharing a fiberglass substrate.

An example of an MCM would be a mobile Intel Core processor, in which the CPU die and the PCH die share a substrate. Here, the CPU and the PCH are independent pieces of silicon that can otherwise exist on their own packages (as they do on the desktop platform), but have been paired together on a single substrate to minimize PCB footprint, which is precious on a mobile platform. A chiplet-based device, by contrast, is one whose package is made up of multiple dies that cannot independently exist on their own packages without an impact on inter-die bandwidth or latency. They are essentially what would have been components on a monolithic die, disaggregated into separate dies built on different semiconductor foundry nodes, with a purely cost-driven motive.

AMD EPYC "Genoa" Zen 4 Product Stack Leaked

With its recent announcement of the Ryzen 7000 desktop processors, the action now shifts to the server, with AMD preparing a wide launch of its EPYC "Genoa" and "Bergamo" processors this year. Powered by the "Zen 4" microarchitecture, and contemporary I/O that includes PCI-Express Gen 5, CXL, and DDR5, these processors dial the CPU core-counts per socket up to 96 in case of "Genoa," and up to 128 in case of "Bergamo." The EPYC "Genoa" series represents the main trunk of the company's server processor lineup, with various internal configurations targeting specific use-cases.

The 96 cores are spread across twelve 5 nm 8-core CCDs, each with a high-bandwidth Infinity Fabric path to the sIOD (server I/O die), which is very likely built on the 6 nm node. Lower core-count models can be built either by lowering the CCD count (ensuring more cores/CCD), or by reducing the number of cores/CCD and keeping the CCD count constant, to yield more bandwidth/core. The leaked product-stack table below shows several of these sub-classes of "Genoa" and "Bergamo," classified by use-cases. The leaked slide also details the nomenclature AMD is using with its new processors. The leaked roadmap also mentions the upcoming "Genoa-X" processor for HPC and cloud-compute uses, which features the 3D Vertical Cache technology.
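The two ways of deriving lower core-count models described above can be sketched as a quick enumeration (hypothetical configurations for illustration, not leaked SKUs):

```python
# Two ways to reach a given "Genoa" core count, per the article:
# (a) fewer CCDs, each fully enabled; (b) all 12 CCDs, fewer cores each,
# which leaves more Infinity Fabric bandwidth per core.
CORES_PER_CCD = 8
MAX_CCDS = 12

def config_fewer_ccds(target_cores):
    # Fully-enabled CCDs only; returns (ccd_count, cores_per_ccd).
    assert target_cores % CORES_PER_CCD == 0
    return (target_cores // CORES_PER_CCD, CORES_PER_CCD)

def config_more_bandwidth(target_cores):
    # Keep all twelve CCDs, reduce enabled cores per CCD.
    assert target_cores % MAX_CCDS == 0
    return (MAX_CCDS, target_cores // MAX_CCDS)

print(config_fewer_ccds(48))      # -> (6, 8): six CCDs, 8 cores each
print(config_more_bandwidth(48))  # -> (12, 4): twelve CCDs, 4 cores each
```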

AMD Releases its CDNA2 MI250X "Aldebaran" HPC GPU Block Diagram

AMD in its Hot Chips 2022 presentation released a block diagram of its biggest AI-HPC processor, the Instinct MI250X. Based on the CDNA2 compute architecture, at the heart of the MI250X is the "Aldebaran" MCM (multi-chip module). This MCM contains two logic dies (GPU dies), and eight HBM2E stacks, four per GPU die. The two GPU dies are connected by a 400 GB/s Infinity Fabric link. They each have up to 500 GB/s of external Infinity Fabric bandwidth for inter-socket communications; and PCI-Express 4.0 x16 as the host system bus for AIC form-factors. The two GPU dies together make up 58 billion transistors, and are fabricated on the TSMC N6 (6 nm) node.

The component hierarchy of each GPU die sees eight Shader Engines share a last-level L2 cache. The eight Shader Engines total 112 Compute Units, or 14 CUs per engine. The CDNA2 compute unit contains 64 stream processors making up the Shader Core, and four Matrix Core Units. These are specialized hardware for matrix/tensor math operations. There are hence 7,168 stream processors per GPU die, and 14,336 per package. AMD claims a 100% increase in double-precision compute performance over CDNA (MI100). AMD attributes this to increases in frequencies, efficient data paths, extensive operand reuse and forwarding; and power-optimization enabling those higher clocks. The MI200 is already powering the Frontier supercomputer, and AMD is working toward more design wins in the HPC space. The company also dropped a major hint that the MI300, based on CDNA3, will be an APU. It will incorporate GPU dies, core-logic, and CPU CCDs onto a single package, in what is a rival solution to the NVIDIA Grace Hopper Superchip.
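The shader arithmetic above is easy to verify: 8 Shader Engines × 14 CUs × 64 stream processors per die, doubled for the two-die package. A quick check:

```python
# Verify the MI250X "Aldebaran" stream-processor counts quoted above.
SHADER_ENGINES = 8
CUS_PER_ENGINE = 14
SPS_PER_CU = 64
DIES = 2

cus_per_die = SHADER_ENGINES * CUS_PER_ENGINE  # 112 compute units
sps_per_die = cus_per_die * SPS_PER_CU         # 7,168 stream processors
sps_per_package = sps_per_die * DIES           # 14,336 per package
print(cus_per_die, sps_per_die, sps_per_package)  # -> 112 7168 14336
```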

Ryzen 7000 Said to Have a DDR5-6000 Memory "Sweet Spot"

If you remember, there were quite a lot of discussions about memory speed "sweet spots" for both the Ryzen 3000- and Ryzen 5000-series, with the user experience not always meeting AMD's sweet spot for memory clocks. Now, details of the Ryzen 7000-series memory sweet spot have arrived courtesy of Wccftech, and the speed is said to be DDR5-6000. This is 400 MHz higher than the apparent official maximum memory speed of DDR5-5600, but as we know, the manufacturer's max memory clock is rarely the actual max. In AMD's case, things obviously work a bit differently, as the Infinity Fabric clock should ideally run at a 1:1 ratio with the memory clock, as on the AM4 platform, to deliver the best possible system performance and memory latencies.

That said, as we're using DDR memory, the actual clocks are only half of the quoted memory speeds, so the IF clock is operating at no more than 2000 MHz if the memory is DDR4-4000. However, if the same applies to the Ryzen 7000-series, it appears that AMD has managed to bump the IF clocks by a not insignificant 1000 MHz, as the IF fabric would now be operating at up to 3000 MHz. This could see the Ryzen 7000-series offering better memory latencies than Intel's Alder Lake and upcoming Raptor Lake CPUs, as Intel runs DDR5 memory at a 2:1 or 4:1 ratio. AMD is said to offer a 2:1 ratio as well, but as with the AM4 CPUs, this delivers worse overall performance.
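The clock relationships discussed here follow from DDR signaling (two transfers per clock), so the fabric clock at a given ratio is easy to compute; a small sketch, with the function names being our own:

```python
# Relate a DDR transfer rating to its actual clock, and to the
# Infinity Fabric clock at the 1:1 and 2:1 ratios discussed above.
def mclk_mhz(ddr_rating):
    return ddr_rating // 2          # DDR: two transfers per clock cycle

def fclk_mhz(ddr_rating, ratio=1):  # ratio=1 -> 1:1, ratio=2 -> 2:1
    return mclk_mhz(ddr_rating) // ratio

print(fclk_mhz(4000))     # DDR4-4000 at 1:1 -> 2000 MHz
print(fclk_mhz(6000))     # DDR5-6000 at 1:1 -> 3000 MHz
print(fclk_mhz(6000, 2))  # DDR5-6000 at 2:1 -> 1500 MHz
```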

Update 11:49 UTC: Yuri Bubliy, aka @1usmus, has confirmed on Twitter that the max IF frequency is 3000 MHz, and it seems like AMD has added a range of new memory- and bus-related features to the AM5 platform, going by the additional features he posted.

AMD's Second Socket AM5 Ryzen Processor will be "Granite Ridge," Company Announces "Phoenix Point"

AMD in its 2022 Financial Analyst Day presentation announced the codename for the second generation of Ryzen desktop processors for Socket AM5, which is "Granite Ridge." A successor to the Ryzen 7000 "Raphael," the next-generation "Granite Ridge" processor will incorporate the "Zen 5" CPU microarchitecture, with its CPU complex dies (CCDs) built on the 4 nm silicon fabrication node. "Zen 5" will feature several core-level design changes as detailed in our older article, including a redesigned front-end with greater parallelism, which should indicate a much larger execution stage. The architecture could also incorporate AI/ML performance enhancements, as AMD taps into Xilinx IP to add more fixed-function hardware backing the AI/ML capabilities of its processors.

The "Zen 5" microarchitecture makes its client debut with Ryzen "Granite Ridge," and server debut with EPYC "Turin." It's being speculated that AMD could give "Turin" a round of CPU core-count increases, while retaining the same SP5 infrastructure; which means we could see either smaller CCDs, or higher core-count per CCD with "Zen 5." Much like "Raphael," the next-gen "Granite Ridge" will be a series of high core-count desktop processors that will feature a functional iGPU that's good enough for desktop/productivity, though not gaming. AMD confirmed that it doesn't see "Raphael" as an APU, and that its definition of an "APU" is a processor with a large iGPU that's capable of gaming. The company's next such APU will be "Phoenix Point."

AMD CDNA3 Architecture Sees the Inevitable Fusion of Compute Units and x86 CPU at Massive Scale

AMD in its 2022 Financial Analyst Day presentation unveiled its next-generation CDNA3 compute architecture, which will see something we've been expecting for a while—a compute accelerator that has a large number of compute units for scalar processing, and a large number of x86-64 CPU cores based on some future "Zen" microarchitecture, onto a single package. The presence of CPU cores on the package would eliminate the need for the system to have an EPYC or Xeon processor at its head, and clusters of Instinct CDNA3 processors could run themselves without the need for a CPU and its system memory.

The Instinct CDNA3 processor will feature an advanced packaging technology that brings various IP blocks together as chiplets, each based on a node most economical to it, without compromising on its function. The package features stacked HBM memory, and this memory is shared not just by the compute units and x86 cores, but also forms part of large shared memory pools accessible across packages. 4th Generation Infinity Fabric ties it all together.

AMD Ryzen 7000 "Phoenix" APUs with RDNA3 Graphics to Rock Large 3D V-Cache

AMD's next-generation Ryzen 7000-series "Phoenix" mobile processors are all the rage these days. Bound for 2023, these chips feature a powerful iGPU based on the RDNA3 graphics architecture, with performance allegedly rivaling that of a GeForce RTX 3060 Laptop GPU—a popular performance-segment discrete GPU. What's more, AMD is also taking a swing at Intel in the CPU core-count game, by giving "Phoenix" a large number of "Zen 4" CPU cores. The secret ingredient pushing this combo, however, is a large cache.

AMD has used large caches to good effect both on its "Zen 3" processors, such as the Ryzen 7 5800X3D, where they're called 3D Vertical Cache (3D V-cache); as well as its Radeon RX 6000 discrete GPUs, where they're called Infinity Cache. The only known difference between the two is that the latter is fully on-die, while the former is stacked on top of existing silicon IP. It's being reported now, that "Phoenix" will indeed feature a stacked 3D V-cache.

"Navi 31" RDNA3 Sees AMD Double Down on Chiplets: As Many as 7

Way back in January 2021, we heard a spectacular rumor about "Navi 31," the next-generation big GPU by AMD, being the company's first logic-MCM GPU (a GPU with more than one logic die). The company has a legacy of MCM GPUs, but those have been a single logic die surrounded by memory stacks. The RDNA3 graphics architecture that the "Navi 31" is based on, sees AMD fragment the logic die into smaller chiplets, with the goal of ensuring that only those specific components that benefit from the TSMC N5 node (5 nm), such as the number-crunching machinery, are built on it, while ancillary components, such as memory controllers, display controllers, or even media accelerators, are confined to chiplets built on an older node, such as the TSMC N6 (6 nm). AMD has taken this approach with its EPYC and Ryzen processors, where the chiplets with the CPU cores got the better node, and the other logic components got an older one.

Greymon55 predicts an interesting division of labor on the "Navi 31" MCM. Apparently, the number-crunching machinery is spread across two GCDs (Graphics Complex Dies?). These dies pack the Shader Engines with their RDNA3 compute units (CUs), Command Processor, Geometry Processor, Asynchronous Compute Engines (ACEs), Render Backends, etc. These are things that can benefit from the advanced 5 nm node, enabling AMD to run the CUs at higher engine clocks. There's also sound logic behind building a big GPU with two such GCDs instead of a single large GCD, as smaller GPUs can be made with a single such GCD (exactly why we have two 8-core chiplets making up a 16-core Ryzen processor, and one of these being used to create 8-core and 6-core SKUs). The smaller GCD would result in better yields per wafer, and minimize the need for separate wafer orders for a larger die (such as in the case of the "Navi 21").

AMD EPYC "Genoa" Zen 4 Processor Multi-Chip Module Pictured

Here is the first picture of a next-generation AMD EPYC "Genoa" processor with its integrated heatspreader (IHS) removed. This is also possibly the first picture of a "Zen 4" CPU Complex Die (CCD). The picture reveals as many as twelve CCDs, and a large sIOD silicon. The "Zen 4" CCDs, built on the TSMC N5 (5 nm EUV) process, look visibly similar in size to the "Zen 3" CCDs built on the N7 (7 nm) process, which means the CCD's transistor count could be significantly higher, given the transistor-density gained from the 5 nm node. Besides more number-crunching machinery on the CPU core, we're hearing that AMD will increase cache sizes, particularly the dedicated L2 cache size, which is expected to be 1 MB per core, doubling from the previous generations of the "Zen" microarchitecture.

Each "Zen 4" CCD is reported to be about 8 mm² smaller in die-area than the "Zen 3" CCD, or about 10% smaller. What's interesting, though, is that the sIOD (server I/O die) is smaller in size, too, estimated to measure 397 mm², compared to the 416 mm² of the "Rome" and "Milan" sIOD. This gives good reason to believe that AMD has switched over to a newer foundry process, such as TSMC N7 (7 nm), to build the sIOD; the current-gen sIOD is built on GlobalFoundries 12LPP (12 nm). Supporting this theory is the fact that the "Genoa" sIOD has a 50% wider memory I/O (12-channel DDR5), 50% more IFOP ports (Infinity Fabric over package) to interconnect with the CCDs, and the fact that the PCI-Express 5.0 and DDR5 switching fabric and SerDes (serializers/deserializers) may carry a higher TDP; together, these compel AMD to use a smaller node, such as 7 nm, for the sIOD. AMD is expected to debut the EPYC "Genoa" enterprise processors in the second half of 2022.

AMD's Robert Hallock Confirms Lack of Manual CPU Overclocking for Ryzen 7 5800X3D

In a livestream with HotHardware about AMD's mobile CPUs, Robert Hallock shed some light on the rumours about the Ryzen 7 5800X3D lacking manual overclocking. As per earlier rumours, something TechPowerUp confirmed with our own sources, AMD's Ryzen 7 5800X3D lacks support for manual CPU overclocking, and AMD asked its motherboard partners to remove these features in the UEFI. According to the livestream, these CPUs are hard-locked, so there's no workaround when it comes to adjusting the CPU multiplier or voltage, but at least AMD has a good reason for it.

It turns out that the 3D V-Cache is voltage-limited to a maximum of 1.3 to 1.35 V, which means that the regular boost voltage of individual Ryzen CPU cores, which can hit 1.45 to 1.5 V, would be too high for the 3D V-Cache to handle. As such, AMD implemented the restrictions for this CPU. However, the Infinity Fabric and memory bus can still be manually overclocked. The lower boost voltage also helps explain why the Ryzen 7 5800X3D has lower boost clocks, as it's possible that higher voltages are needed to hit the higher frequencies.

AMD Announces Ryzen 7 5800X3D, World's Fastest Gaming Processor

AMD today announced its Spring 2022 update for the company's Ryzen desktop processors, with as many as seven new processor models in the retail channel. The lineup is led by the Ryzen 7 5800X3D 8-core/16-thread processor, which AMD claims is the "world's fastest gaming processor." This processor introduces the 3D Vertical Cache (3DV Cache) to the consumer space.

64 MB of fast SRAM is stacked on top of the region of the CCD (8-core chiplet) that has the 32 MB of on-die L3 cache, with structural silicon leveling the region over the CPU cores with it. This SRAM is tied directly to the bi-directional ring-bus that interconnects the CPU cores, L3 cache, and IFOP (Infinity Fabric over Package) interconnect. The result is 96 MB of seamless L3 cache, with each of the 8 "Zen 3" CPU cores having equal access to all of it.
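The resulting capacity is simply the on-die L3 plus the stacked SRAM, since the two are fused into one contiguous cache:

```python
# The 5800X3D's L3 capacity, per the description above.
ON_DIE_L3_MB = 32     # on-die L3 of the "Zen 3" CCD
STACKED_SRAM_MB = 64  # 3D V-Cache stacked on top of the L3 region
total_l3 = ON_DIE_L3_MB + STACKED_SRAM_MB
print(total_l3)  # -> 96 (MB, equally accessible to all 8 cores)
```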

AMD Details Instinct MI200 Series Compute Accelerator Lineup

AMD today announced the new AMD Instinct MI200 series accelerators, the first exascale-class GPU accelerators. The AMD Instinct MI200 series includes the world's fastest high performance computing (HPC) and artificial intelligence (AI) accelerator, the AMD Instinct MI250X.

Built on AMD CDNA 2 architecture, AMD Instinct MI200 series accelerators deliver leading application performance for a broad set of HPC workloads. The AMD Instinct MI250X accelerator provides up to 4.9X better performance than competitive accelerators for double precision (FP64) HPC applications and surpasses 380 teraflops of peak theoretical half-precision (FP16) for AI workloads to enable disruptive approaches in further accelerating data-driven research.

AMD Instinct MI200: Dual-GPU Chiplet; CDNA2 Architecture; 128 GB HBM2E

AMD today announced the debut of its 6 nm CDNA2 (Compute-DNA) architecture in the form of the MI200 family. The new, dual-GPU chiplet accelerator aims to lead AMD into a new era of High Performance Computing (HPC) applications, the high-margin territory it needs to compete in for continued, sustainable growth. To that end, AMD has further improved on a matured, compute-oriented architecture born with Graphics Core Next (GCN), and managed to improve performance while reducing total die size compared to its MI100 family.

New AMD Radeon PRO W6000X Series GPUs Bring Groundbreaking High-Performance AMD RDNA 2 Architecture to Mac Pro

AMD today announced availability of the new AMD Radeon PRO W6000X series GPUs for Mac Pro. The new GPU product line delivers exceptional performance and incredible visual fidelity to power a wide variety of demanding professional applications and workloads, including 3D rendering, 8K video compositing, color correction and more.

Built on groundbreaking AMD RDNA 2 architecture, AMD Infinity Cache and other advanced technologies, the new workstation graphics line-up includes the AMD Radeon PRO W6900X and AMD Radeon PRO W6800X GPUs. Mac Pro users also have the option of choosing the AMD Radeon PRO W6800X Duo graphics card, a dual-GPU configuration that leverages high-speed AMD Infinity Fabric interconnect technology to deliver outstanding levels of compute performance.

AMD Announces 3rd Generation EPYC 7003 Enterprise Processors

AMD today announced its 3rd generation EPYC (7003 series) enterprise processors, codenamed "Milan." These processors combine up to 64 of the company's latest "Zen 3" CPU cores, with an updated I/O controller die, and promise significant performance uplifts and new security capabilities over the previous generation EPYC 7002 "Rome." The "Zen 3" CPU cores, AMD claims, introduce an IPC uplift of up to 19% over the previous generation, which, when combined with generational increases in CPU clock speeds, brings about significant single-threaded performance increases. The processor also comes with large multi-threaded performance gains thanks to a redesigned CCD.

The new "Zen 3" CPU complex die (CCD) comes with a radical redesign in the arrangement of CPU cores, putting all eight CPU cores of the CCD in a single CCX, sharing a large 32 MB L3 cache. This doubles the total amount of L3 cache addressable by a single CPU core, and significantly reduces latencies for multi-threaded workloads. The "Milan" multi-chip module has up to eight such CCDs talking to a centralized server I/O controller die (sIOD) over the Infinity Fabric interconnect.

AMD Ryzen 5000 Series Features Three Synchronized Memory Clock Domains

A leaked presentation slide by AMD for its Ryzen 5000 series "Zen 3" processors reveals details of the processor's memory interface. Much like the Ryzen 3000 series "Matisse," the Ryzen 5000 series "Vermeer" is a multi-chip module of up to 16 CPU cores spread across two 8-core CPU dies, and a unified I/O die that handles the processor's memory-, PCIe, and SoC interfaces. There are three configurable clock domains that ensure the CPU cores are fed with data at the right speed, and to ensure that the MCM design doesn't pose bottlenecks to the memory performance.

The first domain is fclk, or the Infinity Fabric clock. Each of the two CCDs (8-core CPU dies) has just one CCX (CPU core complex) with 8 cores, and hence the CCD's internal Infinity Fabric cedes relevance to the IFOP (Infinity Fabric over Package) interconnect that binds the two CCDs and the cIOD (client I/O controller die) together. The next frequency is uclk, or the internal frequency of the dual-channel DDR4 memory controller contained in the cIOD. And lastly, the mclk, or memory clock, is the industry-standard DRAM frequency.
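Running the three domains synchronized (1:1:1), as enthusiasts typically do, the clocks for a given DDR4 rating work out as follows (a sketch with our own function name, not AMD tooling):

```python
# The three memory clock domains on Ryzen 5000, run 1:1:1 as described.
def clock_domains(ddr_rating, fclk_ratio=1):
    mclk = ddr_rating // 2     # DRAM clock: half the DDR transfer rate
    uclk = mclk                # memory controller clock, 1:1 with mclk
    fclk = mclk // fclk_ratio  # Infinity Fabric clock
    return {"fclk": fclk, "uclk": uclk, "mclk": mclk}

print(clock_domains(3600))  # -> {'fclk': 1800, 'uclk': 1800, 'mclk': 1800}
```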

AMD 4th Gen Ryzen "Vermeer" Zen 3 Rumored to Include 10-core Parts

Yuri "1usmus" Bubliy, author of DRAM Calculator for Ryzen and the upcoming ClockTuner for Ryzen, revealed three juicy details on the upcoming 4th Gen AMD Ryzen "Vermeer" performance desktop processors. He predicts AMD turning up CPU core counts with this generation, including the introduction of new 10-core SKUs, possibly to one-up Intel on the multi-threaded performance front. Last we heard, AMD's upcoming "Zen 3" CCDs (chiplets) feature 8 CPU cores sharing a monolithic 32 MB slab of L3 cache. This should, in theory, allow AMD to create 10-core chips with two CCDs, each with 5 cores enabled.

Next up are two features that should interest overclockers, which is Bubliy's main domain. The processors should support a feature called "Curve Optimizer," enabling finer-grained control over the boost algorithm on a per-core basis. As we understand, the "curve" in question could well be voltage/frequency. It remains to be seen if the feature is leveraged at a CBS level (UEFI setup program), or by Ryzen Master. Lastly, there's mention of new Infinity Fabric dividers that apparently help you raise DCT (memory controller) frequencies "slightly higher" in mixed mode. AMD is expected to debut its 4th Gen Ryzen "Vermeer" desktop processors within 2020.