News Posts matching #TSV


AMD Ryzen 7 9800X3D Has the CCD on Top of the 3D V-cache Die, Not Under it

Much of the Ryzen 7 9800X3D teaser material from AMD carried the recurring buzzword "X3D Reimagined," prompting speculation about what it could mean. 9550pro, a reliable source of hardware leaks, says that AMD has redesigned the way the CPU complex die (CCD) and the 3D V-cache die (L3D) are stacked together. In past generations of X3D processors, such as the 5800X3D "Vermeer-X" and the 7800X3D "Raphael-X," the L3D was stacked on top of the CCD. It sat above the central region of the CCD that holds the on-die 32 MB L3 cache, while blocks of structural silicon were placed over the edges of the CCD that house the CPU cores; these blocks performed the crucial task of transferring heat from the CPU cores to the IHS above. This is about to change.

If the leaks are right, AMD has inverted the CCD-L3D stack with the 9000X3D series, such that the "Zen 5" CCD now sits on top, with the L3D below it, under the central region of the CCD. The CPU cores now dissipate heat to the IHS just as they do on regular 9000-series processors without 3D V-cache. We imagine AMD achieved this by enlarging the L3D to match the footprint of the CCD, so that it serves as a kind of "base tile," peppered with TSVs that connect the CCD to the fiberglass substrate below. It is also clear where AMD is going with this: right now, the L3D "base tile" contains the 64 MB of 3D V-cache that is appended to the 32 MB on-die L3 cache, but in the future (probably with "Zen 6"), AMD could design CCDs with TSVs even for the per-core L2 caches.
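Purely as an illustration, here is a minimal Python sketch of the two stacking orders described above; the layer names and the "heat path" helper are our own shorthand for the leaked description, not AMD terminology:

```python
# Simplified model of the reported X3D stacking change (our own
# shorthand, not AMD's terminology or actual construction).

# Old layout (5800X3D / 7800X3D): the cache die and structural
# silicon sit between the CPU cores and the IHS.
old_stack = [
    "substrate",
    "CCD (CPU cores + on-die L3)",
    "L3D over L3 region / structural silicon over cores",
    "IHS",
]

# Reported new layout (9800X3D): the L3D acts as a base tile, with
# TSVs passing through it, so the cores face the IHS directly.
new_stack = [
    "substrate",
    "L3D base tile (TSVs down to substrate)",
    "CCD (CPU cores + on-die L3)",
    "IHS",
]

def layers_between_cores_and_ihs(stack):
    """Everything between the CPU cores and the IHS (fewer is better)."""
    cores = next(i for i, layer in enumerate(stack) if "CCD" in layer)
    return stack[cores + 1:-1]

print("old:", layers_between_cores_and_ihs(old_stack))  # silicon in the way
print("new:", layers_between_cores_and_ihs(new_stack))  # [] -- direct path
```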

AMD Granite Ridge "Zen 5" Processor Annotated

High-resolution die-shots of the AMD "Zen 5" 8-core CCD were released and annotated by Nemez, Fitzchens Fitz, and HighYieldYT. These provide a detailed view of how the silicon and its various components appear, particularly the new "Zen 5" CPU core with its 512-bit FPU. The "Granite Ridge" package looks similar to "Raphael," with up to two 8-core CPU complex dies (CCDs) depending on the processor model, and a centrally located client I/O die (cIOD). This cIOD is carried over from "Raphael," which minimizes product development costs for AMD at least for the uncore portion of the processor. The "Zen 5" CCD is built on the TSMC N4P (4 nm) foundry node.

The "Granite Ridge" package sees the up to two "Zen 5" CCDs snuck up closer to each other than the "Zen 4" CCDs on "Raphael." In the picture above, you can see the pad of the absent CCD behind the solder mask of the fiberglass substrate, close to the present CCD. The CCD contains 8 full-sized "Zen 5" CPU cores, each with 1 MB of L2 cache, and a centrally located 32 MB L3 cache that's shared among all eight cores. The only other components are an SMU (system management unit), and the Infinity Fabric over Package (IFoP) PHYs, which connect the CCD to the cIOD.

AMD Says Ryzen 9000 Series Won't Beat 7000X3D Series at Gaming

AMD's upcoming Ryzen 9000 "Granite Ridge" desktop processors based on the "Zen 5" microarchitecture won't beat the Ryzen 7000X3D series at gaming workloads, said Donny Woligroski, the company's senior technical marketing manager, in an interview with Tom's Hardware. The new "Zen 5" chips, such as the Ryzen 7 9700X and Ryzen 9 9950X, will come close to the gaming performance of the 7800X3D and 7950X3D, but won't quite beat them. The new processors will, however, offer significant generational performance uplifts in productivity workloads, particularly multithreaded workloads that use vector extensions such as VNNI and AVX-512. The Ryzen 7 7800X3D remains the fastest gaming desktop processor you can buy; in our testing, it edges out even Intel's Core i9-14900KS.

Given this, we expect the gaming performance of processors like the Ryzen 7 9700X and Ryzen 9 9950X to end up closer to that of the Intel Core i9-13900K or i9-14900K. Gamers with a 7000X3D series chip, or even a 14th Gen Core i7 or Core i9 chip, don't have much to look forward to. AMD confirmed that it's already working on a Ryzen 9000X3D series ("Zen 5" with 3D V-cache technology), and it sounds confident of holding on to the title of having the fastest gaming processors. This doesn't seem implausible.

Intel Demos 3D Transistors, RibbonFET, and PowerVia Technologies

During the 69th annual IEEE International Electron Devices Meeting (IEDM), Intel demonstrated some of its latest transistor design and manufacturing advancements. The first in line is the 3D integration of transistors. According to Intel, the company has successfully stacked complementary field-effect transistors (CFETs) at a gate pitch scaled down to 60 nm. With CFETs promising thinner gate channels, 3D stacked CFETs would allow for higher density by scaling vertically as well as horizontally. Intel's 7 node has a 54 nm gate pitch, meaning CFETs are already close to matching production-ready nodes. With more time and development, we expect to see 3D stacked CFETs in production runs in the coming years.

Next, Intel demonstrated its RibbonFET technology, a novel approach that is the first new transistor architecture since the introduction of FinFET in 2012. Using ribbon-shaped channels fully surrounded by the gate, these transistors allow for better channel control and higher drive current at all voltage levels. This enables faster transistor switching, which in turn leads to higher frequencies and performance. The width of these nanoribbon channels can be modulated depending on the application: low-power mobile applications use thinner channels that draw less current, while high-performance applications use wider channels that supply more drive current. One stack of nanoribbons can achieve the same drive current as multiple fins in a FinFET, but at a smaller footprint.
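As a rough way to see why one nanoribbon stack can match several fins, compare effective gate widths, to which drive current is proportional to first order. The dimensions in the sketch below are illustrative assumptions, not Intel's published figures:

```python
# Rough effective-gate-width comparison between FinFET and stacked
# gate-all-around nanoribbons. All nm dimensions are illustrative
# assumptions, not Intel's published figures.

def finfet_weff(n_fins, fin_height=50, fin_width=7):
    # The gate wraps three sides of each fin: two sidewalls plus the top.
    return n_fins * (2 * fin_height + fin_width)

def ribbon_weff(n_ribbons, width=25, thickness=5):
    # Gate-all-around wraps the full perimeter of each ribbon.
    return n_ribbons * 2 * (width + thickness)

# Drive current scales roughly with effective width, so a stack of
# four ribbons in a single-fin-like footprint can rival two fins:
print(finfet_weff(n_fins=2))     # 214 nm of effective width
print(ribbon_weff(n_ribbons=4))  # 240 nm in a smaller footprint
```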

Samsung Notes: HBM4 Memory is Coming in 2025 with New Assembly and Bonding Technology

According to an editorial blog post by SangJoon Hwang, Executive Vice President and Head of the DRAM Product & Technology Team at Samsung Electronics, High Bandwidth Memory 4 (HBM4) is coming in 2025. In the recent timeline of HBM development, the first HBM memory appeared in 2015 with the AMD Radeon R9 Fury X. The second-generation HBM2 arrived with the NVIDIA Tesla P100 in 2016, and the third-generation HBM3 saw the light of day with the NVIDIA Hopper GH100 GPU in 2022. Currently, Samsung has developed 9.8 Gbps HBM3E memory, which will start sampling to customers soon.

However, Samsung is more ambitious with its development timeline this time around: the company expects to announce HBM4 in 2025, possibly with commercial products in the same calendar year. Interestingly, HBM4 memory will use technology optimized for high thermal properties, such as non-conductive film (NCF) assembly and hybrid copper bonding (HCB). NCF is a polymer layer that enhances the stability of the micro bumps and TSVs in the chip, protecting the solder bumps of the stacked memory dies from shock. Hybrid copper bonding is an advanced semiconductor packaging method that creates direct copper-to-copper connections between semiconductor components, enabling high-density, 3D-like packaging. It offers high I/O density, enhanced bandwidth, and improved power efficiency. It uses copper as the conductor and an oxide as the insulator, in place of regular micro bumps, to achieve the connection density needed for HBM-like structures.

Winbond Introduces Innovative CUBE Architecture for Powerful Edge AI Devices

Winbond Electronics Corporation, a leading global supplier of semiconductor memory solutions, has unveiled a powerful enabling technology for affordable Edge AI computing in mainstream use cases. The Company's new customized ultra-bandwidth elements (CUBE) enable memory technology to be optimized for seamless performance running generative AI on hybrid edge/cloud applications.

CUBE enhances the performance of front-end 3D structures such as chip on wafer (CoW) and wafer on wafer (WoW), as well as back-end 2.5D/3D chip on Si-interposer on substrate and fan-out solutions. Designed to meet the growing demands of edge AI computing devices, it is compatible with memory density from 256 Mb to 8 Gb with a single die, and it can also be 3D stacked to enhance bandwidth while reducing data transfer power consumption.

Samsung Electronics Unveils Industry's Highest-Capacity 12nm-Class 32Gb DDR5 DRAM

Samsung Electronics, a world leader in advanced memory technology, today announced that it has developed the industry's first and highest-capacity 32-gigabit (Gb) DDR5 DRAM using 12 nanometer (nm)-class process technology. This achievement comes after Samsung began mass production of its 12 nm-class 16Gb DDR5 DRAM in May 2023. It solidifies Samsung's leadership in next-generation DRAM technology and signals the next chapter of high-capacity memory.

"With our 12 nm-class 32Gb DRAM, we have secured a solution that will enable DRAM modules of up to 1-terabyte (TB), allowing us to be ideally positioned to serve the growing need for high-capacity DRAM in the era of AI (Artificial Intelligence) and big data," said SangJoon Hwang, Executive Vice President of DRAM Product & Technology at Samsung Electronics. "We will continue to develop DRAM solutions through differentiated process and design technologies to break the boundaries of memory technology."

Suppliers Amp Up Production, HBM Bit Supply Projected to Soar by 105% in 2024

TrendForce highlights in its latest report that memory suppliers are boosting their production capacity in response to escalating orders from NVIDIA and CSPs for their in-house designed chips. These efforts include the expansion of TSV production lines to increase HBM output. Forecasts based on current production plans from suppliers indicate a remarkable 105% annual increase in HBM bit supply by 2024. However, due to the time required for TSV expansion, which encompasses equipment delivery and testing (9 to 12 months), the majority of HBM capacity is expected to materialize by 2Q24.

TrendForce analysis indicates that 2023 to 2024 will be pivotal years for AI development, triggering substantial demand for AI Training chips and thereby boosting HBM utilization. However, as the focus pivots to Inference, the annual growth rate for AI Training chips and HBM is expected to taper off slightly. The imminent boom in HBM production has presented suppliers with a difficult situation: they will need to strike a balance between meeting customer demand to expand market share and avoiding a surplus due to overproduction. Another concern is the potential risk of overbooking, as buyers, anticipating an HBM shortage, might inflate their demand.

Micron Delivers Industry's Fastest, Highest-Capacity HBM to Advance Generative AI Innovation

Micron Technology, Inc. today announced it has begun sampling the industry's first 8-high 24 GB HBM3 Gen2 memory with bandwidth greater than 1.2 TB/s and pin speed over 9.2 Gb/s, which is up to a 50% improvement over currently shipping HBM3 solutions. With a 2.5 times performance per watt improvement over previous generations, Micron's HBM3 Gen2 offering sets new records for the critical artificial intelligence (AI) data center metrics of performance, capacity and power efficiency. These Micron improvements reduce training times of large language models like GPT-4 and beyond, deliver efficient infrastructure use for AI inference and provide superior total cost of ownership (TCO).

The foundation of Micron's high-bandwidth memory (HBM) solution is Micron's industry-leading 1β (1-beta) DRAM process node, which allows a 24 Gb DRAM die to be assembled into an 8-high cube within an industry-standard package dimension. Moreover, Micron's 12-high stack with 36 GB capacity will begin sampling in the first quarter of calendar 2024. Micron provides 50% more capacity for a given stack height compared to existing competitive solutions. Micron's HBM3 Gen2 performance-to-power ratio and pin speed improvements are critical for managing the extreme power demands of today's AI data centers. The improved power efficiency is possible because of Micron advancements such as a doubling of the through-silicon vias (TSVs) over competitive HBM3 offerings, thermal impedance reduction through a fivefold increase in metal density, and an energy-efficient data path design.
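Micron's headline numbers multiply out cleanly; in this quick check, the 1,024-bit stack interface is our assumption from the HBM standard rather than a figure Micron states here:

```python
# Sanity-check of Micron's HBM3 Gen2 figures.
die_gbit = 24
print(8 * die_gbit / 8, "GB")     # 8-high stack  -> 24.0 GB
print(12 * die_gbit / 8, "GB")    # 12-high stack -> 36.0 GB

pins = 1024                       # standard HBM interface width (assumed)
pin_speed_gbps = 9.2
print(pins * pin_speed_gbps / 8, "GB/s")
# 1177.6 GB/s at exactly 9.2 Gb/s; the ">1.2 TB/s" claim implies pin
# speeds running slightly above the quoted 9.2 Gb/s floor.
```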

Major CSPs Aggressively Constructing AI Servers and Boosting Demand for AI Chips and HBM, Advanced Packaging Capacity Forecasted to Surge 30~40%

TrendForce reports that explosive growth in generative AI applications like chatbots has spurred significant expansion in AI server development in 2023. Major CSPs including Microsoft, Google, AWS, as well as Chinese enterprises like Baidu and ByteDance, have invested heavily in high-end AI servers to continuously train and optimize their AI models. This reliance on high-end AI servers necessitates the use of high-end AI chips, which in turn will not only drive up demand for HBM during 2023~2024, but is also expected to boost growth in advanced packaging capacity by 30~40% in 2024.

TrendForce highlights that to augment the computational efficiency of AI servers and enhance memory transmission bandwidth, leading AI chip makers such as Nvidia, AMD, and Intel have opted to incorporate HBM. Presently, Nvidia's A100 and H100 chips boast up to 80 GB of HBM2e and HBM3, respectively. In its latest integrated CPU and GPU, the Grace Hopper Superchip, Nvidia expanded a single chip's HBM capacity by 20%, hitting a mark of 96 GB. AMD's MI300 also uses HBM3: the MI300A's capacity remains at 128 GB like its predecessor, while the more advanced MI300X has ramped up to 192 GB, a 50% increase. Google is expected to broaden its partnership with Broadcom in late 2023 to produce its ASIC AI accelerator chip, the TPU, which will also incorporate HBM memory, in order to extend its AI infrastructure.
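The capacity deltas quoted above check out arithmetically:

```python
# Quick check of the HBM capacity claims above.
h100_gb = 80
grace_hopper_gb = h100_gb * 1.20    # +20% -> 96.0 GB
mi300a_gb = 128
mi300x_gb = mi300a_gb * 1.50        # +50% -> 192.0 GB
print(grace_hopper_gb, mi300x_gb)
```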

Intel to Demonstrate PowerVia on E-Core Processor Built with Intel 4 Node

At VLSI Symposium 2023, scheduled to take place between June 11-16, Intel is set to demonstrate its PowerVia technology working efficiently on an E-core chip built using the Intel 4 node. Conventional chips have power and signal interconnects distributed across multiple metal layers. PowerVia, on the other hand, dedicates specific layers to power delivery, effectively separating them from the signal routing layers. This approach allows for vertical power delivery through a set of power-specific through-silicon vias (TSVs), or PowerVias, which are essentially vertical connections between the top and bottom surfaces of the chip. By delivering power directly from the backside of the chip, PowerVia reduces power supply noise and resistive losses, optimizing power distribution and improving overall energy efficiency. PowerVia is set to debut in 2024 with the Intel 20A node.
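A toy IR-drop calculation illustrates why a dedicated backside path helps; the resistance and current values below are invented purely for illustration:

```python
# Toy IR-drop model: V_drop = I * R. All values are illustrative.
core_current_a = 50.0

# Frontside delivery: power shares the thin signal-metal stack, so the
# effective resistance from bump to transistor is higher.
r_frontside_ohm = 0.0004
# Backside delivery (PowerVia): a short, wide, dedicated path.
r_backside_ohm = 0.0002

for name, r in (("frontside", r_frontside_ohm), ("backside", r_backside_ohm)):
    print(f"{name}: {core_current_a * r * 1000:.0f} mV droop")  # 20 vs 10 mV
# Less droop at the transistor leaves voltage margin that can be spent
# on frequency -- consistent with the >5% gain Intel reports below.
```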

For its VLSI Symposium 2023 talk, the company has prepared a paper that highlights a test chip made using Intel 4 technology and implementing E-cores only. The document states: "PowerVia Technology is a novel innovation to extend Process Scaling by having Power Delivery on the backside. This paper presents the pre and post silicon findings from implementing an Intel E-Core in PowerVia Technology. PowerVia enabled standard cell utilization of greater than 90 percent in large areas of the core while showing greater than 5 percent frequency benefit in silicon due to reduced IR drop. Successful post-silicon debug is demonstrated with slightly higher but acceptable throughput times. The thermal characteristics of the PowerVia test chip are in line with the higher power densities expected from logic scaling."

PMIC Issue with Server DDR5 RDIMMs Reported, Convergence of DDR5 Server DRAM Price Decline

TrendForce reports that mass production of new server platforms—such as Intel Sapphire Rapids and AMD Genoa—is imminent. However, recent market reports have indicated a PMIC compatibility issue for server DDR5 RDIMMs; DRAM suppliers and PMIC vendors are working to address the problem. TrendForce believes this will have two effects: First, DRAM suppliers will temporarily procure more PMICs from Monolithic Power Systems (MPS), which supplies PMICs without any issues. Second, supply will inevitably be affected in the short term as current DDR5 server DRAM production still uses older processes, which will lead to a convergence in the price decline of DDR5 server DRAM in 2Q23—from the previously estimated 15~20% to 13~18%.

As previously mentioned, the PMIC issue and a production process that still relies on older nodes are both having a short-term impact on the supply of DDR5 server DRAM. SK hynix has gradually ramped up production and sales of its 1α nm process, which, unlike 1y nm, has yet to be fully verified by customers. Current production is still dominated by Samsung and SK hynix's 1y nm and Micron's 1z nm processes; 1α and 1β nm production is projected to increase in 2H23.

AMD Details its 3D V-Cache Design at ISSCC

This week, the International Solid-State Circuits Conference is taking place online, and during one of the sessions, AMD shared more details of its 3D V-Cache design. The interesting part here is the overall design of AMD's 3D V-Cache, as well as how it interfaces with the CPU dies. The cache chip itself is said to measure 36 mm² and interfaces directly with the L3 cache using a through-silicon via (TSV) interface. For all the CPU cores to be able to communicate with the 3D V-Cache, AMD has implemented a shared ring bus at the L3 level. The entire L3 cache is said to be available to each of the cores, which should further help improve performance.

The 3D V-Cache is made up of multiple 8 MB "slices," each of which has a 1,024-contact interface with a single CPU core, for a total of 8,192 connections between the CCX and the 3D V-Cache. This allows for bandwidth in excess of two terabytes per second, per slice, in full duplex mode. This should allow the 3D V-Cache to run at full L3 speeds, despite the fact that it's not an integrated part of the CCX. AMD is also said to have improved the design of its CCX for the upcoming Ryzen 7 5800X3D in several ways, to reduce power draw while improving clock speeds. AMD has yet to reveal a launch date for the Ryzen 7 5800X3D, but it'll be interesting to see if the 3D V-Cache and the various minor optimizations can make it competitive with Intel's Alder Lake CPUs until Zen 4 arrives.
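The interface numbers multiply out as follows; the per-slice figures are as reported, and the naive aggregate at the end is our own extrapolation:

```python
# 3D V-Cache interface math as reported at ISSCC.
slices = 8                      # 8 x 8 MB slices = 64 MB of stacked cache
contacts_per_slice = 1024
print(slices * contacts_per_slice)   # 8192 connections CCX <-> V-Cache

per_slice_tbps = 2.0                 # >2 TB/s per slice, full duplex
print(slices * per_slice_tbps)       # 16.0 TB/s if every slice peaks at once
```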

JEDEC Publishes HBM3 Update to High Bandwidth Memory (HBM) Standard

JEDEC Solid State Technology Association, the global leader in the development of standards for the microelectronics industry, today announced the publication of the next version of its High Bandwidth Memory (HBM) DRAM standard: JESD238 HBM3, available for download from the JEDEC website. HBM3 is an innovative approach to raising the data processing rate used in applications where higher bandwidth, lower power consumption and capacity per area are essential to a solution's market success, including graphics processing and high-performance computing and servers.

AMD Envisions Direct Circuit Slicing for Future 3D Stacked Dies

AMD, in its Hot Chips 33 presentation, shed light on the company's efforts to stay on the cutting edge of 3D silicon packaging technology, especially as rival Intel takes giant strides with 2.5D and 3D packaging on its latest "Ponte Vecchio" and "Sapphire Rapids" packages. The company revealed that it co-developed a pioneering new die-on-die stacking technique with TSMC for its upcoming "Zen 3" CCDs with 3D Vertical Cache: 64 MB SRAM dies stacked on top of the CCDs to serve as extensions of the 32 MB on-die L3 cache. The micro bumps connecting the 3D Vertical Cache die with the CCD are 9 microns in pitch, compared to 10 microns on the production variant of Intel Foveros.
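That one-micron pitch advantage compounds quadratically in connection density; a quick comparison:

```python
# Micro-bump density scales with the inverse square of the pitch.
def bumps_per_mm2(pitch_um):
    return (1000 / pitch_um) ** 2

tsmc_soic = bumps_per_mm2(9)     # ~12,346 bumps/mm^2
foveros = bumps_per_mm2(10)      # 10,000 bumps/mm^2
print(f"{tsmc_soic / foveros - 1:.0%} more connections per mm^2")  # 23%
```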

AMD believes that no single packaging technology works for all products; the right choice depends entirely on what you're trying to stack. The company also spoke about the future of die-on-die stacking. Package-on-package stacking has been possible for over a decade (as in the case of smartphones). Currently, it's possible to put memory on logic within a single package: a logic die with an SRAM die for additional cache memory; a logic die with DRAM for RAM integrated into the package; or even logic with NAND flash for extreme-density server devices.

Samsung Develops 512 GB DDR5 Memory Modules Running at 7.2 Gbps

At this year's Hot Chips 33 conference, Samsung presented its work on the upcoming DDR5 memory standard. The company has managed to achieve a number of new developments, as the newer standard pairs with new technologies to deliver higher speeds and greater capacity. The Korean company designed its DDR5 modules around 8-high (8H) stacked TSV (through-silicon via) dies. In its previous DDR4 implementations, Samsung used 4-high (4H) stacked TSV dies, which are actually thicker than the latest 8-high stacks. To achieve the new thin design, Samsung used thin-wafer handling techniques, which resulted in a 40% reduction in the gap between stacked dies. The new 8H DDR5 stacks are only 1.0 mm thick, compared to the 1.2 mm of the older 4H stacks.

When it comes to performance, Samsung expects the new DDR5 modules to deliver in a big way. Running at 7.2 Gbps, the Samsung-made RDIMM/LRDIMM modules can reach up to 512 GB in capacity. This is, of course, limited to the server/enterprise market; regular PC users can expect UDIMMs with up to 64 GB of capacity. The aforementioned 7.2 Gbps speed is achieved at the specified 1.1 V, meaning that Samsung's implementation is very efficient. According to the company's estimates, the DDR5 crossover for the mainstream market is not expected before 2023/2024, meaning that there is still plenty of time for memory makers to refine their DDR5 products.
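Both the stacking and the capacity claims reduce to simple arithmetic. In the sketch below, the per-layer thickness figures follow from the numbers above, while the 16 Gb per-die density is our assumption for the capacity breakdown, since it is not specified here:

```python
# Per-layer thickness budget: 8 dies in 1.0 mm vs 4 dies in 1.2 mm.
print(1.0 / 8, "mm per layer (8H)")   # 0.125 mm
print(1.2 / 4, "mm per layer (4H)")   # 0.3 mm -- thin-wafer handling pays off

# One plausible route to a 512 GB module (assumes 16 Gb dies):
die_gbit = 16
package_gbyte = 8 * die_gbit / 8          # 8H stack -> 16.0 GB per package
print(512 / package_gbyte, "packages")    # 32.0 packages per module
```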

AMD "Zen 3" 3D Vertical Cache Detailed Some More

Senior Technology Fellow Yuzo Fukuzaki shed light on the elusive new CPU technology AMD unveiled at its Computex 2021 keynote, 3D Vertical Cache (3DV Cache). The company had detailed it as an additional 64 MB of last-level cache stacked on top of a CCD (CPU core complex die), which significantly improves performance, including a claimed 15% average gain in gaming, amounting to a generational performance gain over "Zen 3." The prototype AMD unveiled in its keynote was based on a Socket AM4 processor with "Zen 3" CCDs that have the 3DV Cache components in place. With two such CCDs, a 16-core processor would end up with 192 MB of L3 cache.

Yuzo Fukuzaki's theory sheds light on the most plausible position of 3DV Cache in the processor's cache hierarchy. Apparently, it expands the CCD's L3 cache, and doesn't serve as an "L4" victim cache to the L3. This way, the cache setup remains transparent to the OS, which sees it as a contiguous 96 MB block of L3 cache (per CCD). The 3DV Cache die is an SRAM chip fabricated on the same 7 nm process as the "Zen 3" CCD. It measures 6 mm x 6 mm (36 mm²), and is located above the region of the CCD that typically has the 32 MB L3 SRAM. Fukuzaki estimates that roughly 23,000 TSVs (through-silicon vias), each about 17 µm in size, connect the 3DV Cache die to the main CCD.
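Fukuzaki's figures imply the following totals; the TSV density is our own derivation from the quoted numbers:

```python
# Cache totals and TSV density implied by the figures above.
on_die_l3_mb = 32
v_cache_mb = 64
print(on_die_l3_mb + v_cache_mb, "MB L3 per CCD")           # 96 MB
print((on_die_l3_mb + v_cache_mb) * 2, "MB with two CCDs")  # 192 MB

tsvs = 23_000
die_area_mm2 = 6 * 6                                        # 36 mm^2
print(round(tsvs / die_area_mm2), "TSVs per mm^2 on average")  # 639
```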

AMD Shares New Details on Their 3D V-Cache Tech for Zen 3+

AMD, via its official YouTube channel, has shared a video that goes into slightly more detail on the use of 3D V-Cache in the upcoming Zen 3+ CPUs. First demoed to the public at AMD's Computex 2021 event, 3D V-Cache leverages TSMC's SoIC stacking technology, which enables silicon development along the Z axis, instead of the more usual footprint increase along the X axis. The 3D V-Cache, which was shown at Computex deployed in a prototype Ryzen 9 5900X 12-core CPU, adds 64 MB of L3 cache to each CCX (the up-to-eight-core complex in AMD's latest Zen design), basically tripling the amount of L3 cache available to the CPU. This, in turn, was shown to increase FPS in games quite substantially (somewhere around 15%), as games in particular are sensitive to this type of CPU resource.

The added information explains that there is no use of micro bumps; instead, the bottom layer (with the CCX) and the top layer (the L3 cache) are aligned perfectly, which allows the bonding to occur naturally, in a zero-gap manner, via the TSVs (through-silicon vias) already present in the silicon, between both halves of the CPU-cache sandwich. To enable this, AMD flipped the CCX upside down (the core complex now faces the bottom of the chip instead of the top), shaved off 95% of the silicon above the upside-down core complex, and then attached the 3D V-Cache die on top of this formation. This also has the added bonus of decreasing the distance between the L3 cache and the CCX (the Z-axis distance between the two is around 1,000 times smaller than if the L3 cache were deployed along the classical X axis), which decreases power consumption, temperatures, and latency, allowing for further increases in system performance. Look after the break for the full video.