News Posts matching #5 nm

Meta Announces New MTIA AI Accelerator with Improved Performance to Ease NVIDIA's Grip

Meta has announced the next generation of its Meta Training and Inference Accelerator (MTIA) chip, which is designed to train and run inference on AI models at scale. The newest MTIA chip is a second-generation design of Meta's custom silicon for AI, built on TSMC's 5 nm technology. Running at 1.35 GHz, the new chip gets a boost to a 90 Watt TDP per package, compared to just 25 Watts for the first-generation design. Basic Linear Algebra Subprograms (BLAS) processing, which includes matrix multiplication and vector/SIMD processing, is where the chip shines. At GEMM matrix processing, each chip delivers 708 TeraFLOPS at INT8 (presumably FP8 in the spec) with sparsity, 354 TeraFLOPS without, 354 TeraFLOPS at FP16/BF16 with sparsity, and 177 TeraFLOPS without.

Classical vector processing is a bit slower, at 11.06 TeraFLOPS at INT8 (FP8), 5.53 TeraFLOPS at FP16/BF16, and 2.76 TeraFLOPS at single-precision FP32. The MTIA chip is specifically designed to run AI training and inference on Meta's PyTorch AI framework, with an open-source Triton backend that produces compiler code for optimal performance. Meta uses this for all its Llama models, and with Llama 3 just around the corner, it could be trained on these chips. To package it into a system, Meta puts two of these chips onto a board and pairs them with 128 GB of LPDDR5 memory. The board connects via PCIe Gen 5 to a system where 12 boards are stacked densely. This arrangement is repeated six times in a single rack, for 72 boards and 144 chips per rack and a total of 101.95 PetaFLOPS at INT8 (FP8) precision, assuming linear scaling. Of course, linear scaling is not quite achievable in scale-out systems, which could bring the figure down to under 100 PetaFLOPS per rack.
Below, you can see images of the chip floorplan, specifications compared to the prior version, as well as the system.
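The rack-level throughput figure above can be checked with a quick sketch, using only numbers quoted in the article (linear scaling is the article's stated assumption, not a real-world guarantee):

```python
# Per-chip and rack topology figures as reported for MTIA v2.
chip_tflops = 708            # TeraFLOPS per chip at INT8/FP8 with sparsity
chips_per_board = 2
boards_per_shelf = 12        # 12 boards stacked densely per chassis
shelves_per_rack = 6         # repeated six times per rack

boards_per_rack = boards_per_shelf * shelves_per_rack   # 72 boards
chips_per_rack = boards_per_rack * chips_per_board      # 144 chips

# Assuming perfectly linear scaling, which scale-out systems never quite hit.
rack_pflops = chips_per_rack * chip_tflops / 1000
print(f"{chips_per_rack} chips -> {rack_pflops:.2f} PetaFLOPS per rack")
```

Running this reproduces the article's 144 chips and 101.95 PetaFLOPS per rack.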

Intel Launches Gaudi 3 AI Accelerator: 70% Faster Training, 50% Faster Inference Compared to NVIDIA H100, Promises Better Efficiency Too

During the Vision 2024 event, Intel announced its latest Gaudi 3 AI accelerator, promising significant improvements over its predecessor. Intel claims the Gaudi 3 offers up to 70% better training performance, 50% better inference, and 40% better efficiency than NVIDIA's H100 processors. The new AI accelerator comes as a PCIe Gen 5 dual-slot add-in card with a 600 W TDP or as an OAM module with a 900 W TDP. The PCIe card lists the same peak 1,835 TeraFLOPS of FP8 performance as the OAM module despite a 300 W lower TDP. The PCIe version works in groups of four per system, while the OAM HL-325L modules can run in an eight-accelerator configuration per server. The lower TDP will likely result in lower sustained performance, but it confirms that the same silicon is used, just fine-tuned at a lower frequency. Built on TSMC's N5 5 nm node, the AI accelerator features 64 Tensor Cores, delivering double the FP8 and quadruple the FP16 performance of the previous-generation Gaudi 2.

The Gaudi 3 AI chip comes with 128 GB of HBM2E with 3.7 TB/s of bandwidth and 24 × 200 Gbps Ethernet NICs, with dual 400 Gbps NICs used for scale-out. All of that is laid out on the 10 tiles that make up the Gaudi 3 accelerator, which you can see pictured below. There is 96 MB of SRAM split between the two compute tiles, acting as a low-level cache that bridges data communication between the Tensor Cores and HBM memory. Intel also announced support for the new performance-boosting standardized MXFP4 data format and is developing an AI NIC ASIC for Ultra Ethernet Consortium-compliant networking. The Gaudi 3 supports clusters of up to 8,192 cards, built from 1,024 nodes of eight accelerators each. It is on track for volume production in Q3, offering a cost-effective alternative to NVIDIA accelerators with the additional promise of a more open ecosystem. More information and a deeper dive can be found in the Gaudi 3 Whitepaper.
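As a quick sanity check on the cluster arithmetic above, a sketch using the quoted figures (the aggregate HBM capacity is derived here for illustration, not quoted in the article):

```python
# Cluster topology figures as quoted for Gaudi 3.
accelerators_per_node = 8      # OAM HL-325L modules per server
nodes = 1024
cluster_cards = nodes * accelerators_per_node   # 8,192 cards, as quoted

# Derived for illustration: aggregate HBM2E capacity at full cluster scale.
hbm_per_card_gb = 128
cluster_hbm_tb = cluster_cards * hbm_per_card_gb / 1024
print(f"{cluster_cards} cards, {cluster_hbm_tb:.0f} TB of aggregate HBM2E")
```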

US Backs TSMC's $65B Arizona Investment with $11.6B Support Package

According to the latest report from Bloomberg, the US government under Joe Biden's administration has announced plans to provide Taiwan Semiconductor Manufacturing Company (TSMC) with a substantial financial support package worth $11.6 billion. The package is composed of $6.6 billion in grants and up to $5 billion in loans. This represents the most significant financial assistance approved under the CHIPS and Science Act, a key initiative to resurrect the US chip industry. The funding will aid TSMC in establishing three cutting-edge semiconductor production facilities in Arizona, with the company's total investment in the state expected to exceed an impressive $65 billion. TSMC's multi-phase Arizona project will commence with the construction of a fab module near its existing Fab 21 facility. Production using 4 nm and 5 nm process nodes is slated to begin by early 2025. The second phase, scheduled for 2028, will focus on even more advanced 2 nm and 3 nm technologies.

TSMC has kept details about the third facility's production timeline and process node under wraps. The company's massive investment in Arizona is expected to profoundly impact the local economy, creating 6,000 high-tech manufacturing jobs and over 20,000 construction positions. Moreover, $50 million has been earmarked for training local workers, which aligns with President Joe Biden's goal of bolstering domestic manufacturing and technological independence. However, TSMC's Arizona projects have encountered obstacles, including labor disputes and uncertainties regarding government support, resulting in delays for the second facility's production timeline. Additionally, reports suggest that at least one TSMC supplier has abandoned plans to set up operations in Arizona due to workforce-related challenges.

Huawei and SMIC Prepare Quadruple Semiconductor Patterning for 5 nm Production

According to Bloomberg's latest investigation, Huawei and Semiconductor Manufacturing International Corporation (SMIC) have filed patents on the self-aligned quadruple patterning (SAQP) etching technique, aimed at enabling SMIC to achieve 5 nm semiconductor production. The two Chinese giants have been working with Deep Ultraviolet (DUV) machinery to develop a patterning technique that would let SMIC produce a node compliant with US export rules while maintaining the density improvements of the previously announced 7 nm node. For the 7 nm process, SMIC most likely used self-aligned dual patterning (SADP) with DUV tools, but the increased density of the 5 nm node requires a doubling to SAQP. In semiconductor manufacturing, lithography tools take multiple passes to etch the design onto the silicon wafer.

Especially with smaller nodes carrying ever-increasing density requirements, it is becoming challenging to etch sub-10 nm designs using DUV tools. That is where Extreme Ultraviolet (EUV) tools from ASML come into play. EUV lithography uses a wavelength about 14 times shorter than DUV, at only 13.5 nm compared to the 193 nm of ArF immersion DUV systems. This means that without EUV, SMIC has to look to alternatives like SAQP to increase the density of its nodes, and, as a result, take on more complexity and possibly lower yields. As an example, Intel tried to use SAQP in its first 10 nm nodes to reduce reliance on EUV, which resulted in a series of delays and complications, eventually pushing Intel toward EUV. While Huawei and SMIC may develop a more efficient implementation of SAQP, the move to EUV is ultimately inevitable, as regular DUV cannot keep up with the increasing density of semiconductor nodes. Given that ASML can't ship its EUV machinery to China, Huawei is supposedly developing its own EUV machines, but those will likely take a few more years to materialize.

NVIDIA "Blackwell" GeForce RTX to Feature Same 5nm-based TSMC 4N Foundry Node as GB100 AI GPU

Following Monday's blockbuster announcements of the "Blackwell" architecture and NVIDIA's B100, B200, and GB200 AI GPUs, all eyes are now on its client graphics derivatives, or the GeForce RTX GPUs that implement "Blackwell" as a graphics architecture. Leading the effort will be the new GB202 ASIC, a successor to the AD102 powering the current RTX 4090. This will be NVIDIA's biggest GPU with raster graphics and ray tracing capabilities. The GB202 is rumored to be followed by the GB203 in the premium segment, the GB205 a notch lower, and the GB206 further down the stack. Kopite7kimi, a reliable source with NVIDIA leaks, says that the GB202 silicon will be built on the same TSMC 4N foundry node as the GB100.

TSMC 4N is a derivative of the company's mainline N4P node; the "N" in 4N stands for NVIDIA. This is a nodelet that TSMC designed and optimized for NVIDIA SoCs. TSMC still considers 4N a derivative of its 5 nm EUV node. There is very little public information on the power and transistor-density improvements of TSMC 4N over TSMC N5. For reference, the N4P, which TSMC also regards as a 5 nm derivative, offers a 6% transistor-density improvement and a 22% power-efficiency improvement. In related news, Kopite7kimi says that with "Blackwell," NVIDIA is focusing on enlarging the L1 caches of the streaming multiprocessors (SMs), which suggests a design focus on increasing performance at the SM level.

SMIC Prepares for 3 nm Node Development, Requires Chinese Government Subsidies

SMIC, China's largest semiconductor manufacturer, is reportedly assembling a dedicated team to develop 3 nm semiconductor node technology, following reports of the company setting up 5 nm chip production for Huawei later this year. This move is part of SMIC's efforts to achieve independence from foreign companies and reduce its reliance on US technology. According to a report from Joongang, SMIC's initial goal is to commence operations of its 5 nm production line, which will mass-produce Huawei chipsets for various products, including AI silicon. However, SMIC is already looking beyond the 5 nm node. The company has assembled an internal research and development team to begin work on the next-generation 3 nm node.

The Chinese manufacturer is expected to accomplish this using existing DUV machinery, as ASML, the sole supplier of advanced EUV technology, is prohibited from providing equipment to Chinese companies due to US restrictions. It is reported that one of the biggest challenges facing SMIC is the potential for low yields and high production costs. The company is seeking substantial subsidies from the Chinese government to overcome these obstacles. Receiving government subsidies will be crucial for SMIC, especially considering that its 5 nm chips are expected to be up to 50 percent more expensive than TSMC's due to the use of older DUV equipment. The first 3 nm wafers from SMIC are not expected to roll out for several years, as the company will prioritize the commercialization of Huawei's 5 nm chips. This ambitious undertaking by SMIC represents a significant challenge for the company as it strives to reduce its dependence on foreign semiconductor technology and establish itself as an essential player in the global manufacturing industry.

Intel CEO Discloses TSMC Production Details: N3 for Arrow Lake & N3B for Lunar Lake

Intel CEO Pat Gelsinger engaged with press/media representatives following the conclusion of his IFS Direct Connect 2024 keynote speech—when asked about Team Blue's ongoing relationship with TSMC, he confirmed that their manufacturing agreement has advanced from "5 nm to 3 nm." According to a China Times news article: "Gelsinger also confirmed the expansion of orders to TSMC, confirming that TSMC will hold orders for Intel's Arrow and Lunar Lake CPU, GPU, and NPU chips this year, and will produce them using the N3B process, officially ushering in the Intel notebook platform that the outside world has been waiting for many years." Past leaks have indicated that Intel's Arrow Lake processor family will have CPU tiles based on their in-house 20A process, while TSMC takes care of the GPU tile aspect with their 3 nm N3 process node.

That generation is expected to launch later this year—the now "officially confirmed" upgrade to 3 nm should produce pleasing performance and efficiency improvements. The current crop of Core Ultra "Meteor Lake" mobile processors has struggled with the latter, especially when compared to rivals. Lunar Lake is marked down for a 2025 launch window, so some aspects of its internal workings remain a mystery—Gelsinger has confirmed that TSMC's N3B is in the picture, but no official source has disclosed their in-house manufacturing choice(s) for LNL chips. Wccftech believes that Lunar Lake will: "utilize the same P-Core (Lion Cove) and brand-new E-Core (Skymont) core architecture which are expected to be fabricated on the 20A node. But that might also be limited to the CPU tile. The GPU tile will be a significant upgrade over the Meteor Lake and Arrow Lake CPUs since Lunar Lake ditches Alchemist and goes for the next-gen graphics architecture codenamed "Battlemage" (AKA Xe2-LPG)." Late January whispers pointed to Intel and TSMC partnering up on a 2 nanometer process for the "Nova Lake" processor generation—perhaps a very distant prospect (2026).

AMD "Zen 5c" CCDs Made On More Advanced 3 nm Node Than "Zen 5"

AMD is reportedly building its upcoming "Zen 5" and "Zen 5c" CPU Core Dies (CCDs) on two different foundry nodes, according to a report by Chinese publication UDN. The "Zen 5" CCD powering the upcoming Ryzen "Granite Ridge" desktop processors, "Fire Range" mobile processors, and EPYC "Turin" server processors will reportedly be built on the 4 nm EUV foundry node, a slightly more advanced node than the current 5 nm EUV node the company builds "Zen 4" CCDs on. The "Zen 5c" CCD, the chiplet with purely "Zen 5c" cores in a high-density configuration, will on the other hand be built on an even more advanced 3 nm EUV foundry node, the report says. Both CCDs will enter mass production in Q2 2024, with product launches expected across the second half of the year.

The "Zen 5c" chiplet has a mammoth 32 cores spread across two CCXs of 16 cores each, with each CCX sharing a 32 MB L3 cache. It is to cram in these 32 cores, each with 1 MB of L2 cache, plus a total of 64 MB of L3 cache, that AMD could be turning to the 3 nm foundry node. Another reason could be voltages. If "Zen 4c" is anything to go by, the "Zen 5c" core is a highly compacted variant of "Zen 5" that operates in a lower voltage band than its larger sibling, without any change in IPC or instruction sets. The decision to go with 3 nm could be aimed at increasing clock speeds at those lower voltages, in a bid to improve performance generationally through clock speeds, besides IPC and core count. The EPYC processor with "Zen 5c" chiplets will feature no more than six such large CCDs, for a maximum core count of 192. The regular "Zen 5" CCD has just 8 cores in a single CCX, with 32 MB of L3 cache shared among the cores, and TSV provisions for 3D Vertical Cache to increase the L3 cache in special variants.
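The core and cache arithmetic in the report works out as follows; a sketch using only the configuration figures stated above:

```python
# Reported "Zen 5c" CCD layout: two 16-core CCXs, 32 MB L3 per CCX, 1 MB L2 per core.
cores_per_ccx = 16
ccx_per_ccd = 2
l3_per_ccx_mb = 32
l2_per_core_mb = 1

cores_per_ccd = cores_per_ccx * ccx_per_ccd       # 32 cores per CCD
l3_per_ccd_mb = l3_per_ccx_mb * ccx_per_ccd       # 64 MB L3 per CCD
l2_per_ccd_mb = cores_per_ccd * l2_per_core_mb    # 32 MB L2 per CCD

# Reported EPYC maximum: six such CCDs per package.
max_ccds_per_epyc = 6
max_epyc_cores = max_ccds_per_epyc * cores_per_ccd  # 192 cores
```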

Loongson 3A6000 CPU Reportedly Matches AMD Zen 4 and Intel Raptor Lake IPC

China's homegrown Loongson 3A6000 CPU shows promise but still needs to catch up to AMD and Intel's latest offerings in real-world performance. According to benchmarks by Chinese tech reviewer Geekerwan, the 3A6000 has instructions per clock (IPC) on par with AMD's Zen 4 architecture and Intel's Raptor Lake. Using the SPEC CPU 2017 processor benchmark, Geekerwan clocked all the CPUs at 2.5 GHz to compare raw benchmark results against Zen 4 and Raptor Lake (Raptor Cove) processors. As a result, the Loongson 3A6000 seemingly matches the latest designs by AMD and Intel in integer results, with integer IPC measured at 4.8, while Zen 4 and Raptor Cove score 5.0 and 4.9, respectively. The floating-point performance still lags considerably, though. This demonstrates that Loongson's CPU design can catch up to global leaders but still needs further development, especially in floating-point arithmetic.

However, the 3A6000 is held back by low clock speeds and limited core counts. With a maximum boost speed of just 2.5 GHz across four CPU cores, the 3A6000 cannot compete with flagship chips like AMD's 16-core Ryzen 9 7950X running at 5.7 GHz. While the 3A6000's IPC is impressive, its raw computing power is a fraction of that of leading x86 CPUs. Loongson must improve its manufacturing process technology to increase clock speeds, core counts, and cache sizes. The 3A6000's strengths highlight Loongson's ambitions: an in-house LoongArch ISA design fabricated on 12 nm achieves IPC competitive with state-of-the-art x86 chips built on the more advanced TSMC 5 nm and Intel 7 nodes. This shows the potential behind Loongson's engineering. Reports suggest that next-generation Loongson 3A7000 CPUs will use SMIC 7 nm, allowing higher clocks and more cores to better harness the architecture's potential. So, we expect the next generation to set a new bar for China's homegrown CPU performance.

SMIC Reportedly Ramping Up 5 Nanometer Production Line in Shanghai

Semiconductor Manufacturing International Corp (SMIC) is preparing new semiconductor production lines at its Shanghai facilities, according to a fresh Reuters report—China's largest contract chip maker is linked to next-generation Huawei SoC designs, possibly 5 nm-based Kirin models. SMIC's newest Shanghai wafer fabrication site was an expensive endeavor, involving an $8.8 billion investment, but its flagship lines face a very challenging scenario with the new phases of mass production. Huawei, a key customer, is expected to "upgrade" to a 5 nm process for new chip designs—its current flagship, the Kirin 9000S, is based on a SMIC 7 nm node. Reuters' industry sources believe that the foundry's current stable of "U.S. and Dutch-made equipment" will be deployed to "produce 5-nanometer chips."

Revised trade rulings have prevented ASML from shipping advanced DUV machinery to mainland China manufacturing sites—SMIC workers have reportedly already repurposed the existing inventory of lithography equipment for next-gen pursuits. Burn Lin (ex-TSMC), a renowned "chip guru," believes that it is possible to mass-produce 5 nm products on slightly antiquated gear (previously used for 7 nm), the main caveats being increased expense and low yields. According to a DigiTimes Asia report, mass production of a 5 nm SoC on SMIC's existing DUV lithography would require four-fold patterning in a best-case scenario.

NVIDIA GeForce RTX 4080 SUPER Starts Selling at $999

NVIDIA today launched the third and final high-end GPU in its GeForce RTX 40-series SUPER refresh. The new GeForce RTX 4080 SUPER is being launched at an attractive $999 price, compared to the $1,199 that the RTX 4080 originally launched at. Besides this sizeable 17% price cut, there's also more performance on offer, as the company chose to max out the 5 nm AD103 silicon that the card is based on. If you recall, the RTX 4080 has 76 out of 80 streaming multiprocessors of the AD103 enabled, and its memory runs at an odd 22.4 Gbps speed. The RTX 4080 SUPER gets all 80 SM, and a well-rounded 23 Gbps memory speed.

With 80 SM on tap, you get 10,240 CUDA cores, 320 Tensor cores, 80 RT cores, 320 TMUs, and 112 ROPs. The memory size is unchanged at 16 GB across the 256-bit memory interface of the AD103, as is the total graphics power (TGP), at 320 W. All cards will include an NVIDIA-designed adapter that converts three 8-pin PCIe power connectors into a 12VHPWR connector capable of delivering 450 W. The target audience for this card is the same as that of the RTX 4080—maxed-out 4K Ultra HD gaming with ray tracing. At $999, the RTX 4080 SUPER allows NVIDIA to better compete with the AMD Radeon RX 7900 XTX, which is sometimes spotted at prices as low as $900. Don't forget to catch our exhaustive review coverage from the links below!

NVIDIA GeForce RTX 4080 SUPER Founders Edition | ASUS ROG Strix RTX 4080 SUPER OC | ASUS TUF Gaming RTX 4080 SUPER OC | MSI RTX 4080 SUPER Expert | Gigabyte RTX 4080 SUPER Gaming OC | PNY RTX 4080 SUPER Verto | Galax RTX 4080 SUPER SG 1-click OC | Palit RTX 4080 SUPER GamingPro OC | Zotac RTX 4080 SUPER AMP Extreme AIRO
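The RTX 4080 SUPER's unit counts follow directly from its SM count. On "Ada," each SM carries 128 CUDA cores, 4 Tensor cores, 4 TMUs, and 1 RT core—these per-SM ratios are general architecture figures, not quoted in the article—so a quick sketch reproduces the specs above:

```python
# Fully enabled AD103: all 80 streaming multiprocessors active.
sm_count = 80

# Per-SM unit ratios for the "Ada" architecture (assumed, not from the article).
cuda_cores   = sm_count * 128   # 10,240 CUDA cores
tensor_cores = sm_count * 4     # 320 Tensor cores
tmus         = sm_count * 4     # 320 texture mapping units
rt_cores     = sm_count * 1     # 80 ray tracing cores
print(cuda_cores, tensor_cores, tmus, rt_cores)
```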

Canon Wants to Challenge ASML with a Cheaper 5 nm Nanoimprint Lithography Machine

Japanese tech giant Canon hopes to shake up the semiconductor manufacturing industry by shipping new low-cost nanoimprint lithography (NIL) machines as early as this year. The technology, which stamps chip designs onto silicon wafers rather than using more complex light-based etching like market leader ASML's systems, could allow Canon to undercut rivals and democratize leading-edge chip production. "We would like to start shipping this year or next year...while the market is hot. It is a very unique technology that will enable cutting-edge chips to be made simply and at a low cost," said Hiroaki Takeishi, head of Canon's industrial group overseeing nanoimprint lithography development. The nanoimprint machines target a semiconductor node of 5 nanometers, eventually aiming to reach 2 nm. Takeishi said the technology has largely resolved previous defect-rate issues, but success will depend on convincing customers that integration into existing fabrication plants is worthwhile.

There is skepticism about Canon's ability to significantly disrupt the market led by ASML's expensive but sophisticated extreme ultraviolet (EUV) lithography tools. However, if nanoimprint can increase yields to nearly 90% at lower costs, it could carve out a niche, especially with EUV supply struggling to meet surging demand. Canon's NIL machines are supposedly 40% the cost of ASML machinery, while operating with up to 90% lower power draw. Initially focusing on 3D NAND memory chips rather than complex processors, Canon must contend with export controls limiting sales to China. But with few options left, Takeishi said Canon will "pay careful attention" to sanctions risks. If successfully deployed commercially after 15+ years in development, Canon's nanoimprint technology could shift the competitive landscape by enabling new players to manufacture leading-edge semiconductors at dramatically lower costs. But it remains to be seen whether the new machines' defect rates, integration challenges, and geopolitical headwinds will allow Canon to significantly disrupt the chipmaking giants it aims to compete with.

Intel, Marvell, and Synopsys to Showcase Next-Gen Memory PHY IP Capable of 224 Gbps on 3nm-class FinFET Nodes

The sneak peeks from the upcoming IEEE Solid State Circuit Conference continue, as the agenda items unveil interesting tech that will either be unveiled or demonstrated there. Intel, Synopsys, and Marvell are leading providers of DRAM physical-layer interface (PHY) IP. Various processor, GPU, and SoC manufacturers license PHY and memory-controller IP from these companies to integrate with their designs. All three companies are ready with over 200 Gbps PHYs in the 2.69 to 3 picojoule-per-bit range. This energy cost is as important as the data rate on offer, as it determines the viability of the PHY for a specific application (for example, a smartphone SoC has to run its memory sub-system on a vastly more constrained energy budget than an HPC processor).

Intel is the first in the pack, showcasing a 224 Gbps sub-picojoule-per-bit PHY transmitter that supports PAM4 and PAM6 signaling and is designed for 3 nm-class FinFET foundry nodes. If you recall, Intel 3 will be the company's final FinFET node before it transitions to nanosheets with the Intel 20A node. At the physical layer, all digital memory signaling is analogue, and Intel's IP focuses on the DAC aspect of the PHY. Next up is a somewhat similar transceiver IP by Synopsys. This too claims 224 Gbps at 3 pJ/b, but at a 40 dB insertion loss, and is designed for 3 nm-class FinFET nodes such as the TSMC N3 family and Intel 3. Samsung's 3 nm EUV node, by contrast, uses incompatible GAAFET technology. Lastly, there's Marvell, with a 212 Gb/s DSP-based transceiver for optical direct-detect applications on 5 nm FinFET nodes, which is relevant for high-speed network switching fabrics.

NVIDIA GeForce RTX 4070 Ti SUPER Starts Selling

NVIDIA GeForce RTX 4070 Ti SUPER starts selling today, at a starting price of USD $800. This is the second in a three-part product-stack refresh under the SUPER brand extension. This card is designed for maxed-out AAA gaming at 1440p, 4K Ultra HD gaming at fairly high settings, 1440p high refresh-rate gaming, as well as gaming at certain ultra-wide resolutions such as 3440 x 1440. The new RTX 4070 Ti SUPER is carved out of the 5 nm AD103 silicon, a physically larger chip than the AD104, which the original RTX 4070 Ti had maxed out. The card enjoys not just a 10% increase in CUDA cores and a 20% increase in ROPs, but also a larger 16 GB memory size across a wider 256-bit memory interface, which amounts to a straight 33% increase in memory bandwidth. The GeForce RTX 4070 Ti SUPER is a partner-exclusive launch—there's no Founders Edition card from NVIDIA. Instead, the company ensured that nearly every board partner has cards to offer at the $800 MSRP, with premium overclocked cards priced in the $850-$900 range.
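The "straight 33% increase in memory bandwidth" comes purely from the wider bus; a sketch, assuming both cards run their GDDR6X at the same 21 Gbps (an assumption, as the article only quotes the bus widths):

```python
def peak_bandwidth_gbs(bus_width_bits: int, speed_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bits moved per second across the bus, divided by 8."""
    return bus_width_bits * speed_gbps / 8

rtx_4070_ti = peak_bandwidth_gbs(192, 21)        # 504 GB/s on the 192-bit AD104 config
rtx_4070_ti_super = peak_bandwidth_gbs(256, 21)  # 672 GB/s on the 256-bit AD103 config
increase = rtx_4070_ti_super / rtx_4070_ti - 1   # 256/192 - 1 = 1/3, i.e. the quoted 33%
```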

We have a plethora of RTX 4070 Ti SUPER reviews for you to devour!

ASUS ROG Strix RTX 4070 Ti SUPER OC | ASUS TUF Gaming RTX 4070 Ti SUPER | MSI RTX 4070 Ti SUPER Ventus 3X | Gigabyte RTX 4070 Ti SUPER Gaming OC | Palit RTX 4070 Ti SUPER JetStream OC | PNY RTX 4070 Ti SUPER Verto OC | Gainward RTX 4070 Ti SUPER Phoenix GS | Zotac RTX 4070 Ti SUPER Trinity | Galax RTX 4070 Ti SUPER EX Gamer White

NVIDIA GeForce RTX 4070 SUPER Goes on Sale, Starting at $599

NVIDIA GeForce RTX 4070 SUPER started selling today. The card is generally available, with the NVIDIA MSRP set at USD $599. The RTX 4070 SUPER is part of a three-product refresh of the GeForce RTX 40-series product stack that NVIDIA announced at its 2024 International CES event on January 8. It offers more performance for the price the RTX 4070 originally sold at; the RTX 4070 now gets a price cut to $549, with its real-world pricing expected to fall between $510 and $560. The RTX 4070 SUPER is based on the same 5 nm "AD104" silicon as the RTX 4070 and RTX 4070 Ti, but comes with a decent bump in shaders over the original RTX 4070.

The GeForce RTX 4070 SUPER is configured with 7,168 CUDA cores—a 21 percent increase over the RTX 4070. It also gets an extra 16 ROPs, maxing out the 80 ROPs present on the silicon. What's more, NVIDIA also unlocked the full 48 MB of on-die L2 cache for the RTX 4070 SUPER, the same as the RTX 4070 Ti; the original RTX 4070 only has 36 MB of this cache enabled. Save for 4 SMs worth 512 shaders, the RTX 4070 SUPER is almost an RTX 4070 Ti, but there's one last differentiator—power limits. The RTX 4070 SUPER is configured with a total graphics power (TGP) of 220 W, whereas the RTX 4070 Ti has it set at 285 W. Some factory-overclocked RTX 4070 SUPER cards raise this limit by around 20 W. NVIDIA has decided to phase out the RTX 4070 Ti from its product stack; it finds itself replaced with the GeForce RTX 4070 Ti SUPER, coming next week.

Our extensive Review coverage is as follows: NVIDIA GeForce RTX 4070 SUPER Founders Edition | ASUS TUF Gaming RTX 4070 SUPER OC | Palit RTX 4070 SUPER JetStream | GIGABYTE RTX 4070 SUPER AORUS Master | ZOTAC RTX 4070 SUPER Trinity Black | ASUS RTX 4070 SUPER DUAL | PNY RTX 4070 SUPER Verto | Gainward RTX 4070 SUPER Ghost

NVIDIA Announces the GeForce RTX 40 SUPER Series Graphics Cards

NVIDIA today gave its GeForce RTX 40-series "Ada" lineup a midlife refresh targeting the higher end of its product stack, with the new GeForce RTX 4070 SUPER, GeForce RTX 4070 Ti SUPER, and the GeForce RTX 4080 SUPER. The new RTX 4080 SUPER replaces the current RTX 4080, which will gradually be phased out of the market. The new RTX 4070 Ti SUPER does the same to the current RTX 4070 Ti. The RTX 4070 SUPER, however, will coexist with the current RTX 4070, albeit at a slight price premium. The RTX 4070 SUPER and RTX 4070 Ti SUPER are both being recommended by NVIDIA for maxed-out 1440p gaming with full ray tracing, while the RTX 4080 SUPER is for those who want to max out gameplay at 4K with full ray tracing. The RTX 4070 SUPER and RTX 4070 Ti SUPER should still very much be capable of 4K gaming at more than acceptable frame rates, especially given the latest DLSS 3 Frame Generation and its proliferation among new AAA titles.

NVIDIA is giving the three new graphics card SKUs a staggered launch spread across January 2024. The RTX 4070 SUPER should be available to purchase on January 17, at a starting price of $599, which was the original MSRP of the RTX 4070. After this launch, the RTX 4070 slides down to $549 while remaining in the product stack. Things get interesting higher up the stack. The RTX 4070 Ti SUPER, which goes on sale on January 24, is priced at $799, while the current RTX 4070 Ti is being retired from the product stack. The remaining RTX 4070 Ti cards should show up at slightly discounted prices.

Huawei Still Ships 5 nm TSMC Chips in its Laptops, Despite US Sanctions

According to the latest teardown from TechInsights, China's biggest technology maker, Huawei, has been shipping laptops with technology supposedly sanctioned by the United States. As the teardown shows, TechInsights discovered that Huawei's Kirin 9006C processor is manufactured on TSMC's 5 nm semiconductor technology. The United States originally imposed sanctions on Huawei back in 2020, when the government cut off Huawei's access to TSMC's advanced facilities and forbade Huawei's HiSilicon chip-design arm from using the latest nodes. Today's findings show signs of contradiction, as the Qingyun L540 notebook that launched in December 2023 employs a Kirin 9006C chipset manufactured on a TSMC 5 nm node.

TechInsights' findings indicate that Kirin 9006C assembly and packaging occurred around the third quarter of 2020, whereas the Huawei sanctions took effect in the second quarter of that year. Of course, the sanctions likely only prohibited new orders and didn't prevent Huawei from stockpiling millions of chips in its warehouses before they took effect. The Chinese giant probably placed orders beforehand and is only using the silicon now, with the Qingyun L540 laptop being one of the first Kirin 9006C appearances. Some online retailers also point out that the laptop complies with the latest security practices required for government use, which means it has been in the works since the chip's early design stages, well before 2020. We don't know the stockpile quantity, but SMIC's domestic efforts seem insufficient to supply the Chinese market alone. The news that Huawei is still using TSMC chips sent SMIC's shares into a 2% free fall on the Hong Kong stock exchange.

DEEPX's DX-M1 Chip Recognized at CES 2024 as Leading AI of Things Solution

DEEPX (CEO, Lokwon Kim), an original AI semiconductor technology company, is announcing that it has surpassed 40 customers for its flagship chip solution, DX-M1—the only AI accelerator on the market to combine low power consumption, high efficiency and performance, and cost-effectiveness. The groundbreaking solution has been deployed for a hands-on trial to this customer pool, which spans global companies and domestic Korean enterprises across various sectors.

DEEPX is currently running an Early Engagement Customer Program (EECP) to provide customers with early access to its small camera module, a one-chip solution featuring DX-V1; M.2 module featuring DX-M1; and DXNN, the company's developer environment. This allows customers to receive pre-production validation of DEEPX's hardware and software, integrate them into mass-produced products, and realize AI technology innovations with the brand's technical support.

AMD Close to Launching Radeon RX 7800M Series Based on "Navi 32"

AMD's small Radeon RX 7000M and RX 7000S lines of mobile GPUs based on the latest RDNA 3 graphics architecture include just five SKUs, spanning the "Navi 31" and "Navi 33" chips. The RX 7000M series only has the enthusiast-segment RX 7900M series based on "Navi 31," and the RX 7600M series based on "Navi 33," leaving a vast gap that the company plans to fill with RX 7800M series and RX 7700M series SKUs based on "Navi 32," referred to internally as "Cuarzo Verde." The GPU is meant to be hardwired onto the mainboards of gaming notebooks; however, AMD hands out reference-design MXM boards to OEMs. These were sniffed out in a public shipping manifest by harukaze5719 on Twitter.

The "Navi 32" package is roughly similar in size to the compacted "Navi 31" package powering the RX 7900M series. It has a physically smaller 5 nm GCD with 60 compute units compared to the 96 on the "Navi 31" GCD; and is surrounded by four 6 nm MCDs, which give it 64 MB of Infinity Cache and a 256-bit GDDR6 memory bus. With this, AMD has the option of carving out not just RX 7800M series and RX 7700M series SKUs, but also RX 7900S series SKUs for its segment aimed at gaming-grade ultraportables. We could see some product announcements to this effect in Q1 2024, alongside some new desktop SKUs.
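The MCD arithmetic behind those figures is straightforward; as a quick sanity check (assuming the standard RDNA 3 chiplet values of 16 MB of Infinity Cache and a 64-bit GDDR6 controller per MCD):

```python
# Sanity check of the "Navi 32" memory subsystem quoted above.
# Each RDNA 3 MCD carries 16 MB of Infinity Cache and a 64-bit
# GDDR6 memory controller.
mcds = 4
infinity_cache_mb = mcds * 16   # 4 x 16 MB = 64 MB
bus_width_bits = mcds * 64      # 4 x 64-bit = 256-bit

print(infinity_cache_mb, bus_width_bits)  # 64 256
```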

ASUS Intros China-exclusive Radeon RX 7900 GRE TUF White

ASUS over the weekend introduced the China-exclusive Radeon RX 7900 GRE TUF Gaming White graphics card. The card shares a lot in common with the other RX 7900 series TUF Gaming custom-design cards, but swaps out the gunmetal-gray cooler shroud and backplate combo for one that's matte white. The whitewash even extends to the impellers of the three Axial Tech fans. The PCB remains black, but due to the 3-D design of the shroud and backplate, is largely concealed. The card draws power from a pair of 8-pin PCIe power connectors, and uses a 14-phase VRM to condition it for the "Navi 31 XL" ASIC.

The Radeon RX 7900 GRE (Golden Rabbit Edition) is based on "Navi 31 XL," a unique package that combines the 5 nm graphics compute die (GCD) of "Navi 31" with the 4-MCD (memory cache die) setup of the smaller "Navi 32" package. AMD designed this primarily to drive the mobile RX 7900 series SKUs, but it found its way to the desktop platform to fill the gap between the RX 7800 XT and RX 7900 XT, and possibly undercut the GeForce RTX 4070 Ti. It is configured with 80 out of the 96 available compute units on the GCD, giving it 5,120 stream processors, 160 AI accelerators, and 80 Ray accelerators. The Infinity Cache size is reduced to 64 MB, since there are only 4 MCDs driving its 256-bit memory bus, which handles 16 GB of 18 Gbps GDDR6 memory (576 GB/s bandwidth).
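The quoted shader counts and bandwidth follow directly from the configuration; here is a quick arithmetic check (the per-CU ratios of 64 stream processors, 2 AI accelerators, and 1 Ray accelerator are standard RDNA 3 values):

```python
# Sanity check of the RX 7900 GRE figures quoted above.
compute_units = 80
stream_processors = compute_units * 64   # 5,120
ai_accelerators = compute_units * 2      # 160
ray_accelerators = compute_units * 1     # 80

# 256-bit bus at 18 Gbps per pin: (256 / 8) bytes x 18 GT/s
bandwidth_gb_s = (256 // 8) * 18         # 576 GB/s

print(stream_processors, ai_accelerators, ray_accelerators, bandwidth_gb_s)
```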

Two New Marvell OCTEON 10 Processors Bring Server-Class Performance to Networking Devices

Marvell Technology, a leader in data infrastructure semiconductor solutions, is enabling networking equipment and firewall manufacturers to achieve breakthrough levels of performance and efficiency with two new OCTEON 10 data processing units (DPUs), the OCTEON 10 CN102 and OCTEON 10 CN103. The 5 nm OCTEON CN102 and CN103, broadly available to OEMs for product design and pilot production, are optimized for data and control plane applications in routers, firewalls, 5G small cells, SD-WAN appliances, and control plane applications in top-of-rack switches and line card controllers. Several of the world's largest networking equipment manufacturers have already incorporated the OCTEON 10 CN102 into a number of product designs.

Containing up to eight Arm Neoverse N2 cores, OCTEON 10 CN102 and CN103 deliver 3x the performance of Marvell's current DPU solutions for devices while reducing power consumption by 50% to 25 W. Achieving SPEC CPU (2017) integer rate (SPECint) scores of 36.5, OCTEON 10 CN102 and CN103 are able to deliver nearly 1.5 SPECint points per Watt. The chips can serve as an offload DPU for host processors or as the primary processor in devices; the advanced performance per watt also enables OEMs to design fanless systems to simplify designs and further reduce cost, maintenance, and power consumption.
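The performance-per-watt claim can be reproduced from the two numbers Marvell quotes:

```python
# Reproducing Marvell's efficiency claim from the figures above.
specint_score = 36.5   # SPEC CPU 2017 integer rate score
power_watts = 25       # power after the quoted 50% reduction

points_per_watt = specint_score / power_watts
print(round(points_per_watt, 2))  # 1.46, i.e. "nearly 1.5 SPECint points per Watt"
```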

Top 10 Foundries Experience 7.9% QoQ Growth in 3Q23, with a Continued Upward Trend Predicted for Q4

TrendForce's research indicates a dynamic third quarter for the global foundry industry, marked by an uptick in urgent orders for smartphone and notebook components. This surge was fueled by healthy inventory levels and the release of new iPhone and Android devices in 2H23. Despite persisting inflation risks and market uncertainties, these orders were predominantly executed as rush orders. Additionally, TSMC and Samsung's high-cost 3 nm manufacturing process had a positive impact on revenues, driving the 3Q23 value of the top ten global foundries to approximately US$28.29 billion—a 7.9% QoQ increase.

Looking ahead to 4Q23, the anticipation of year-end festive demand is expected to sustain the inflow of urgent orders for smartphones and laptops, particularly for smartphone components. Although the end-user market is yet to fully recover, pre-sales season stockpiling for Chinese Android smartphones appears to be slightly better than expected, with demand for mid-to-low range 5G and 4G phone APs and continued interest in new iPhone models. This scenario suggests a continued upward trend for the top ten global foundries in Q4, potentially exceeding the growth rate seen in Q3.

Microsoft Introduces 128-Core Arm CPU for Cloud and Custom AI Accelerator

During its Ignite conference, Microsoft introduced a duo of custom-designed silicon made to accelerate AI and excel in cloud workloads. First of the two is Microsoft's Azure Cobalt 100 CPU, a 128-core design that features the 64-bit Armv9 instruction set, implemented in a cloud-native design that is set to become a part of Microsoft's offerings. While there aren't many details regarding the configuration, the company claims a performance target of up to 40% over the current generation of Arm servers running on the Azure cloud. The SoC uses Arm's Neoverse CSS platform customized for Microsoft, presumably with Arm Neoverse N2 cores.

The next and hottest topic in the server space is AI acceleration, which is needed for running today's large language models. Microsoft hosts OpenAI's ChatGPT, Microsoft's Copilot, and many other AI services. To help make them run as fast as possible, Microsoft's project Athena now carries the name Maia 100 AI accelerator, which is manufactured on TSMC's 5 nm process. It features 105 billion transistors and supports various MX data formats, even those smaller than 8-bit, for maximum performance. The chip is currently being tested on GPT 3.5 Turbo, and we have yet to see performance figures and comparisons with competing hardware from NVIDIA, like the H100/H200, and from AMD, with the MI300X. The Maia 100 has an aggregate bandwidth of 4.8 Terabits per accelerator, and uses a custom Ethernet-based networking protocol for scaling. These chips are expected to appear in Microsoft data centers early next year, and we hope to get some performance numbers soon.

AMD Announces Ryzen Embedded 7000 Series Processors Powered by Zen 4

AMD today announced at Smart Production Solutions 2023 the AMD Ryzen Embedded 7000 Series processor family, optimized for the high-performance requirements of industrial markets. By combining "Zen 4" architecture and integrated Radeon graphics, Ryzen Embedded 7000 Series processors deliver performance and functionality not previously offered for the embedded market. With its expanded features and integration, Ryzen Embedded 7000 Series processors are ideal for a wide range of embedded applications, including industrial automation, machine vision, robotics and edge servers.

The Ryzen Embedded 7000 Series processor is the first embedded processor to use next-generation 5 nm technology with a 7-year manufacturing availability commitment. The new embedded processor integrates AMD Radeon RDNA 2 graphics, eliminating the need for a discrete GPU in industrial applications. And because embedded applications require additional operating system software options, Ryzen Embedded 7000 Series processors include support for both Windows Server and Linux Ubuntu, on top of Windows 10 and Windows 11. Ryzen Embedded 7000 Series processors also include up to 12 high-performance "Zen 4" CPU cores, which, combined with the integrated features and wide operating system choices, offer unparalleled ease of integration for system designers.

AMD Mobile Processor Lineup in 2025 Sees "Fire Range," "Strix Halo," and Significant AI Performance Increases

With Windows 11 23H2 setting the stage for increased prevalence of AI in client PC use cases, the new hardware battleground between AMD and its rivals Intel, Apple, and Qualcomm, will be in equipping their mobile processors with sufficient AI acceleration performance. AMD already introduced accelerated AI with the current "Phoenix" processor that debuts Ryzen AI, and its Xilinx XDNA hardware backend that provides a performance of up to 16 TOPS. This will see a 2-3 fold increase with the company's 2024-25 mobile processor lineup, according to a roadmap leak by "Moore's Law is Dead."

At the very top of the pile, in a product segment called "ultimate compute," which consists of large gaming notebooks, mobile workstations, and desktop replacements, the company's current Ryzen 7045 "Dragon Range" processor will continue throughout 2024. Essentially a non-socketed version of the desktop "Raphael" MCM, "Dragon Range" features up to two 5 nm "Zen 4" CCDs for up to 16 cores, and a 6 nm cIOD. This processor lacks any form of AI acceleration. In 2025, it will be succeeded by "Fire Range," a similar non-socketed, mobile-friendly MCM derived from "Granite Ridge," with up to two 4 nm "Zen 5" CCDs for up to 16 cores, and the 6 nm cIOD. What's interesting to note here is that the quasi-roadmap makes no mention of AI acceleration for "Fire Range," which suggests "Granite Ridge" could also miss out on on-die Ryzen AI acceleration. Modern discrete GPUs from both NVIDIA and AMD feature AI accelerators, which was likely AMD's rationale for excluding an XDNA-based Ryzen AI accelerator from "Fire Range" and "Granite Ridge."