News Posts matching #AD102


ZOTAC's Gigantic GeForce RTX 4090 D PGF OC Edition Card Gets Reviewed

ZOTAC debuted a massive flagship GeForce RTX 4090 24 GB custom design graphics card last summer—the Prime Gamer Force (PGF) OC edition model was released as a China exclusive product. ZOTAC's PGF shroud design remains the largest on the market—381 mm (L) x 154 mm (W) x 74 mm (D)—even with downgraded silicon beneath the surface. NVIDIA's China-specific GeForce RTX 4090D GPU was introduced last December, as a sanction conformant substitute for the full-fat version—naturally, ZOTAC has prepared a revised PGF model. This week, Expreview has published an in-depth review of the GeForce RTX 4090 D PGF OC edition graphics card. They found that ZOTAC's cooling system—three 11 cm fans and a vapor chamber—offered: "high-frequency stability...comparable to that of water-cooled (solutions)."

The Chinese publication reviewed the GALAX RTX 4090 D Metal Master model in January—at the time, software restrictions prevented the implementation of significant overclocks. It was theorized that future updates or community workarounds could bypass limitations, but the latest review—of ZOTAC's "super luxurious" PGF edition—indicates that this GeForce RTX 4090 D GPU's OC potential is still constricted. VideoCardz has pulled out essential details from the Expreview article: "(The PGF) has high maximum TGP (530 W) and a powerful 28-phase power PCB design. Despite the technological headroom, the card struggles to offer much of the overclocking potential. The team from Expreview only managed to squeeze 3.7% more performance from this card. That's despite 24.7% more power theoretically available." An underwhelming overclocking aspect is counterbalanced by the premium-tier card's impressive performance stability—the review also praised ZOTAC's quiet cooling solution and usage of high-end "heat dissipation materials."

Honkai: Star Rail-themed MSI GeForce RTX 4090 SUPRIM X Special Edition Gets Leaked

An MSI "Ruan Mei" special edition GeForce RTX 4090 SUPRIM X model has been teased by hongxing2020 on social media, although it is not clear whether this is an intentional leak, given the formal office setting photographed in the background. Seemingly finalized retail packaging, heavily updated shroud and backplate designs, and bundled poster and mousepad merchandise are emblazoned with a 5-star rated Honkai: Star Rail character. As noted by VideoCardz, the web link displayed on MSI's accompanying poster indicates that the green and gold special edition SUPRIM X model is destined for the Chinese PC hardware market, where miHoYo's Honkai games franchise is hugely popular.

The MSI GeForce RTX 4090 SUPRIM X "Ruan Mei" special edition's packaging does not indicate whether the onboard GPU is the US sanction-compliant NVIDIA GeForce RTX 4090 D variant or the original, uncompromised AD102-300-A1. It is probably safe to assume that a model launching in the first quarter of 2024 will sport slightly downgraded internals, since MSI has already prepared its standard silver range-topping GeForce RTX 4090 D SUPRIM X model for the region. A small batch of Ruan Mei limited editions could arrive at a later date, once the standard card has cleared the way.

GALAX GeForce RTX 4090D Tested: ~5% Slower Than Standard RTX 4090

The first review of a China-exclusive "RTX 4090D" GPU model hit the internet last week. Expreview recently received a sample of the GALAX RTX 4090 D Metal Master, and its testing team set out to find whether the cut-down version of NVIDIA's flagship gaming GPU was truly compromised in terms of performance. Effective October 2023, the US Department of Commerce placed export restrictions on Team Green, blocking trade of units based on the "Ada Lovelace" AD102-300 GPU in China. In turn, a variant, the AD102-250-A1, was prepared in order to conform to the new policies.

NVIDIA's China-specific GeForce RTX 4090D launched officially right at the end of 2023. Board partner GALAX seems to be leading the pack, with customized versions being sent out for evaluation. The GeForce RTX 4090D GPU arrives with a lesser configuration of 14,592 CUDA, 456 Tensor, and 114 RT cores, but the first review indicates that it trails its uncompromised sibling by only around 5 to 6% across sixteen games. It lags further behind in Stable Diffusion benchmarks: an AI workload at 512x512 resolution shows a 10% difference, although the gap narrows at 768x768 and 1024x1024.

NVIDIA's China-only GeForce RTX 4090D Launched with Fewer Shaders than Regular RTX 4090

NVIDIA today formally launched the China-specific GeForce RTX 4090D graphics card for gaming and creator applications, after the United States Department of Commerce banned it this October from exporting the RTX 4090 (among other AI GPUs) to China. The RTX 4090D comes with lower AI inference performance than the RTX 4090 to comply with the US export limits, and measures have been put in place to prevent end-users from modifying it into a regular RTX 4090. Besides firmware and driver-level performance limiters, the card gets a completely different ASIC code, a different device ID (which prevents BIOS transplants from the RTX 4090), and a different core-configuration of the 5 nm "AD102" silicon itself.

The "AD102" silicon physically has 72 TPCs (144 SM), from which NVIDIA carved out the original RTX 4090 by enabling 64 TPCs (128 SM). The new RTX 4090D only gets 57 TPCs (114 SM), which reduces the counts of the CUDA cores, Tensor cores, and RT cores. While the original RTX 4090 has 16,384 CUDA cores, 512 Tensor cores, and 128 RT cores; the new RTX 4090D is configured with 14,592 CUDA cores, 456 Tensor cores, and 114 RT cores. The GPU clocks are the same, both boost up to 2.52 GHz, although the power limits are reduced, with the TGP lowered by 25 W. The memory sub-system appears untouched. We are also hearing that overclocking of the RTX 4090D will be limited, with lower slider limits—all to prevent end-users from regaining AI inference performance levels comparable to an RTX 4090. NVIDIA is pricing the RTX 4090D at a baseline price of RMB ¥12,999 ($1,840), which was the launch price of the original RTX 4090 in China, before it was scalped out of existence there.
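The core counts quoted above follow directly from the number of enabled TPCs. Here is a minimal Python sketch of that arithmetic, assuming the per-SM ratios implied by the RTX 4090 figures in this article (2 SM per TPC, and 128 CUDA cores, 4 Tensor cores, and 1 RT core per SM). The function name is purely illustrative.

```python
# Per-SM ratios inferred from the article's RTX 4090 figures
# (128 SM -> 16,384 CUDA / 512 Tensor / 128 RT); treat them as assumptions.
SM_PER_TPC = 2
CUDA_PER_SM = 128
TENSOR_PER_SM = 4
RT_PER_SM = 1


def ada_unit_counts(tpcs: int) -> dict:
    """Return SM, CUDA, Tensor, and RT core counts for a number of enabled TPCs."""
    sms = tpcs * SM_PER_TPC
    return {
        "sm": sms,
        "cuda": sms * CUDA_PER_SM,
        "tensor": sms * TENSOR_PER_SM,
        "rt": sms * RT_PER_SM,
    }


for name, tpcs in [("full AD102", 72), ("RTX 4090", 64), ("RTX 4090D", 57)]:
    print(name, ada_unit_counts(tpcs))
```

Feeding in 57 TPCs reproduces the RTX 4090D configuration listed above, and 64 TPCs reproduces the original RTX 4090.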

No Overclocking and Lower TGP for NVIDIA GeForce RTX 4090 D Edition for China

NVIDIA is preparing to launch the GeForce RTX 4090 D, or "Dragon" edition, designed explicitly for China. Circumventing the US export rules of GPUs that could potentially be used for AI acceleration, the GeForce RTX 4090 D is reportedly cutting back on overclocking as a feature. According to BenchLife, the AD102-250 GPU used in the RTX 4090 D will be a stranger to overclocking, as the card will not support it, possibly being disabled by firmware and/or physically in the die. The information from @Zed__Wang suggests that the Dragon version will be running at 2280 MHz base frequency, higher than the 2235 MHz of AD102-300 found in the regular RTX 4090, and 2520 MHz boost, matching the regular version.

Interestingly, the RTX 4090 D for China will also feature a slightly lower Total Graphics Power (TGP) of 425 Watts, down from the 450 Watts of the regular model. With memory configuration appearing to be the same, this new China-specific model will most likely perform within a few percent of the original design. Higher base frequency probably indicates a lack of a few CUDA cores to comply with the US export regulation policy and serve the Chinese GPU market. The NVIDIA GeForce RTX 4090 D is scheduled for rollout in January 2024 in China, which is just a few weeks away.

Special Chinese Factories are Dismantling NVIDIA GeForce RTX 4090 Graphics Cards and Turning Them into AI-Friendly GPU Shape

The recent U.S. government restrictions on AI hardware exports to China have significantly impacted several key semiconductor players, including NVIDIA, AMD, and Intel, restricting them from selling high-performance AI chips to China. This ban has notably affected NVIDIA's GeForce RTX 4090 gaming GPUs, pushing them out of mainland China due to their high computational capabilities. In anticipation of these restrictions, NVIDIA reportedly moved a substantial inventory of its AD102 GPUs and GeForce RTX 4090 graphics cards to China, which we reported earlier. This could have contributed to the global RTX 4090 shortage, driving the prices of these cards up to 2000 USD. In an interesting turn of events, insiders from the Chinese Baidu forums have disclosed that specialized factories across China are repurposing these GPUs, which arrived before the ban, into AI solutions.

This transformation involves disassembling the gaming GPUs, removing the cooling systems and extracting the AD102 GPU and GDDR6X memory from the main PCBs. These components are then re-soldered onto a domestically manufactured "reference" PCB, better suited for AI applications, and equipped with dual-slot blower-style coolers designed for server environments. The third-party coolers that these GPUs come with are 3-4 slots in size, whereas the blower-style cooler is only two slots wide, and many of them can be placed in parallel in an AI server. After rigorous testing, these reconfigured RTX 4090 AI solutions are supplied to Chinese companies running AI workloads. This adaptation process has resulted in an influx of RTX 4090 coolers and bare PCBs into the Chinese reseller market at markedly low prices, given that the primary GPU and memory components have been removed.
Below, you can see the dismantling of AIB GPUs before getting turned into blower-style AI server-friendly graphics cards.

NVIDIA is Rushing GeForce RTX 4090 Orders to China Before Export Restrictions

NVIDIA is reportedly rushing shipments of GeForce RTX 4090 GPUs to China in anticipation of expected export restrictions. We have already reported that NVIDIA might be canceling 5 billion US Dollars worth of orders. The US government will require an export license for shipping RTX 4090s to China, effectively restricting sales to the country. NVIDIA's add-in-board (AIB) partners are reportedly working at full capacity to produce as many RTX 4090 products for the Chinese market as possible before the potential restriction on November 17. While it remains unclear whether the export restrictions will ultimately be implemented, the anticipation of such measures has prompted NVIDIA and its partners to accelerate their production.

The information comes from a Tweet by Zed Wang, a well-known hardware leaker with historically accurate insights into NVIDIA's operations, who claims that "NVIDIA has been shipping tons of AD102 for AICs this week to manufacture as much RTX 4090 as possible before the original restriction date of RTX 4090 in China. It is still unclear whether the restriction will become true or not. But all AICs are at their full power in producing RTX 4090, regardless of that."

NVIDIA GeForce RTX 4080 SUPER to Feature 20GB Memory, Based on AD102

NVIDIA's upcoming mid-life refresh for its GeForce RTX 40-series "Ada" product stack sees the introduction of three new SKUs, led by the GeForce RTX 4080 SUPER, as was reported last week. In the older report, we speculated how NVIDIA could go about creating the RTX 4080 SUPER. BenchLife reports that the RTX 4080 SUPER will be given 20 GB as its standard memory size, and will be based on the larger "AD102" silicon. The SKU will utilize a 320-bit wide memory interface carved out of the 384-bit available to the silicon. The "AD102" has 144 streaming multiprocessors (SM) on die, from which the flagship RTX 4090 is configured with 128, and so NVIDIA could pick an SM count that's lower than that of the RTX 4090, while being higher than the 76 of the current RTX 4080.

NVIDIA Readies GeForce RTX 4070 SUPER, RTX 4070 Ti SUPER, and RTX 4080 SUPER

NVIDIA is rumored to be working on a refresh of the higher end of its GeForce RTX 40-series "Ada" lineup, according to hongxing2020, a reliable source of NVIDIA leaks. The company could be bringing back the SUPER brand extension that it introduced with the RTX 20-series. As many as three SKUs are on the radar: the GeForce RTX 4080 SUPER, GeForce RTX 4070 Ti SUPER, and GeForce RTX 4070 SUPER.

There is no word on when the company plans to release these, or what their specifications are, but we can certainly speculate. The current RTX 4080, while based on the AD103 silicon, doesn't max it out: it uses 76 of the 80 SM (streaming multiprocessors) available on the silicon. We doubt that those extra 4 SM could drive up enough performance to justify a whole new SKU, especially given that the 256-bit memory bus of the AD103 is already maxed out. We predict that the RTX 4080 SUPER could be based on the larger AD102 silicon, which physically has 144 SM, of which the current RTX 4090 uses 128. NVIDIA has the opportunity to pick an SM count such as, say, 96. AD102 also has a wider 384-bit memory bus, giving NVIDIA the option of either giving the RTX 4080 SUPER the same 24 GB memory configuration as the RTX 4090, or 20 GB across a 320-bit memory bus.
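The link between bus width and memory size in the speculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions that are not spelled out in the article: each GDDR6X chip sits on its own 32-bit channel and holds 2 GB, which is consistent with the RTX 4090's 384-bit, 24 GB layout.

```python
# Assumptions (not stated in the article): each GDDR6X chip uses a 32-bit
# interface and holds 2 GB, which matches the RTX 4090's 384-bit / 24 GB layout.
BITS_PER_CHIP = 32
GB_PER_CHIP = 2


def memory_size_gb(bus_width_bits: int) -> int:
    """Memory capacity implied by a bus width, with one chip per 32-bit channel."""
    if bus_width_bits % BITS_PER_CHIP != 0:
        raise ValueError("bus width must be a multiple of 32 bits")
    chips = bus_width_bits // BITS_PER_CHIP
    return chips * GB_PER_CHIP


for bus in (384, 320, 256):
    print(f"{bus}-bit -> {memory_size_gb(bus)} GB")
```

Under these assumptions, disabling two of the twelve 32-bit channels on AD102 is what turns a 24 GB card into a 20 GB one.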

More Pictures of NVIDIA's Cinder Block-sized RTX 4090 Ti Cooler Surface

Back in January, we got our first look at the cinder block-like 4-slot cooling solution of NVIDIA's upcoming flagship graphics card (called either the RTX 4090 Ti or the TITAN (Ada)). "ExperteVallah" on Twitter has scored additional pictures of the cooler. Its design sees the heat dissipation surface extended across the entire thickness of the cooler, and ventilated along its entire length.

The card's PCB isn't conventional. Rather than standing perpendicular to the plane of the motherboard like on any other add-in card, it lies along the plane of the motherboard, with additional breakaway daughter cards interfacing with the sole 12VHPWR power connector and the PCIe slot. This slender, ruler-shaped PCB spans the entire length of the card, without coming in the way of its heat dissipation surfaces. The length is used for the large AD102 ASIC that's probably maxed out (with all its 144 SM enabled), twelve GDDR6X memory chips (possibly faster 23 Gbps parts), and a mammoth VRM that nearly maxes out the 600 W continuous power delivery design limit of the 12VHPWR.

TechPowerUp GPU-Z 2.54.0 Released

TechPowerUp today released the latest version of TechPowerUp GPU-Z, the popular graphics sub-system information, monitoring, data-logging, and diagnostic tool for gamers, PC enthusiasts, overclockers, and engineers. The latest version 2.54.0 adds support for new graphics cards, and has several improvements that we're sure you'll find useful. Among the new NVIDIA GPUs supported are the GeForce RTX 4060, RTX 4060 Ti, RTX 3060 (based on GA104-B), RTX 3050 Laptop GPU 4 GB, RTX 3050 Mobile 6 GB (based on GA107-B), 40-2Q, L4, RTX A500 Mobile, RTX 2000 Ada Mobile, RTX 4000 SFF Ada, RTX 5000 Ada Mobile. The new AMD GPUs supported include Radeon RX 7600, Pro W7800, W7900, E8860, Ryzen Phoenix Radeon 7x0M, and Ryzen Z1 Extreme. The new Intel GPUs supported include Arc Pro A60, A60M, Flex 140, Iris Xe Max 100, additional Raptor Lake iGPU variants. Vendor support is added for Sparkle (Intel Arc board partner).

With this release, we've added the ability to monitor and log the real-time video memory read/write bandwidth usage for Intel Arc GPUs. Power monitoring for Intel Arc GPUs, which was broken after a recent driver update, is fixed now. We've also improved the video codec capability detection in the DXVA section of the Advanced tab, for all GPUs. The XML Dump output file now includes GPU transistor-count and release date. The Fake GPU detection has been improved. Die-size values for NVIDIA GeForce Ada GPUs have been fixed, as has the transistor-count of RTX 4070 Mobile (based on AD106). Grab GPU-Z from the link below.

DOWNLOAD: TechPowerUp GPU-Z 2.54.0
The change-log follows.

NVIDIA RTX 5000 Ada Generation Workstation GPU Mentioned in Official Driver Documents

NVIDIA's rumored RTX 5000 Ada Generation GPU has been outed once again, according to VideoCardz - the cited source being a keen-eyed member posting information dumps on a laptop discussion forum. Team Green has released new driver documentation that makes mention of hardware ID "26B2" under an entry for a now supported device: "NVIDIA RTX 5000 Ada Generation." Forum admin StefanG3D posted the small discovery on their favored forum in the small hours of Sunday morning (April 23).

As reported last month, the NVIDIA RTX 5000 Ada is destined to sit between existing sibling workstation GPUs - the AD102-based RTX 6000 and AD104-based RTX 4000 SFF. Hardware tipster kopite7kimi has learned enough to theorize that the NVIDIA RTX 5000 Ada Generation workstation graphics card will feature 15,360 CUDA cores and 32 GB of GDDR6 memory. The AD102 GPU is expected to sit at the heart of this unannounced card.

NVIDIA Enables More Encoding Streams on GeForce Consumer GPUs

NVIDIA has quietly removed some video encoding limitations on its consumer GeForce graphics processing units (GPUs), allowing encoding of up to five simultaneous streams. Previously, NVIDIA's consumer GeForce GPUs were limited to three simultaneous NVENC encodes. The same limitation did not apply to professional GPUs.

According to NVIDIA's own Video Encode and Decode GPU Support Matrix document, the number of concurrent NVENC encodes on consumer GPUs has been increased from three to five. This includes certain GeForce GPUs based on the Maxwell 2nd Gen, Pascal, Turing, Ampere, and Ada Lovelace GPU architectures. While the number of concurrent NVDEC decodes was never limited, there is a limit on how many streams a given GPU can encode, depending on the resolution of the stream and the codec.

NVIDIA Preparing RTX 5000 Ada Generation Workstation GPU

In addition to the RTX 4000 SFF Ada Generation workstation GPU launched at the GTC 2023, NVIDIA is apparently also working on the NVIDIA RTX 5000 Ada Generation, which should fit between the previously available AD102-based RTX 6000 Ada Generation workstation graphics card and the new AD104-based RTX 4000 SFF Ada Generation.

According to a fresh report coming from kopite7kimi, the NVIDIA RTX 5000 Ada Generation workstation GPU packs 15,360 CUDA cores and 32 GB of GDDR6 memory. If these specifications are spot on, the RTX 5000 Ada Generation GPU should also be based on the AD102 GPU, with a memory interface cut down to 256-bit to match the 32 GB of GDDR6 memory. NVIDIA also has enough room to fill the rest of the lineup, but judging from this information, there will be a pretty big gap between the RTX 6000 and RTX 5000 Ada Generation workstation GPUs.
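The reasoning that 15,360 CUDA cores points to AD102 can be checked with a short Python sketch. It assumes 128 CUDA cores per SM and uses the physical SM counts quoted elsewhere on this page for AD104 (60), AD103 (80), and AD102 (144). The helper name is hypothetical.

```python
# SM counts per die as quoted elsewhere on this page; 128 CUDA cores per SM
# is inferred from those same figures (e.g. 80 SM -> 10,240 cores on AD103).
CUDA_PER_SM = 128
DIE_SM_COUNTS = {"AD104": 60, "AD103": 80, "AD102": 144}


def smallest_die_for(cuda_cores: int) -> str:
    """Return the smallest listed Ada die with enough SMs for the CUDA core count."""
    sms_needed = -(-cuda_cores // CUDA_PER_SM)  # ceiling division
    for die, sms in sorted(DIE_SM_COUNTS.items(), key=lambda kv: kv[1]):
        if sms >= sms_needed:
            return die
    raise ValueError("no listed die has enough SMs")


print(smallest_die_for(15_360))  # 120 SM needed, more than AD103's 80
```

Since 15,360 cores work out to 120 SM, neither AD104 nor AD103 can supply them, which leaves AD102 as the only candidate among the three dies.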

NVIDIA RTX 4090 Ti / RTX TITAN (Ada) Pictured, Behold the 4-slot Cinder Block

Here's the very first picture of an alleged upcoming NVIDIA flagship/halo product to be positioned above the GeForce RTX 4090. There are two distinct brand names being rumored for this product—the GeForce RTX 4090 Ti, and the NVIDIA RTX TITAN (Ada). The RTX 4090 only uses 128 out of 144 (88 percent) of the streaming multiprocessors (SM) on the 4 nm "AD102" silicon, leaving NVIDIA with plenty of room to design a halo product that maxes it out. Besides maxing out the silicon, NVIDIA has the opportunity to increase the typical graphics power closer to the 600 W continuous power-delivery limit of the 16-pin ATX 12VHPWR connector; and use faster 24 Gbps-rated GDDR6X memory chips (the RTX 4090 uses 21 Gbps memory).

The card is 4 slots thick, with the rear I/O bracket covering all 4 slots. The card's display outputs are arranged along the thickness of the card, rather than along the base. The cooler is a monstrous scale-up of the Dual-Axial Flow Through cooler of the RTX 4090 Founders Edition. The card is designed such that the PCB doesn't come up perpendicular to the plane of the motherboard like any other add-on card, but rather, the PCB is parallel to the plane of the motherboard. The PCB is arranged along the thickness of the card. This has probably been done to maximize the spatial volume occupied by the cooling solution, and probably even make room for a third fan. We also predict that the PCB is split in such a way that a smaller PCB has the display I/O, and yet another PCB handles the PCI-Express slot interface. Suffice it to say, the RTX 4090 Ti / RTX TITAN will be an engineering masterpiece by NVIDIA.

NVIDIA Could Give TITAN RTX Another Swing as Maxed-Out AD102 in an Unabashed 4-slot Monstrosity

A report by Moore's Law is Dead claims that NVIDIA is preparing to launch a new TITAN RTX halo product, based on a maxed-out 4 nm "AD102" silicon. Where does this put the RTX 4090 Ti? Somewhere in between the RTX 4090 and the TITAN RTX Ada, as NVIDIA gave itself plenty of segmentation headroom with the AD102 silicon, by using just 128 out of 144 SM physically present on the silicon, besides the same 21 Gbps GDDR6X memory as the previous generation. NVIDIA's options with the new TITAN RTX include enabling all 144 SM (18,432 CUDA cores), using faster 24 Gbps memory that gives the silicon 1,152 GB/s of memory bandwidth, and setting a stock power-limit closer to the 600 W design limit of the 12VHPWR power connector (the RTX 4090's stock typical board power is 450 W).

Moore's Law is Dead also posted what they claim to be the first real-world pictures of the upcoming TITAN RTX Ada. The card is an unabashed 4-slot enlargement of the dual-axial flow-through RTX 4090 Founders Edition, with the cooler capable of higher thermal loads. TITAN RTX cards are marketed as first-party Founders Edition cards only, and not through NVIDIA's AIC board partners as custom-designs. A maxed out AD102, with higher clock speeds, higher power-limit, and faster memory, should be unassailable for custom-design RTX 4090 cards, if NVIDIA wants to sell this card at the kind of prices its last TITAN RTX product sold at—USD $2,500.

GALAX Blurts Out GeForce RTX 4090 Ti HOF Product Branding

GALAX, in its website's front-page carousel, may have inadvertently blurted out the existence of a GeForce RTX 4090 Ti "Ada" SKU in the works. This may well be a typo by the designer of its carousel graphic, but the existence of an RTX 4090 Ti SKU isn't a question of if, but when. We know from our September 2022 article that the RTX 4090 only uses 88% of the streaming multiprocessors (SM) physically present on the 4 nm AD102 silicon (that's 128 out of 144 SM, or 16,384 out of 18,432 CUDA cores), although it maxes out its 384-bit GDDR6X memory bus.

The way NVIDIA carved the RTX 4090 out of the AD102 leaves it with plenty of room to create a faster SKU that maxes out the silicon, backing it with more GPU clock speed, possibly even 23 Gbps-rated GDDR6X memory, resulting in a top-spec flagship with ≥10% higher performance than the RTX 4090, to consolidate NVIDIA's position in the high-end segment—not that it's under much of a threat from AMD right now. The Radeon RX 7900 XTX trades blows with the RTX 4080, and is barely a threat to the RTX 4090. NVIDIA would still want something to sell at $2,000 if not more, and the only way it can do so is by maxing out the AD102 and hope that enthusiasts wanting to climb performance leaderboards would want such a card.

NVIDIA GeForce RTX 4090 16 GB Laptop SKU Spotted in Next-Gen HP Omen 17 Laptop

According to the well-known hardware leaker @momomo_us, HP is preparing the launch of its next-generation Omen 17 gaming laptops. And with a new generation of chips coming to consumers, HP accidentally made some information about laptop SKUs public. Four models are listed, and they represent a combination of Intel's 13th-generation Raptor Lake mobile processors with NVIDIA's Ada Lovelace RTX 40 series graphics cards for the mobile/laptop sector. The four SKUs are: CM2007NQ/CM2005NQ with Core i7-13700HX & RTX 4060 8 GB; CM2001NQ with Core i7-13700HX & RTX 4070 8 GB; CK2007NQ/CK2004NQ with Core i7-13700HX & RTX 4080 12 GB; CK2001NQ with Core i7-13700HX & RTX 4090 16 GB.

The most exciting find here is the appearance of the xx90 series in the mobile/laptop form factor, which has not been the case before. The GeForce RTX 4090 laptop edition is supposedly equipped with 16 GB of VRAM, and the GPU SKU should be a cut-down version of AD102 GPU adjusted for power and clock constraints so it can run within a reasonable TDP. With NVIDIA seemingly giving its clients an RTX 4090 SKU option, we have to wait and see what the CUDA core counts are and how clocks scale in a more restricted laptop environment.

NVIDIA Gives RTX A6000 "Ada" Professional Graphics a Quiet Launch, Starting $7377

NVIDIA is ready to launch its RTX A6000 series "Ada" professional-visualization graphics cards. These cards are targeted at the same market demographic as the NVIDIA Quadro series of old: serious 3D content creation. The RTX A6000 leads the pack, and is based on the 4 nm "AD102" silicon (the same one powering the GeForce RTX 4090). The A6000 is better endowed than the RTX 4090 at the silicon-level, although it operates at lower GPU clock-speeds to fit its tighter 300 W power-limit (compared to 450 W of the RTX 4090).

The A6000 "Ada" is endowed with 18,176 CUDA cores across 142 SM, compared to the 16,384 CUDA cores across 128 SM of the RTX 4090. It also gets a higher number of Tensor cores, at 568. The defining differentiator between the A6000 and RTX 4090 has to be memory, with the pro-vis card getting 48 GB of ECC GDDR6 memory across the chip's 384-bit memory bus, clocked at 20 Gbps (960 GB/s memory bandwidth); compared to the 24 GB of 21 Gbps GDDR6X (1008 GB/s) of the RTX 4090. Also, the card enables all three NVDEC and NVENC video hardware-accelerators physically present on the AD102, for six independent accelerated transcoding streams.
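The bandwidth figures in parentheses above come from a simple relationship between bus width and per-pin data rate. A minimal Python sketch of that calculation follows. The function name is illustrative, and the result is a theoretical peak, not a measured value.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


print(memory_bandwidth_gbs(384, 20))  # A6000 "Ada": 384-bit GDDR6 at 20 Gbps
print(memory_bandwidth_gbs(384, 21))  # RTX 4090: 384-bit GDDR6X at 21 Gbps
```

The same formula applies to any of the bus width and memory speed combinations discussed on this page.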

AMD Explains the Economics Behind Chiplets for GPUs

AMD, in its technical presentation for the new Radeon RX 7900 series "Navi 31" GPU, gave us an elaborate explanation on why it had to take the chiplets route for high-end GPUs, devices that are far more complex than CPUs. The company also enlightened us on what sets chiplet-based packages apart from classic multi-chip modules (MCMs). An MCM is a package that consists of multiple independent devices sharing a fiberglass substrate.

An example of an MCM would be a mobile Intel Core processor, in which the CPU die and the PCH die share a substrate. Here, the CPU and the PCH are independent pieces of silicon that can otherwise exist on their own packages (as they do on the desktop platform), but have been paired together on a single substrate to minimize PCB footprint, which is precious on a mobile platform. A chiplet-based device is one where a substrate is made up of multiple dies that cannot otherwise independently exist on their own packages without an impact on inter-die bandwidth or latency. They are essentially what should have been components on a monolithic die, but disintegrated into separate dies built on different semiconductor foundry nodes, with a purely cost-driven motive.

NVIDIA GeForce RTX 4070 isn't a Rebadged RTX 4080 12GB, To Be Cut Down

It turns out that NVIDIA didn't just cancel (unlaunch) the GeForce RTX 4080 12 GB last week, but also shelved the SKU until it is needed in the product stack. This is probably because NVIDIA intended to sell it at $900, and will find it difficult to justify a xx70-class SKU at this price-point. A Moore's Law is Dead report goes into the possible reasons NVIDIA shelved the RTX 4080 12 GB, and why it won't be rebadged as the RTX 4070.

The RTX 4070, although expected to be based on the same AD104 silicon as the RTX 4080 12 GB, won't have the same configuration. The RTX 4080 12 GB maxed out the AD104, enabling all 7,680 CUDA cores on the silicon. It's likely that the RTX 4070 will have fewer CUDA cores, even if it retains the 192-bit memory interface and 12 GB memory size. The memory clock could be changed, too. The RTX 4080 12 GB was essentially NVIDIA trying to upsell the successor of the RTX 3070 Ti (maxed out GA104) as an xx80-class SKU, at a higher price-point. Moore's Law is Dead also showed off possible designs of the RTX 4070 Founders Edition, revealing a compact design with many of the same design improvements implemented with the RTX 4090 FE. This card comes in a strictly 2-slot design.

NVIDIA AD103 and AD104 Chips Powering RTX 4080 Series Detailed

Here's our first look at the "AD103" and "AD104" chips powering the GeForce RTX 4080 16 GB and RTX 4080 12 GB, respectively, thanks to Ryan Smith from Anandtech. These are the second- and third-largest implementations of the GeForce "Ada" graphics architecture, with the "AD102" powering the RTX 4090 being the largest. Both chips are built on the same TSMC 4N (4 nm EUV) silicon fabrication process as the AD102, but are significantly distant from it in specifications. For example, the AD102 has a staggering 80 percent more number-crunching machinery than the AD103, and a 50 percent wider memory interface. The sheer numbers at play here enable NVIDIA to carve out dozens of SKUs based on the three chips alone, before we're shown the mid-range "AD106" in the future.

The AD103 die measures 378.6 mm², significantly smaller than the 608 mm² of the AD102, and it reflects in a much lower transistor count of 45.9 billion. The chip physically features 80 streaming multiprocessors (SM), which work out to 10,240 CUDA cores, 320 Tensor cores, 80 RT cores, and 320 TMUs. The chip is endowed with a healthy ROP count of 112, and has a 256-bit wide GDDR6X memory interface. The AD104 is smaller still, with a die-size of 294.5 mm², a transistor count of 35.8 billion, 60 SM, 7,680 CUDA cores, 240 Tensor cores, 60 RT cores, 240 TMUs, and 80 ROPs. Ryan Smith says that the RTX 4080 12 GB maxes out the AD104, which means its memory interface is physically just 192-bit wide.

NVIDIA Ada AD102 Block Diagram and New Architectural Features Detailed

At the heart of the GeForce RTX 4090 is the gigantic AD102 silicon, which we broadly detailed in an older article. Built on the 4 nm silicon fabrication process, this chip measures 608 mm² in die-area, and crams in 76.3 billion transistors. We now have our first look into the silicon-level block diagram of the AD102, including the introduction of several new components.

The AD102 features a PCI-Express 4.0 x16 host interface, and a 384-bit GDDR6X memory interface. The Gigathread Engine acts as the main resource allocation component of the silicon. Ada introduces the Optical Flow Accelerator, a component crucial for DLSS 3 to generate entire frames without involving the graphics rendering machinery. The chip features double the number of media-encoding hardware engines as "Ampere," including hardware-accelerated AV1 encode/decode. Multiple accelerators mean that multiple streams of videos can be transcoded (helpful in a media production environment), or transcoding is performed at twice the FPS rate (each encoder takes turns at encoding a single frame).
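To illustrate the idea of encoders taking turns on frames of a single stream, here is a small Python sketch. It assumes plain round-robin assignment, which is our simplification. How NVIDIA's hardware and driver actually distribute frames is not described here, and the function name is hypothetical.

```python
def assign_frames(num_frames: int, num_encoders: int = 2) -> dict:
    """Map each encoder index to the list of frame indices it would encode.

    Assumes simple round-robin scheduling; the real scheduling policy is unknown.
    """
    if num_encoders < 1:
        raise ValueError("need at least one encoder")
    schedule = {enc: [] for enc in range(num_encoders)}
    for frame in range(num_frames):
        schedule[frame % num_encoders].append(frame)
    return schedule


print(assign_frames(6))
```

With two encoders, each one handles every other frame, which is where the potential doubling of the encode frame-rate comes from.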

NVIDIA Introduces L40 Omniverse Graphics Card

During its GTC 2022 session, NVIDIA introduced its new generation of gaming graphics cards based on the novel Ada Lovelace architecture. Dubbed the NVIDIA GeForce RTX 40 series, it brings various updates like more CUDA cores, a new DLSS 3 version, 4th generation Tensor cores, 3rd generation Ray Tracing cores, and much more, which you can read about here. However, today, we also got a new Ada Lovelace card intended for the data center. Called the L40, it updates NVIDIA's previous Ampere-based A40 design. While the NVIDIA website provides sparse details, the new L40 GPU uses 48 GB of GDDR6 memory with ECC error correction. Using NVLink, you can get 96 GB of VRAM. The exact GPU SKU is unknown, but we assume that it uses AD102 with adjusted frequencies to lower the TDP and allow for passive cooling.

NVIDIA is calling this their Omniverse GPU, as it is a part of the push to separate its GPUs used for graphics and AI/HPC models. The "L" model in the current product stack is used to accelerate graphics, with display ports installed on the GPU, while the "H" models (H100) are there to accelerate HPC/AI installments where visual elements are a secondary task. This is a further separation of the entire GPU market, where the HPC/AI SKUs get their own architecture, and GPUs for graphics processing are built on a new architecture as well. You can see the specifications provided by NVIDIA below.

NVIDIA RTX 4090 Doesn't Max-Out AD102, Ample Room Left for Future RTX 4090 Ti

The AD102 silicon on which NVIDIA's new flagship graphics card, the GeForce RTX 4090, is based, is a marvel of semiconductor engineering. Built on the 4 nm EUV (TSMC 4N) silicon fabrication process, the chip has a gargantuan transistor-count of 76.3 billion, a nearly 170% increase over the previous GA102, and a die-size of 608 mm², which is in fact smaller than the 628 mm² die-area of the GA102. This is thanks to TSMC 4N offering nearly thrice the transistor-density of the Samsung 8LPP node on which the GA102 is based.
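The density claim above can be sanity-checked from the figures in this article alone. The Python sketch below derives the GA102 transistor count from the stated "nearly 170% increase" rather than from an official figure, so treat that value as an approximation.

```python
AD102_TRANSISTORS_BN = 76.3   # from the article
AD102_AREA_MM2 = 608          # from the article
GA102_AREA_MM2 = 628          # from the article
INCREASE_OVER_GA102 = 1.70    # "nearly 170% increase", taken at face value

# GA102's transistor count is not quoted; derive it from the stated increase.
ga102_transistors_bn = AD102_TRANSISTORS_BN / (1 + INCREASE_OVER_GA102)


def density_mtr_per_mm2(transistors_bn: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per square millimetre."""
    return transistors_bn * 1000 / area_mm2


ad102_density = density_mtr_per_mm2(AD102_TRANSISTORS_BN, AD102_AREA_MM2)
ga102_density = density_mtr_per_mm2(ga102_transistors_bn, GA102_AREA_MM2)
density_ratio = ad102_density / ga102_density

print(f"AD102: {ad102_density:.1f} MTr/mm^2")
print(f"GA102: {ga102_density:.1f} MTr/mm^2 (derived)")
print(f"ratio: {density_ratio:.2f}x")
```

The resulting ratio lands at roughly 2.8x, which is in line with the "nearly thrice" figure.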

The AD102 physically features 18,432 CUDA cores, 576 fourth-generation Tensor cores, and 144 third-generation RT cores. The streaming multiprocessors (SM) come with special components that enable the Shader Execution Reordering optimization, which has a significant performance impact on both raster- and ray traced graphics rendering performance. The silicon supports up to 24 GB of GDDR6X or up to 48 GB of GDDR6+ECC memory (the latter will be seen in the RTX Ada professional-visualization card), across a 384-bit wide memory bus. There are 576 TMUs, and a mammoth 192 ROPs on the silicon.