
NVIDIA Ada AD102 Block Diagram and New Architectural Features Detailed

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,299 (7.53/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
At the heart of the GeForce RTX 4090 is the gigantic AD102 silicon, which we broadly detailed in an older article. Built on the 4 nm silicon fabrication process, this chip measures 608 mm² in die-area, and crams in 76.3 billion transistors. We now have our first look into the silicon-level block diagram of the AD102, including the introduction of several new components.

The AD102 features a PCI-Express 4.0 x16 host interface, and a 384-bit GDDR6X memory interface. The Gigathread Engine acts as the main resource allocation component of the silicon. Ada introduces the Optical Flow Accelerator, a component crucial for DLSS 3 to generate entire frames without involving the graphics rendering machinery. The chip features double the number of media-encoding hardware engines as "Ampere," including hardware-accelerated AV1 encode/decode. Multiple accelerators mean that multiple video streams can be transcoded at once (helpful in a media production environment), or a single stream can be transcoded at twice the frame-rate (the encoders take turns, each encoding a single frame).



The main graphics rendering components of the AD102 are the GPCs (graphics processing clusters). There are 12 of these, compared to 7 on the previous-generation GA102. Each GPC shares a raster engine and render backends among six TPCs (texture processing clusters). Each TPC packs two SMs (streaming multiprocessors), the indivisible number-crunching machinery of the NVIDIA GPU. The SM is where NVIDIA concentrates most of its architectural innovation. Each SM packs a 3rd generation RT core, a 128 KB L1 cache, and four TMUs, alongside four clusters that each pack 16 FP32 CUDA cores, 16 concurrent FP32+INT32 CUDA cores, 4 load/store units, a tiny L0 cache with warp-scheduler and thread-dispatch, a register file, and the all-important 4th generation Tensor core.

Each SM hence packs a total of 128 CUDA cores, 4 Tensor cores, and an RT core. There are 12 SMs per GPC, so each GPC has 1,536 CUDA cores, 48 Tensor cores, and 12 RT cores. Twelve GPCs hence add up to 18,432 CUDA cores, 576 Tensor cores, and 144 RT cores. Each GPC contributes 16 ROPs, so there are a mammoth 192 ROPs on the silicon. An L2 cache serves as town-square for the various GPCs, memory controllers, and the PCIe host interface, to exchange data. NVIDIA didn't mention the size of this L2 cache, but it is said to be significantly larger than the previous generation, and plays a major role in lubricating the memory sub-system enough that NVIDIA can retain the same 21 Gbps data-rate over a 384-bit bus as the previous generation.
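
The unit counts above follow from simple multiplication down the hierarchy. The short Python sketch below reproduces them from the per-level figures quoted in the article; the variable names are purely illustrative, and the bandwidth line assumes the usual "data-rate times bus width divided by 8" rule of thumb for peak memory bandwidth.

```python
# Per-level counts as stated in the article for a full AD102 die.
GPCS = 12
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
CUDA_PER_SM = 4 * (16 + 16)   # four clusters, 16 FP32 + 16 FP32/INT32 cores each
TENSOR_PER_SM = 4
RT_PER_SM = 1
ROPS_PER_GPC = 16

sms_per_gpc = TPCS_PER_GPC * SMS_PER_TPC
total_sms = GPCS * sms_per_gpc

totals = {
    "sms": total_sms,
    "cuda_cores": total_sms * CUDA_PER_SM,
    "tensor_cores": total_sms * TENSOR_PER_SM,
    "rt_cores": total_sms * RT_PER_SM,
    "rops": GPCS * ROPS_PER_GPC,
}

# Peak memory bandwidth in GB/s: per-pin data-rate (Gbps) x bus width (bits) / 8.
bandwidth_gb_s = 21 * 384 / 8

print(totals)          # 144 SMs, 18432 CUDA, 576 Tensor, 144 RT, 192 ROPs
print(bandwidth_gb_s)  # 1008.0
```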


NVIDIA is introducing shader execution reordering (SER), a new technology that reorganizes math workloads to be relevant to each worker thread, so they are processed more efficiently by the SIMD components. This is expected to have a particularly big impact on rendering games with ray tracing. A GPU works best when the same operation can be executed on multiple targets. For example, when rendering a triangle, each pixel runs the same shader in parallel. With ray tracing, each ray can execute a completely different piece of code, because it goes in a slightly different direction. With SER, the GPU will "sort" the operations, to create chunks of identical tasks and execute them in parallel. In Cyberpunk 2077 with its new Overdrive graphics preset that significantly dials up RT calculations per pixel, SER improves performance by up to 44 percent. NVIDIA is developing Portal RTX, a mod for the original game with RTX effects added. Here, SER improves performance by 29 percent. It is also said to have a 20 percent performance impact on the Racer RTX interactive tech-demo we'll see this November. NVIDIA commented that there are various SER approaches and the best choice varies by game, so it exposed the shader reordering functionality to game developers as an API, giving them control over how the sorting algorithm works so they can best optimize performance.
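
To make the "sorting" idea concrete, here is a toy Python model. It is not NVIDIA's SER API or hardware behavior; it only assumes that a group of 32 threads (a warp) needs one serialized pass per distinct shader among its rays, and the ray data is made up.

```python
WARP_SIZE = 32  # threads that execute in lockstep (assumed group size)

def divergence_passes(shader_ids, warp_size=WARP_SIZE):
    """Count serialized passes: each warp needs one pass per distinct shader in it."""
    passes = 0
    for start in range(0, len(shader_ids), warp_size):
        passes += len(set(shader_ids[start:start + warp_size]))
    return passes

# 128 rays hitting 4 different materials in an interleaved pattern (hypothetical data).
hits = [i % 4 for i in range(128)]

unsorted_passes = divergence_passes(hits)        # every warp sees all 4 shaders
sorted_passes = divergence_passes(sorted(hits))  # each warp sees a single shader

print(unsorted_passes, sorted_passes)  # 16 4
```

In this toy case, reordering cuts the serialized passes from 16 to 4. Real gains depend on how incoherent the rays are and on the cost of the reordering itself, which this model ignores.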


The displaced micro-mesh (DMM) engine is a revolutionary new hardware unit in the 3rd generation RT core that accelerates displaced micro-meshes. Just as mesh shaders and tessellation have had a profound impact on improving performance with complex raster geometry, allowing game developers to significantly increase geometric complexity, DMM is a method to reduce the complexity of the bounding-volume hierarchy (BVH) data-structure, which is used to determine where a ray hits geometry. Previously the BVH had to capture even the smallest details, to properly determine the intersection point.


The BVH now needn't have data for every single triangle on an object, but can represent objects with complex geometry as a coarse mesh of base triangles, which greatly simplifies the BVH data structure. A simpler BVH consumes less memory and greatly reduces the CPU load of ray tracing, because the CPU only has to generate a smaller structure. With older "Ampere" and "Turing" RT cores, each triangle on an object had to be sampled at high overhead, so the RT core could precisely calculate ray intersection for each triangle. With Ada, the simpler BVH plus the displacement maps can be sent to the RT core, which is now able to figure out the exact hit point on its own. NVIDIA has seen 11:1 to 28:1 compression in total triangle counts. This speeds up BVH builds by 7.6x to over 15x in comparison to the older RT core, and reduces the BVH storage footprint by anywhere between 6.5 and 20 times. DMMs could reduce disk and memory bandwidth utilization, utilization of the PCIe bus, as well as CPU utilization. NVIDIA worked with Simplygon and Adobe to add DMM support to their tool chains.
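
The "coarse base triangle plus displacement map" idea can be sketched in a few lines of Python. This is an illustration only, not NVIDIA's data format: it assumes uniform power-of-two subdivision of a base triangle, a single normal for the whole triangle, and a hypothetical `displacement(u, v)` function standing in for a displacement-map lookup.

```python
def micro_vertices(v0, v1, v2, normal, displacement, level):
    """Uniformly subdivide one base triangle and push each micro-vertex along `normal`.

    `displacement(u, v)` is a hypothetical scalar height function standing in
    for a displacement-map lookup at barycentric coordinates (u, v).
    """
    n = 2 ** level  # segments per edge
    verts = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            u, v = i / n, j / n
            w = 1.0 - u - v
            h = displacement(u, v)
            verts.append(tuple(
                w * a + u * b + v * c + h * nrm
                for a, b, c, nrm in zip(v0, v1, v2, normal)
            ))
    return verts

def micro_triangle_count(level):
    """Each uniform subdivision step splits every triangle into four."""
    return 4 ** level

base = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
verts = micro_vertices(*base, normal=(0.0, 0.0, 1.0),
                       displacement=lambda u, v: 0.5, level=3)

print(len(verts), micro_triangle_count(3))  # 45 64
```

Under these assumptions, a single base triangle held in the BVH stands in for 64 micro-triangles at subdivision level 3, which is the kind of saving the compression ratios above describe.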


Opacity Micro-Maps (OMM) is a new feature introduced with Ada to improve rasterization performance, particularly with objects that have alpha (transparency data). Most low-priority objects in a 3D scene, such as leaves on a tree, are essentially rectangles with textures on them, where the transparency (alpha) creates the shape of the leaf. RT cores have a hard time intersecting rays with such objects, because they're not really in the shape that they appear (they're really just rectangles with textures that give you the illusion of shape). Previous-generation RT cores had to have multiple interactions with the rendering stage to figure out the shape of a transparent object, because they couldn't test for alpha by themselves.


This has been solved by using OMMs. Just as DMMs simplify geometry by creating meshes of micro-triangles, OMMs create meshes of rectangular textures that align with parts of the texture that aren't alpha, so the RT core has a better understanding of the geometry of the object, and can correctly calculate ray intersections. This has a significant impact on shading performance in non-RT applications, too. Practical applications of OMMs aren't just low-priority objects such as vegetation, but also smoke-sprites and localized fog. Traditionally there was a lot of overdraw for such effects, because they layered multiple textures on top of each other, all of which had to be fully processed by the shaders. Now only the non-opaque pixels get executed. OMMs provide a 30 percent speedup with graphics buffer fill-rates, and a 10 percent impact on frame-rates.
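
A rough Python sketch of the underlying idea: pre-classify regions of an alpha texture so that only ambiguous regions need a shader to resolve them. The three labels, the square 2x2 tiles, and the 0.5 cutoff are assumptions for illustration, not the actual OMM encoding.

```python
def classify_blocks(alpha, block=2, cutoff=0.5):
    """Label each block x block tile of an alpha grid as opaque, transparent, or mixed."""
    labels = []
    for by in range(0, len(alpha), block):
        row = []
        for bx in range(0, len(alpha[0]), block):
            texels = [alpha[y][x]
                      for y in range(by, by + block)
                      for x in range(bx, bx + block)]
            if all(a >= cutoff for a in texels):
                row.append("opaque")
            elif all(a < cutoff for a in texels):
                row.append("transparent")
            else:
                row.append("mixed")
        labels.append(row)
    return labels

# A 4x4 alpha mask for a hypothetical leaf texture (1 = solid, 0 = see-through).
leaf = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]

labels = classify_blocks(leaf)
needs_shader = sum(row.count("mixed") for row in labels)

print(labels)        # [['transparent', 'opaque'], ['mixed', 'opaque']]
print(needs_shader)  # 1
```

Here three of the four tiles can be resolved from the precomputed labels alone, and only the one mixed tile would still need a per-texel alpha test.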


DLSS 3 introduces a revolutionary new feature called AI frame-generation, which promises a doubling in frame-rate at comparable quality. While it has all the features of DLSS 2 and its AI super-resolution (scaling up a lower-resolution frame to native resolution with minimal quality loss), DLSS 3 can also generate entire frames using AI alone, without involving the graphics rendering pipeline. Every alternate frame with DLSS 3 is hence AI-generated, without being a replica of the previously rendered frame.


This is possible only on the Ada graphics architecture, because of a hardware component called optical flow accelerator (OFA), which assists in predicting what the next frame could look like, by creating what NVIDIA calls an optical flow-field. OFA ensures that the DLSS 3 algorithm isn't confused by static objects in a rapidly-changing 3D scene (such as a race sim). The process heavily relies on the performance uplift introduced by the FP8 math format of the 4th generation Tensor core.
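
The general principle of "estimate motion between frames, then use it to synthesize a new frame" can be shown with a deliberately tiny Python toy. It works on 1D "frames" with a single global shift and a brute-force search; it is in no way NVIDIA's algorithm, and all names and data are hypothetical.

```python
def estimate_shift(prev, curr, max_shift=3):
    """Brute-force 1D 'optical flow': the shift that best maps prev onto curr."""
    def cost(s):
        return sum(abs(prev[i] - curr[i + s])
                   for i in range(len(prev)) if 0 <= i + s < len(curr))
    return min(range(-max_shift, max_shift + 1), key=cost)

def warp(frame, shift):
    """Move every sample by `shift` positions, dropping anything that leaves the frame."""
    out = [0] * len(frame)
    for i, value in enumerate(frame):
        if 0 <= i + shift < len(frame):
            out[i + shift] = value
    return out

# Two rendered frames: a bright object moves from index 2 to index 4.
prev_frame = [0, 0, 1, 0, 0, 0, 0, 0]
curr_frame = [0, 0, 0, 0, 1, 0, 0, 0]

flow = estimate_shift(prev_frame, curr_frame)
# The generated frame sits half a rendered-frame interval later, so apply half the motion.
generated = warp(curr_frame, flow // 2)

print(flow)       # 2
print(generated)  # [0, 0, 0, 0, 0, 1, 0, 0]
```

A real flow field is per-pixel and two-dimensional, and the generated frame also draws on game motion vectors and a neural network, none of which this sketch attempts to model.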


A third key ingredient of DLSS 3 is Reflex. By reducing the rendering queue to zero, Reflex plays a vital role in ensuring the frame-times with DLSS 3 are at an acceptable level, and a render-queue doesn't confuse the upscaler. A combination of OFA and 4th gen Tensor core is why the Ada architecture is required to use DLSS 3, and why it won't work on older architectures.

 

dgianstefani

TPU Proofreader
Staff member
Joined
Dec 29, 2017
Messages
5,092 (2.00/day)
Location
Swansea, Wales
System Name Silent
Processor Ryzen 7800X3D @ 5.15ghz BCLK OC, TG AM5 High Performance Heatspreader
Motherboard ASUS ROG Strix X670E-I, chipset fans replaced with Noctua A14x25 G2
Cooling Optimus Block, HWLabs Copper 240/40 + 240/30, D5/Res, 4x Noctua A12x25, 1x A14G2, Mayhems Ultra Pure
Memory 32 GB Dominator Platinum 6150 MT 26-36-36-48, 56.6ns AIDA, 2050 FCLK, 160 ns tRFC, active cooled
Video Card(s) RTX 3080 Ti Founders Edition, Conductonaut Extreme, 18 W/mK MinusPad Extreme, Corsair XG7 Waterblock
Storage Intel Optane DC P1600X 118 GB, Samsung 990 Pro 2 TB
Display(s) 32" 240 Hz 1440p Samsung G7, 31.5" 165 Hz 1440p LG NanoIPS Ultragear, MX900 dual gas VESA mount
Case Sliger SM570 CNC Aluminium 13-Litre, 3D printed feet, custom front, LINKUP Ultra PCIe 4.0 x16 white
Audio Device(s) Audeze Maxwell Ultraviolet w/upgrade pads & LCD headband, Galaxy Buds 3 Pro, Razer Nommo Pro
Power Supply SF750 Plat, full transparent custom cables, Sentinel Pro 1500 Online Double Conversion UPS w/Noctua
Mouse Razer Viper V3 Pro 8 KHz Mercury White w/Tiger Ice Skates & Pulsar Supergrip tape, Razer Atlas
Keyboard Wooting 60HE+ module, TOFU-R CNC Alu/Brass, SS Prismcaps W+Jellykey, LekkerV2 mod, TLabs Leath/Suede
Software Windows 11 IoT Enterprise LTSC 24H2
Benchmark Scores Legendary
Interesting stuff. Cool to see RTRT really being refined so much, and all that dedicated tensor hardware being put to work.
 
Joined
Feb 1, 2013
Messages
1,270 (0.29/day)
System Name Gentoo64 /w Cold Coffee
Processor 9900K 5.2GHz @1.312v
Motherboard MXI APEX
Cooling Raystorm Pro + 1260mm Super Nova
Memory 2x16GB TridentZ 4000-14-14-28-2T @1.6v
Video Card(s) RTX 4090 LiquidX Barrow 3015MHz @1.1v
Storage 660P 1TB, 860 QVO 2TB
Display(s) LG C1 + Predator XB1 QHD
Case Open Benchtable V2
Audio Device(s) SB X-Fi
Power Supply MSI A1000G
Mouse G502
Keyboard G815
Software Gentoo/Windows 10
Benchmark Scores Always only ever very fast
Neat, Ada is a VGPU.
 
Joined
Jan 14, 2019
Messages
12,577 (5.80/day)
Location
Midlands, UK
System Name Nebulon B
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance DDR5-4800
Video Card(s) AMD Radeon RX 6750 XT 12 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Bazzite (Fedora Linux) KDE
So the GPU is essentially Ampere (in other words: Turing Refresh 2), just bigger. How innovative! :sleep:

The technological innovations are cool, though. I hope at least the RT-related ones won't be Nvidia exclusive.
 
Joined
May 11, 2018
Messages
1,292 (0.53/day)
I see lots of proprietary new architecture that also demands the game to have these features to show its potential. New and improved raytracing? It has to be supported in game. New and improved DLSS 3.0? Also only for games that support it.

Will we see the push in reviews and benchmarks to include as much of these new games and architecture as possible, to really skew the results against the Ampere and AMD cards?

The way I see it, 2x and 3x claims from Nvidia keynote presentation are all achieved by using such architecture and game changes, not by actually being 2x, 3x faster...
 
D

Deleted member 185088

Guest
Lots of waste of silicon, instead of wasting silicon find a way to make DLSS work in all games, as for RT cores leave them for a Titan card and once we achieve full path tracing then bring it to the masses. The rich/fans...etc can buy the Titan to get the latest things, and the rest of us get fast relatively cheap GPUs.
 
Joined
Jan 14, 2019
Messages
12,577 (5.80/day)
Location
Midlands, UK
System Name Nebulon B
Processor AMD Ryzen 7 7800X3D
Motherboard MSi PRO B650M-A WiFi
Cooling be quiet! Dark Rock 4
Memory 2x 24 GB Corsair Vengeance DDR5-4800
Video Card(s) AMD Radeon RX 6750 XT 12 GB
Storage 2 TB Corsair MP600 GS, 2 TB Corsair MP600 R2
Display(s) Dell S3422DWG, 7" Waveshare touchscreen
Case Kolink Citadel Mesh black
Audio Device(s) Logitech Z333 2.1 speakers, AKG Y50 headphones
Power Supply Seasonic Prime GX-750
Mouse Logitech MX Master 2S
Keyboard Logitech G413 SE
Software Bazzite (Fedora Linux) KDE
Lots of waste of silicon, instead of wasting silicon find a way to make DLSS work in all games, as for RT cores leave them for a Titan card and once we achieve full path tracing then bring it to the masses. The rich/fans...etc can buy the Titan to get the latest things, and the rest of us get fast relatively cheap GPUs.
Alternatively, they could release another RT-free affordable GPU range like the Turing GTX 16-series.
 
Joined
Aug 3, 2022
Messages
133 (0.15/day)
Processor i7-7700k @5ghz
Motherboard Asus strix Z270-F
Cooling EK AIO 240mm
Memory Hyper-X ( 16 GB - XMP )
Video Card(s) RTX 2080 super OC
Storage 512GB - WD(Nvme) + 1TB WD SDD
Display(s) Acer Nitro 165Hz OC
Case Deepcool Mesh 55
Audio Device(s) Razer Karken X
Power Supply Asus TUF gaming 650W brozen
Mouse Razer Mamba Wireless & Glorious Model D Wireless
Keyboard Cooler Master K70
Software Win 10
Tricky and pretty interesting launch tbh - Turing refresh (a bigger version)
 
Joined
Sep 17, 2014
Messages
22,673 (6.05/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
@bug
"NVIDIA is introducing shader execution reordering, (SER), a new technology that reorganizes math workloads to be relevant to each worker thread, so it is more efficiently processed by the SIMD components. This is expected to have a particularly big impact on rendering games with ray tracing."

;)
Nvidia already picked all the low hanging fruit on raster
 

bug

Joined
May 22, 2015
Messages
13,843 (3.95/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
@bug
"NVIDIA is introducing shader execution reordering, (SER), a new technology that reorganizes math workloads to be relevant to each worker thread, so it is more efficiently processed by the SIMD components. This is expected to have a particularly big impact on rendering games with ray tracing."

;)
Nvidia already picked all the low hanging fruit on raster
Everybody did, by now. And yet, it seems SER still benefits raster, even if RT can be more fitting for instruction reordering.

How about waiting for benchmarks before declaring IPC has/hasn't improved? Does that work for you?
 
Joined
Jul 10, 2015
Messages
754 (0.22/day)
Location
Sokovia
System Name Alienation from family
Processor i7 7700k
Motherboard Hero VIII
Cooling Macho revB
Memory 16gb Hyperx
Video Card(s) Asus 1080ti Strix OC
Storage 960evo 500gb
Display(s) AOC 4k
Case Define R2 XL
Power Supply Be f*ing Quiet 600W M Gold
Mouse NoName
Keyboard NoNameless HP
Software You have nothing on me
Benchmark Scores Personal record 100m sprint: 60m
Lots of waste of silicon, instead of wasting silicon find a way to make DLSS work in all games, as for RT cores leave them for a Titan card and once we achieve full path tracing then bring it to the masses. The rich/fans...etc can buy the Titan to get the latest things, and the rest of us get fast relatively cheap GPUs.
It's for the sake of their ecosystem... AMD will be wasting silicon on traditional rasterization, you have to wait till November 3rd or buy 6000 series, 6950xt is the fastest in TPU benchmarks.
 
Joined
Sep 17, 2014
Messages
22,673 (6.05/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
Everybody did, by now. And yet, it seems SER still benefits raster, even if RT can be more fitting for instruction reordering.

How about waiting for benchmarks before declaring IPC has/hasn't improved? Does that work for you?
Absolutely, just pointing out the fun that marketing is :)
 
  • Like
Reactions: bug
Joined
Dec 12, 2016
Messages
1,950 (0.66/day)
So it's for sure 192 ROPs. Other news outlets continue to report 384. The number of pipelines has a huge impact on game performance.
 
Joined
Oct 6, 2021
Messages
1,605 (1.37/day)
The chips would be about 30% smaller without the dedicated Tensor cores and RT trinket.

Poll: Would you buy a RTX 4xxx, without RT, but 15-20% cheaper ?
 
Joined
Jul 15, 2020
Messages
1,021 (0.63/day)
System Name Dirt Sheep | Silent Sheep
Processor i5-2400 | 13900K (-0.02mV offset)
Motherboard Asus P8H67-M LE | Gigabyte AERO Z690-G, bios F29e Intel baseline
Cooling Scythe Katana Type 1 | Noctua NH-U12A chromax.black
Memory G-skill 2*8GB DDR3 | Corsair Vengeance 4*32GB DDR5 5200Mhz C40 @4000MHz
Video Card(s) Gigabyte 970GTX Mini | NV 1080TI FE (cap at 50%, 800mV)
Storage 2*SN850 1TB, 230S 4TB, 840EVO 128GB, WD green 2TB HDD, IronWolf 6TB, 2*HC550 18TB in RAID1
Display(s) LG 21` FHD W2261VP | Lenovo 27` 4K Qreator 27
Case Thermaltake V3 Black|Define 7 Solid, stock 3*14 fans+ 2*12 front&buttom+ out 1*8 (on expansion slot)
Audio Device(s) Beyerdynamic DT 990 (or the screen speakers when I'm too lazy)
Power Supply Enermax Pro82+ 525W | Corsair RM650x (2021)
Mouse Logitech Master 3
Keyboard Roccat Isku FX
VR HMD Nop.
Software WIN 10 | WIN 11
Benchmark Scores CB23 SC: i5-2400=641 | i9-13900k=2325-2281 MC: i5-2400=i9 13900k SC | i9-13900k=37240-35500
I see lots of proprietary new architecture that also demands the game to have these features to show its potential. New and improved raytracing? It has to be supported in game. New and improved DLSS 3.0? Also only for games that support it.

Will we see the push in reviews and benchmarks to include as much of these new games and architecture as possible, to really skew the results against the Ampere and AMD cards?

The way I see it, 2x and 3x claims from Nvidia keynote presentation are all achieved by using such architecture and game changes, not by actually being 2x, 3x faster...
Yep, NV goes all-in on AI to do the real performance uplift from gen to gen at the same $$ level.
And as you said, all the RT-DLSS3 stuff is just a gimmick until every game supports it from day 1 (or is automatically backward compatible - which will not happen of course).
I just want every game with DLSS2-FSR2 with no RTX at all. I live happily with 'baked scenes' just as I know that every game is fictional.
 

bug

Joined
May 22, 2015
Messages
13,843 (3.95/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Absolutely, just pointing out the fun that marketing is :)
To me it's not fun, it's painful. I tend to read only the major announcements and take the very basic information from them. Too much b to waste my time on it.
 
Joined
Oct 27, 2020
Messages
797 (0.53/day)
So it's for sure 192 ROPs. Other news outlets continue to report 384. The number of pipelines has a huge impact on game performance.
My original assumption was 192 due to the memory bandwidth limitation, but when I saw news outlets reporting 384 ROPs, I thought that maybe it was a mixed solution like the 128 RB+ of RDNA2, with no improvement in 64b pixel color write/cycle and pixel color blend/cycle (Navi 10 can also do 64 despite having 64RB+)
 
Joined
Dec 12, 2012
Messages
777 (0.18/day)
Location
Poland
System Name THU
Processor Intel Core i5-13600KF
Motherboard ASUS PRIME Z790-P D4
Cooling SilentiumPC Fortis 3 v2 + Arctic Cooling MX-2
Memory Crucial Ballistix 2x16 GB DDR4-3600 CL16 (dual rank)
Video Card(s) MSI GeForce RTX 4070 Ventus 3X OC 12 GB GDDR6X (2610/21000 @ 0.91 V)
Storage Lexar NM790 2 TB + Corsair MP510 960 GB + PNY XLR8 CS3030 500 GB + Toshiba E300 3 TB
Display(s) LG OLED C8 55" + ASUS VP229Q
Case Fractal Design Define R6
Audio Device(s) Yamaha RX-V381 + Monitor Audio Bronze 6 + Bronze FX | FiiO E10K-TC + Sony MDR-7506
Power Supply Corsair RM650
Mouse Logitech M705 Marathon
Keyboard Corsair K55 RGB PRO
Software Windows 10 Home
Benchmark Scores Benchmarks in 2024?
The chips would be about 30% smaller without the dedicated Tensor cores and RT trinket.

Poll: Would you buy a RTX 4xxx, without RT, but 15-20% cheaper ?
I would not. The 4080-12 will be about 20% faster than the 3080 in rasterized games. If it was 20% cheaper, it would still be $720, which is a terrible price for 20% more performance. And I do not even need that performance, I can run all rasterized games almost maxed out.

I actually would want to pay just for more RT performance. And I absolutely do not want to give up tensor cores. DLSS is one of the best things ever invented for games.

GPUs have gotten too big when manufacturing them was still cheap. If high-end cards had stayed below 400 mm2, we would not be having this problem. Currently new processes are very expensive, and GPUs still have to be big to get any performance gains. I will just wait out this transition period until mid-range cards can offer a performance increase for me.
 
Joined
Oct 6, 2021
Messages
1,605 (1.37/day)
I would not. The 4080-12 will be about 20% faster than the 3080 in rasterized games. If it was 20% cheaper, it would still be $720, which is a terrible price for 20% more performance. And I do not even need that performance, I can run all rasterized games almost maxed out.

I actually would want to pay just for more RT performance. And I absolutely do not want to give up tensor cores. DLSS is one of the best things ever invented for games.

GPUs have gotten too big when manufacturing them was still cheap. If high-end cards had stayed below 400 mm2, we would not be having this problem. Currently new processes are very expensive, and GPUs still have to be big to get any performance gains. I will just wait out this transition period until mid-range cards can offer a performance increase for me.
AMD has already proven that you can get the same effect of DLSS via Software. I don't know what magic you see in RT, running CP2077 at 30fps with a 3090ti looks like a bad joke.

Realistic reflections do not make a game realistic when everything else is not.
 
Joined
Dec 12, 2012
Messages
777 (0.18/day)
Location
Poland
System Name THU
Processor Intel Core i5-13600KF
Motherboard ASUS PRIME Z790-P D4
Cooling SilentiumPC Fortis 3 v2 + Arctic Cooling MX-2
Memory Crucial Ballistix 2x16 GB DDR4-3600 CL16 (dual rank)
Video Card(s) MSI GeForce RTX 4070 Ventus 3X OC 12 GB GDDR6X (2610/21000 @ 0.91 V)
Storage Lexar NM790 2 TB + Corsair MP510 960 GB + PNY XLR8 CS3030 500 GB + Toshiba E300 3 TB
Display(s) LG OLED C8 55" + ASUS VP229Q
Case Fractal Design Define R6
Audio Device(s) Yamaha RX-V381 + Monitor Audio Bronze 6 + Bronze FX | FiiO E10K-TC + Sony MDR-7506
Power Supply Corsair RM650
Mouse Logitech M705 Marathon
Keyboard Corsair K55 RGB PRO
Software Windows 10 Home
Benchmark Scores Benchmarks in 2024?
Two games have impressed me with ray tracing - Metro: Exodus EE and Dying Light 2.
Why? Because they feature RTGI on top of other effects. And lighting is the primary factor affecting visual realism (and do not confuse realistic light behavior with artistic design, two completely different things).

But both games are difficult to run on a 3080 even with DLSS Performance when you turn RT on.
I usually do not bother with RT in games that only use it for shadows or reflections, unless I do not have to sacrifice anything to turn those on.

RTGI is incredible. Just look at Lumen in UE5, the Matrix demo for example.

The worst thing about devs implementing RTGI is that the non-RT lighting model suffers greatly. Without RTGI, both Metro and DL2 look much worse compared to beautiful games like Horizon Forbidden West or even Far Cry 6. The Matrix demo without Lumen also looked bad.
You can fake really good GI with rasterization, but they do not bother doing both if they can use RT. Crysis 3 had some impressive voxel-based GI, but that is difficult to do and has a high performance cost as well, and it is nowhere near as accurate as path tracing.

RTGI is very uncommon right now, because of performance reasons. Ada GPUs can change that, but if nobody can afford those cards, nobody will play with RTGI anyway.


As for DLSS, you could run it without tensor cores, but performance would be lower. Could that be balanced by adding more CUDA cores instead? Possibly. Did they lock it to tensor cores only to sell more RTX cards? Possibly, but they could have locked it to RTX cards even without tensor cores, so why do it this way? We will probably never know.
 
  • Like
Reactions: bug