• Welcome to TechPowerUp Forums, Guest! Please check out our forum guidelines for info related to our community.

NVIDIA Ada's 4th Gen Tensor Core, 3rd Gen RT Core, and Latest CUDA Core at a Glance

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,233 (7.55/day)
Location
Hyderabad, India
System Name RBMK-1000
Processor AMD Ryzen 7 5700G
Motherboard ASUS ROG Strix B450-E Gaming
Cooling DeepCool Gammax L240 V2
Memory 2x 8GB G.Skill Sniper X
Video Card(s) Palit GeForce RTX 2080 SUPER GameRock
Storage Western Digital Black NVMe 512GB
Display(s) BenQ 1440p 60 Hz 27-inch
Case Corsair Carbide 100R
Audio Device(s) ASUS SupremeFX S1220A
Power Supply Cooler Master MWE Gold 650W
Mouse ASUS ROG Strix Impact
Keyboard Gamdias Hermes E2
Software Windows 11 Pro
Yesterday, NVIDIA launched its GeForce RTX 40-series, based on the "Ada" graphics architecture. We're yet to receive a technical briefing about the architecture itself, and the various hardware components that make up the silicon; but NVIDIA on its website gave us a first look at what's in store with the key number-crunching components of "Ada," namely the Ada CUDA core, 4th generation Tensor core, and 3rd generation RT core. Besides generational IPC and clock speed improvements, the latest CUDA core benefits from SER (shader execution reordering), an SM or GPC-level feature that reorders execution waves/threads to optimally load each CUDA core and improve parallelism.

Despite using specialized hardware such as the RT cores, the ray tracing pipeline still relies on CUDA cores and the CPU for a handful tasks, and here NVIDIA claims that SER contributes to a 3X ray tracing performance uplift (the performance contribution of CUDA cores). With traditional raster graphics, SER contributes a meaty 25% performance uplift. With Ada, NVIDIA is introducing its 4th generation of Tensor core (after Volta, Turing, and Ampere). The Tensor cores deployed on Ada are functionally identical to the ones on the Hopper H100 Tensor Core HPC processor, featuring the new FP8 Transformer Engine, which delivers up to 5X the AI inference performance over the previous generation Ampere Tensor Core (which itself delivered a similar leap by leveraging sparsity).



The third-generation RT Core being introduced with Ada offers twice the ray-triangle intersection performance over the "Ampere" RT core, and introduces two new hardware components—Opacity Micromap (OMM) Engine, and Displaced Micro-Mesh (DMM) Engine. OMM accelerates alpha textures often used for elements such as foliage, particles, and fences; while the DMM accelerates BVH build times by a stunning 10X. DLSS 3 will be exclusive to Ada as it relies on the 4th Gen Tensor cores, and the Optical Flow Accelerator component on Ada GPUs, to deliver on the promise of drawing new frames purely using AI, without involving the main graphics rendering pipeline.

We'll give you a more detailed run-down of the Ada architecture as soon as we can.

View at TechPowerUp Main Site
 
Joined
Sep 17, 2014
Messages
22,439 (6.03/day)
Location
The Washing Machine
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling Thermalright Peerless Assassin
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
 
Joined
May 31, 2016
Messages
4,437 (1.43/day)
Location
Currently Norway
System Name Bro2
Processor Ryzen 5800X
Motherboard Gigabyte X570 Aorus Elite
Cooling Corsair h115i pro rgb
Memory 32GB G.Skill Flare X 3200 CL14 @3800Mhz CL16
Video Card(s) Powercolor 6900 XT Red Devil 1.1v@2400Mhz
Storage M.2 Samsung 970 Evo Plus 500MB/ Samsung 860 Evo 1TB
Display(s) LG 27UD69 UHD / LG 27GN950
Case Fractal Design G
Audio Device(s) Realtec 5.1
Power Supply Seasonic 750W GOLD
Mouse Logitech G402
Keyboard Logitech slim
Software Windows 10 64 bit
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
and excessive power consumption requirements.
 

bug

Joined
May 22, 2015
Messages
13,764 (3.96/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
With traditional raster graphics, SER contributes a meaty 25% performance uplift.
:wtf:
 
Joined
Sep 17, 2014
Messages
22,439 (6.03/day)
Location
The Washing Machine
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling Thermalright Peerless Assassin
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
Joined
Apr 17, 2021
Messages
564 (0.43/day)
System Name Jedi Survivor Gaming PC
Processor AMD Ryzen 7800X3D
Motherboard Asus TUF B650M Plus Wifi
Cooling ThermalRight CPU Cooler
Memory G.Skill 32GB DDR5-5600 CL28
Video Card(s) MSI RTX 3080 10GB
Storage 2TB Samsung 990 Pro SSD
Display(s) MSI 32" 4K OLED 240hz Monitor
Case Asus Prime AP201
Power Supply FSP 1000W Platinum PSU
Mouse Logitech G403
Keyboard Asus Mechanical Keyboard
Oh they gained 25% raster IPC you think? Per halved shader compared to the past? Interesting!

so they say, nvidia says a lot of things, most of which should be ignored

if it was true they could have made a simple demo to prove it... they didn't
 
Joined
Feb 23, 2016
Messages
135 (0.04/day)
System Name Computer!
Processor i7-6700K
Motherboard AsRock Z170 Extreme 7+
Cooling EKWB on CPU & GPU, 240 slim and 360 Monsta, Aquacomputer Aquabus D5, Aquaaero 6 Pro.
Memory 32Gb Kingston Hyper-X 3Ghz
Video Card(s) Asus 980 Ti Strix
Storage 2 x 950 Pro
Display(s) Old Acer thing
Case NZXT 440 Modded
Audio Device(s) onboard
Power Supply Seasonic PII 600W Platinum
Mouse Razer Deathadder Chroma
Keyboard Logitech G15
Software Win 10 Pro
Wow! Look at how many extra picture tiles they had to add to 3rd Gen Full Stack Inventions to make it bigger than the others! I am very impressed with 3rd Gen and the marketing department in general, much Wow /s
japan people GIF
 

bug

Joined
May 22, 2015
Messages
13,764 (3.96/day)
Processor Intel i5-12600k
Motherboard Asus H670 TUF
Cooling Arctic Freezer 34
Memory 2x16GB DDR4 3600 G.Skill Ripjaws V
Video Card(s) EVGA GTX 1060 SC
Storage 500GB Samsung 970 EVO, 500GB Samsung 850 EVO, 1TB Crucial MX300 and 2TB Crucial MX500
Display(s) Dell U3219Q + HP ZR24w
Case Raijintek Thetis
Audio Device(s) Audioquest Dragonfly Red :D
Power Supply Seasonic 620W M12
Mouse Logitech G502 Proteus Core
Keyboard G.Skill KM780R
Software Arch Linux + Win10
so they say, nvidia says a lot of things, most of which should be ignored

if it was true they could have made a simple demo to prove it... they didn't
How would you demo that? Build a special Ada GPU that didn't include SER? Because you can't control that from software any more than you can control micro-op reordering on an Intel or AMD CPU.
The way to judge that is to wait for reviews and see where performance lands. (Yes, I'm not taking 25% at face value either. But even if inflated, it still points to some beefy generational improvements that @Vayra86 claimed were nowhere to be seen.)
 
Joined
Aug 3, 2022
Messages
133 (0.16/day)
Processor i7-7700k @5ghz
Motherboard Asus strix Z270-F
Cooling EK AIO 240mm
Memory Hyper-X ( 16 GB - XMP )
Video Card(s) RTX 2080 super OC
Storage 512GB - WD(Nvme) + 1TB WD SDD
Display(s) Acer Nitro 165Hz OC
Case Deepcool Mesh 55
Audio Device(s) Razer Karken X
Power Supply Asus TUF gaming 650W brozen
Mouse Razer Mamba Wireless & Glorious Model D Wireless
Keyboard Cooler Master K70
Software Win 10
D

Deleted member 185088

Guest
More waste of silicon, why can't they just create gaming GPUs without all the nonsense, and professional ones or bring back Titan series with all these technologies.
 
Joined
Dec 12, 2016
Messages
1,840 (0.63/day)
There seems to be some ambiguity around the ROP count. Is there an official number yet?
 
Joined
May 17, 2021
Messages
3,005 (2.33/day)
Processor Ryzen 5 5700x
Motherboard B550 Elite
Cooling Thermalright Perless Assassin 120 SE
Memory 32GB Fury Beast DDR4 3200Mhz
Video Card(s) Gigabyte 3060 ti gaming oc pro
Storage Samsung 970 Evo 1TB, WD SN850x 1TB, plus some random HDDs
Display(s) LG 27gp850 1440p 165Hz 27''
Case Lian Li Lancool II performance
Power Supply MSI 750w
Mouse G502
szhvklbxi1p91.png


saw this on reddit. So 20 and 30 series are not getting dlss 3.0, what kind of crap is this? is there any technical reason or are we being tricked again.

im94428ll3p91.png


edit: This is the official answer, i don't get it. So a 4050 will be able to use it but a 3090ti wouldn't be able to, that doesn't seem right. The difference had to be insane.


edit 2: it's also confirmed the 4080 cut down version is not using the same die as the big brother 16gb
 
Last edited:
Joined
Jan 18, 2020
Messages
817 (0.46/day)
No surprise at all. There's mountains of 3 series stock still. Hence crazy 4 series prices.

Nvidia got to try sell 4 series somehow so making features 4 series exclusive an obvious move. That technical explanation likely to be BS.

Personally doubt it'll work. Not enough consumer care enough to pay the price.

Probably see Nvidia rip their guidance up in a few months.
 
Joined
Jun 18, 2021
Messages
2,547 (2.03/day)
Wow! Look at how many extra picture tiles they had to add to 3rd Gen Full Stack Inventions to make it bigger than the others! I am very impressed with 3rd Gen and the marketing department in general, much Wow /s
japan people GIF

Haha you're not wrong, though the "3rd gen full stack" (whatever the hell that means :confused: ) also includes everything from the previous generations so the extra pictures could be replaced with that, oh well whatever :D

View attachment 262475

saw this on reddit. So 20 and 30 series are not getting dlss 3.0, what kind of crap is this? is there any technical reason or are we being tricked again.

View attachment 262476

edit: This is the official answer, i don't get it. So a 4050 will be able to use it but a 3090ti wouldn't be able to, that doesn't seem right. The difference had to be insane.


edit 2: it's also confirmed the 4080 cut down version is not using the same die as the big brother 16gb

They want to force 4000 series sales but will only further contribute to the demise of DLSS. Why invest in optimizing for a technology that only a couple percent of the market can use when the competitors FSR and (soonTM) XeSS support almost the entire market!?

Regarding the 12gb 4080, it's a 4070 with extra marketing shenanigans on top :nutkick:
 
Joined
Oct 27, 2020
Messages
791 (0.53/day)
Frame interpolation is an essential AI based graphics feature, it isn't a gimmick.
Everyone will use it in the future not just Nvidia.
Turing launched in Q3 2018 and still 4 years later AMD haven't incorporated matrix processors in their designs (RDNA3 logically will have) nor their raytracing performance (msec needed) is near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way in Turing/Ampere cards but on this I tend to believe what Nvidia's saying.
If AMD RDNA3 matrix implementation is at a similar level as Turing/Ampere maybe there will be a hack like DLSS2.0/FSR 2.0 hacks and we can test performance and quality in a future FSR 3.0/4.0 technology, so we will know (who knows maybe there will be a hack earlier for Turing/Ampere also)
 
Joined
Nov 4, 2005
Messages
11,982 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
I’m interested to see what the mesh and opacity hardware can do, the biggest fail of RT is the lack of realistic lighting on objects that has been done on traditional hardware with ease for years now. Not everything is a perfect mirror and RT calculations are compute intensive enough that without dedicated hardware “prebaked goods” opacity/translucency it still doesn’t look real. How many more Ms does it add since it is still another step in the pipeline, or is it a shared resource that is stored and a lookup is all that is required?
 
Joined
Oct 28, 2012
Messages
1,190 (0.27/day)
Processor AMD Ryzen 3700x
Motherboard asus ROG Strix B-350I Gaming
Cooling Deepcool LS520 SE
Memory crucial ballistix 32Gb DDR4
Video Card(s) RTX 3070 FE
Storage WD sn550 1To/WD ssd sata 1To /WD black sn750 1To/Seagate 2To/WD book 4 To back-up
Display(s) LG GL850
Case Dan A4 H2O
Audio Device(s) sennheiser HD58X
Power Supply Corsair SF600
Mouse MX master 3
Keyboard Master Key Mx
Software win 11 pro
Yadayadaya 'we're going to push harder on our RT nonsense to hide the lacking generational performance increase'
Maybe the engineers have actually reached a block ? DLSS doesn't seem like a small R&D thing, and everyone ended up doing a similar tech (even Apple with metal FX)
 
Joined
Jun 18, 2017
Messages
117 (0.04/day)
Frame interpolation is an essential AI based graphics feature, it isn't a gimmick.
Everyone will use it in the future not just Nvidia.
Turing launched in Q3 2018 and still 4 years later AMD haven't incorporated matrix processors in their designs (RDNA3 logically will have) nor their raytracing performance (msec needed) is near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way in Turing/Ampere cards but on this I tend to believe what Nvidia's saying.
If AMD RDNA3 matrix implementation is at a similar level as Turing/Ampere maybe there will be a hack like DLSS2.0/FSR 2.0 hacks and we can test performance and quality in a future FSR 3.0/4.0 technology, so we will know (who knows maybe there will be a hack earlier for Turing/Ampere also)
I'm curious, but what's to stop Nvidia (or AMD and Intel with their future equivalents) from just pumping up the frame rate numbers artificially? The 4090 is up to 4 times faster apparently with DLSS 3.
 
Joined
Dec 14, 2011
Messages
1,035 (0.22/day)
Location
South-Africa
Processor AMD Ryzen 9 5900X
Motherboard ASUS ROG STRIX B550-F GAMING (WI-FI)
Cooling Corsair iCUE H115i Elite Capellix 280mm
Memory 32GB G.Skill DDR4 3600Mhz CL18
Video Card(s) ASUS GTX 1650 TUF
Storage Sabrent Rocket 1TB M.2
Display(s) Dell S3220DGF
Case Corsair iCUE 4000X
Audio Device(s) ASUS Xonar D2X
Power Supply Corsair AX760 Platinum
Mouse Razer DeathAdder V2 - Wireless
Keyboard Redragon K618 RGB PRO
Software Microsoft Windows 11 - Enterprise (64-bit)
View attachment 262475

saw this on reddit. So 20 and 30 series are not getting dlss 3.0, what kind of crap is this? is there any technical reason or are we being tricked again.

View attachment 262476

edit: This is the official answer, i don't get it. So a 4050 will be able to use it but a 3090ti wouldn't be able to, that doesn't seem right. The difference had to be insane.


edit 2: it's also confirmed the 4080 cut down version is not using the same die as the big brother 16gb

Suffice it to say, I don't believe a word of it. They have lied so many times before, take G-Sync as an example, lol... <insert french voice> *a few drivers later....*

Hope someone can come up with a solution.
 
Joined
Jun 18, 2021
Messages
2,547 (2.03/day)
Suffice it to say, I don't believe a word of it. They have lied so many times before, take G-Sync as an example, lol... <insert french voice> *a few drivers later....*

Hope someone can come up with a solution.

Their argument seems to be that it will run slower than on newer cards, so an old card is running new games with new technologies slower, what else is new!? :D
 
Joined
Dec 14, 2011
Messages
115 (0.02/day)
Frame interpolation is an essential AI based graphics feature, it isn't a gimmick.
Everyone will use it in the future not just Nvidia.
Turing launched in Q3 2018 and still 4 years later AMD haven't incorporated matrix processors in their designs (RDNA3 logically will have) nor their raytracing performance (msec needed) is near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way in Turing/Ampere cards but on this I tend to believe what Nvidia's saying.
If AMD RDNA3 matrix implementation is at a similar level as Turing/Ampere maybe there will be a hack like DLSS2.0/FSR 2.0 hacks and we can test performance and quality in a future FSR 3.0/4.0 technology, so we will know (who knows maybe there will be a hack earlier for Turing/Ampere also)
If it works how I think it does then it is just an useless gimmick. I think it needs at least two frames worth of buffer to generate fake frames - you need to generate fake frames from something and you will get less artifacts this way, because it is realtively easy to generate intermediate frames between known frames as compared to prediction how future frame will look like when you only have one frame to work with. You will have 300 interpolated fps but input latency of 30 fps (assuming that is what gpu can do natively without fps interpolation) because additional frames are generated outside of game engine and thus outside input and world state update loop. Incerase of performance comes not only from frames but also from reduced latency. Nvidia only gives you increased frames without lower latency. Would love to be wrong about this of course.
 
Joined
Oct 27, 2020
Messages
791 (0.53/day)
I'm curious, but what's to stop Nvidia (or AMD and Intel with their future equivalents) from just pumping up the frame rate numbers artificially? The 4090 is up to 4 times faster apparently with DLSS 3.
latency increase, independent testing regarding image quality/stability for example.
DLSS 3.0 is pumping the frame rate artificially (I know you didn't mean it like that), there is no rendering or CPU involved, it just takes information from previous frame history, motion vector etc and using the trained tensor cores, generating a make-up image lol. (I expect first gen related issues but eventually it will all pan out)

If it works how I think it does then it is just an useless gimmick. I think it needs at least two frames worth of buffer to generate fake frames - you need to generate fake frames from something and you will get less artifacts this way, because it is realtively easy to generate intermediate frames between known frames as compared to prediction how future frame will look like when you only have one frame to work with. You will have 300 interpolated fps but input latency of 30 fps (assuming that is what gpu can do natively without fps interpolation) because additional frames are generated outside of game engine and thus outside input and world state update loop. Incerase of performance comes not only from frames but also from reduced latency. Nvidia only gives you increased frames without lower latency. Would love to be wrong about this of course.
I don't know exactly the implementation and how Nvidia combat incurred latency, so I will wait for the white paper, but if I understood it seems that use of DLSS 3.0 necessitate use of reflex also.
I expect some issues as this is first gen implementation, but eventually it will pan out.
Also regarding adoption it will be prebuilt in Unreal and Unity so in time we are going to see possibly more and more games using it.
 
Last edited:
Top