Wednesday, September 21st 2022
NVIDIA Ada's 4th Gen Tensor Core, 3rd Gen RT Core, and Latest CUDA Core at a Glance
Yesterday, NVIDIA launched its GeForce RTX 40-series, based on the "Ada" graphics architecture. We have yet to receive a technical briefing on the architecture itself and the various hardware components that make up the silicon, but NVIDIA's website gives us a first look at what's in store with the key number-crunching components of "Ada," namely the Ada CUDA core, the 4th-generation Tensor core, and the 3rd-generation RT core. Besides generational IPC and clock-speed improvements, the latest CUDA core benefits from SER (shader execution reordering), an SM- or GPC-level feature that reorders execution waves/threads to optimally load each CUDA core and improve parallelism.
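SER's actual developer interface is exposed through NVIDIA's ray-tracing shading extensions rather than anything shown here. Purely to illustrate the underlying idea of regrouping divergent work so that threads launched together run the same shader, below is a hypothetical CUDA sketch that sorts hit records by material ID before shading; the "Hit" struct, kernel, and sort key are assumptions, and this is a software stand-in for what SER does in hardware, not NVIDIA's implementation.

```cpp
// Illustrative sketch only -- SER is a hardware/driver feature exposed through
// NVIDIA's shading-language extensions, not through CUDA. Everything named
// here (Hit, shadeHits, ByMaterial) is hypothetical.
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

struct Hit { int materialId; float t; };      // hypothetical per-ray hit record

struct ByMaterial {                            // sort key: group hits by shader
    __host__ __device__ bool operator()(const Hit& a, const Hit& b) const {
        return a.materialId < b.materialId;
    }
};

__global__ void shadeHits(const Hit* hits, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // If neighbouring threads carry different material IDs, the warp has to
    // serialize over both branches; after reordering, most warps take one path.
    if (hits[i].materialId == 0)
        out[i] = hits[i].t * 0.5f;             // stand-in for a cheap shader
    else
        out[i] = sqrtf(hits[i].t) + 1.0f;      // stand-in for an expensive shader
}

int main()
{
    const int n = 1 << 20;
    thrust::device_vector<Hit> hits(n);        // would be filled by ray traversal
    thrust::device_vector<float> out(n);

    // Software stand-in for SER: reorder incoherent hits so that threads
    // launched together execute the same material shader.
    thrust::sort(hits.begin(), hits.end(), ByMaterial());

    shadeHits<<<(n + 255) / 256, 256>>>(thrust::raw_pointer_cast(hits.data()),
                                        thrust::raw_pointer_cast(out.data()), n);
    cudaDeviceSynchronize();
    return 0;
}
```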
Despite using specialized hardware such as the RT cores, the ray tracing pipeline still relies on the CUDA cores and the CPU for a handful of tasks, and here NVIDIA claims that SER contributes up to a 3X uplift in ray tracing performance (the portion contributed by the CUDA cores). With traditional raster graphics, SER contributes a meaty 25% performance uplift.

With Ada, NVIDIA is introducing its 4th generation of Tensor core (after Volta, Turing, and Ampere). The Tensor cores deployed on Ada are functionally identical to the ones on the Hopper H100 Tensor Core HPC processor, featuring the new FP8 Transformer Engine, which delivers up to 5X the AI inference performance of the previous-generation Ampere Tensor core (which itself delivered a similar leap by leveraging sparsity).

The third-generation RT core introduced with Ada offers twice the ray-triangle intersection performance of the "Ampere" RT core, and adds two new hardware components: the Opacity Micromap (OMM) Engine and the Displaced Micro-Mesh (DMM) Engine. The OMM Engine accelerates alpha textures often used for elements such as foliage, particles, and fences, while the DMM Engine accelerates BVH build times by a stunning 10X.

DLSS 3 will be exclusive to Ada, as it relies on the 4th Gen Tensor cores and the Optical Flow Accelerator component of Ada GPUs to deliver on the promise of drawing new frames purely using AI, without involving the main graphics rendering pipeline.
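NVIDIA has not published the internals of DLSS 3 Frame Generation, which combines a trained network with the Optical Flow Accelerator. As a rough, hypothetical illustration of the basic idea only, the CUDA kernel below synthesizes an in-between frame by warping the previous frame halfway along per-pixel motion vectors; all names and parameters are assumptions, and the real technique is far more sophisticated.

```cpp
// Crude illustration only: DLSS 3 Frame Generation uses an AI network fed by
// the Optical Flow Accelerator. This kernel just shows the underlying idea of
// warping a rendered frame along motion vectors to synthesize a new one.
#include <cuda_runtime.h>

__global__ void warpHalfway(const float3* prevFrame,  // last rendered frame
                            const float2* motion,     // per-pixel motion vectors (pixels/frame)
                            float3* genFrame,         // synthesized in-between frame
                            int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Step half a frame back along the motion vector and fetch that pixel,
    // clamping to the frame borders.
    float2 v = motion[y * width + x];
    int sx = min(max(int(x - 0.5f * v.x), 0), width  - 1);
    int sy = min(max(int(y - 0.5f * v.y), 0), height - 1);
    genFrame[y * width + x] = prevFrame[sy * width + sx];
}

int main()
{
    const int W = 1920, H = 1080, N = W * H;
    float3 *d_prev, *d_gen;
    float2 *d_motion;
    cudaMalloc(&d_prev,   N * sizeof(float3));
    cudaMalloc(&d_motion, N * sizeof(float2));
    cudaMalloc(&d_gen,    N * sizeof(float3));
    // d_prev and d_motion would come from the renderer / optical flow pass.

    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    warpHalfway<<<grid, block>>>(d_prev, d_motion, d_gen, W, H);
    cudaDeviceSynchronize();

    cudaFree(d_prev); cudaFree(d_motion); cudaFree(d_gen);
    return 0;
}
```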
We'll give you a more detailed run-down of the Ada architecture as soon as we can.
22 Comments on NVIDIA Ada's 4th Gen Tensor Core, 3rd Gen RT Core, and Latest CUDA Core at a Glance
If it were true, they could have made a simple demo to prove it... they didn't.
The way to judge that is to wait for reviews and see where performance lands. (Yes, I'm not taking 25% at face value either. But even if inflated, it still points to some beefy generational improvements that @Vayra86 claimed were nowhere to be seen.)
Saw this on Reddit. So the 20 and 30 series are not getting DLSS 3.0, what kind of crap is this? Is there any technical reason, or are we being tricked again?
Edit: This is the official answer, and I don't get it. So a 4050 will be able to use it, but a 3090 Ti won't? That doesn't seem right. The difference would have to be insane.
Edit 2: It's also confirmed that the cut-down 4080 is not using the same die as its 16 GB big brother.
videocardz.com/newz/galax-confirms-ad102-300-ad103-300-and-ad104-400-gpus-for-geforce-rtx-4090-4080-series
Nvidia has got to sell the 40 series somehow, so making features 40-series exclusive is an obvious move. That technical explanation is likely BS.
Personally, I doubt it'll work. Not enough consumers care enough to pay the price.
We'll probably see Nvidia rip up their guidance in a few months.
Regarding the 12 GB 4080, it's a 4070 with extra marketing shenanigans on top :nutkick:
Everyone will use it in the future, not just Nvidia.
Turing launched in Q3 2018, and four years later AMD still hasn't incorporated matrix processors in its designs (RDNA3 logically will), nor is its ray tracing performance (in msec needed) anywhere near Turing's.
I don't know if DLSS 3.0 can be used in a meaningful way on Turing/Ampere cards, but on this I tend to believe what Nvidia is saying.
If AMD's RDNA3 matrix implementation is at a similar level to Turing/Ampere, maybe there will be a hack like the DLSS 2.0/FSR 2.0 hacks, and we can test performance and quality with a future FSR 3.0/4.0 technology, so we will know (who knows, maybe there will be a hack for Turing/Ampere even earlier).
Hope someone can come up with a solution.
DLSS 3.0 is pumping the frame rate artificially (I know you didn't mean it like that); there is no rendering or CPU involved, it just takes information from previous frame history, motion vectors, etc., and, using the trained Tensor cores, generates a made-up image lol. I don't know the exact implementation or how Nvidia combats the incurred latency, so I will wait for the white paper, but if I understood correctly, it seems that using DLSS 3.0 also necessitates using Reflex.
I expect some issues as this is a first-gen implementation, but eventually it will all pan out.
Also, regarding adoption, it will come built into Unreal and Unity, so in time we are possibly going to see more and more games using it.