Thursday, August 23rd 2018

NVIDIA "TU102" RT Core and Tensor Core Counts Revealed

The GeForce RTX 2080 Ti is indeed based on an ASIC codenamed "TU102." NVIDIA was referring to this 775 mm² chip when talking about the 18.5 billion-transistor count in its keynote. The company also provided a breakdown of the chip's various "cores" and a block diagram. The GPU is laid out much like its predecessors, but each of its 72 streaming multiprocessors (SMs) packs RT cores and Tensor cores in addition to CUDA cores.

The TU102 features six graphics processing clusters (GPCs), each with 12 SMs. Each SM packs 64 CUDA cores, 8 Tensor cores, and one RT core, and each GPC includes six geometry units. The GPU also features 288 TMUs and 96 ROPs. TU102 has a 384-bit wide GDDR6 memory interface running at 14 Gbps. There are also two NVLink channels, which NVIDIA plans to position as the basis of its next-generation multi-GPU technology.
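Those per-SM figures multiply out to the chip-level totals; a quick sanity check of the arithmetic (a throwaway C++ snippet based solely on the numbers above):

#include <cstdio>

// Totals implied by the breakdown above (full TU102 die; the RTX 2080 Ti
// itself ships with a slightly cut-down configuration).
int main()
{
    const int gpcs         = 6;
    const int smsPerGpc    = 12;
    const int cudaPerSm    = 64;
    const int tensorPerSm  = 8;
    const int rtPerSm      = 1;
    const int busWidthBits = 384;
    const double dataRate  = 14.0;  // GDDR6, Gbps per pin

    const int sms = gpcs * smsPerGpc;                         // 72
    std::printf("SMs:          %d\n", sms);
    std::printf("CUDA cores:   %d\n", sms * cudaPerSm);       // 4608
    std::printf("Tensor cores: %d\n", sms * tensorPerSm);     // 576
    std::printf("RT cores:     %d\n", sms * rtPerSm);         // 72
    std::printf("Bandwidth:    %.0f GB/s\n",
                busWidthBits / 8.0 * dataRate);               // 672 GB/s
    return 0;
}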
Source: VideoCardz

65 Comments on NVIDIA "TU102" RT Core and Tensor Core Counts Revealed

#26
efikkan
Vayra863. Conclusion: take the Shadow of the Tomb Raider bar for a realistic performance scenario of 1080 vs 2080. Give or take 30-35%. In other words, you're better off upgrading to a 1080 Ti.

You can find more hints and confirmations of a 30-odd percent jump when you compare clocks and shader counts between the 1080 and 2080 as well.

Thank me later ;)
Nvidia is claiming ~50% performance gains, but that remains to be confirmed.
Comparing clocks and CUDA core count is useless when you know nothing about their performance.

Even if we're pessimistic, and Nvidia delivers a conservative 30% over the previous generation, that's still a significant upgrade. AMD could only dream of such improvements. Most people don't buy a new card every generation.
Posted on Reply
#27
cyneater
I wonder if NVIDIA is going to pull an Intel, like they did with the i740 back in '98 or so.

They crapped on about it for ages, and then when it came out it sucked.

Also, is the real-time ray tracing proprietary tech? If so... hmm, it could be interesting to see whether AMD plays catch-up, especially since the next-gen consoles might have it...
Or is this like PhysX, which was pretty lame and about as exciting as a paper clip?
Posted on Reply
#28
Xzibit
cyneaterAlso, is the real-time ray tracing proprietary tech? If so... hmm, it could be interesting to see whether AMD plays catch-up, especially since the next-gen consoles might have it...
Or is this like PhysX, which was pretty lame and about as exciting as a paper clip?
No, it's not proprietary. Microsoft introduced DXR (DirectX Raytracing) as part of DX12.
DXR will not introduce any new execution engines in the DX12 model – so the primary two engines remain the graphics (3D) and compute engines – and indeed Microsoft is treating DXR as a compute task, meaning it can be run on top of either engine.
It's DirectCompute accelerated.

Right now very few games have even adopted the base API:
DX12

Microsoft only introduced DXR this year, with the Windows Insider RS4 update, and the DXR SDK is still experimental. DXR for consumers is expected in the Windows 10 autumn update:
DX12+DXR

Nvidia introduced its RTX solution (hardware + software) for ray tracing on top of that:
DX12+DXR+Nvidia RTX
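To make that layering concrete, here's a minimal sketch (plain D3D12, nothing NVIDIA-specific) of how an application would ask the runtime whether DXR is actually exposed before falling back to plain rasterization; it assumes a Windows SDK recent enough to define the OPTIONS5 feature struct and an already-created device:

#include <windows.h>
#include <d3d12.h>

// Returns true when the OS/driver expose DirectX Raytracing on this device.
// 'device' is assumed to be a valid ID3D12Device* created by the application.
bool SupportsDXR(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5,
                                           &options5, sizeof(options5))))
        return false;  // OS/SDK too old to even know about DXR
    // TIER_NOT_SUPPORTED can still mean the compute-based fallback layer is
    // usable, but there is no hardware/driver acceleration behind it.
    return options5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_0;
}

Actual ray dispatch then goes through DispatchRays() on ID3D12GraphicsCommandList4, which the runtime schedules much like any other compute workload - which is the point of the quote above.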
Posted on Reply
#29
londiste
Considering that a low ROP count was suspected to be the weak link in Titan V's gaming performance, 96 ROPs does sound like too little - even more so the 88 ROPs in the 2080 Ti. It depends somewhat on the clocks, but still :(
cucker tarlsonIt should do async really well. Steve from Gamers Nexus said it's very asynchronous in nature, and this graph confirms it. The only game that hits over 1.5x performance is one with async (Wolfenstein; also disregard the last two with HDR, that's more like regaining the performance Pascal lost in HDR).
It has to be asynchronous in nature; the Tensor and RT cores run in a different queue :)
It's still not confirmed if Turing will do Rapid Packed Math. If it does, that would explain the boost in Wolfenstein.
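For reference, "a different queue" maps to the explicit multi-queue model that D3D12 (and Vulkan) expose; a minimal sketch of creating a dedicated compute queue next to the usual graphics queue, assuming 'device' is an existing ID3D12Device*:

#include <windows.h>
#include <d3d12.h>

// Creates a dedicated compute queue; work submitted here can overlap with
// work on the graphics (DIRECT) queue if the GPU's scheduler allows it.
ID3D12CommandQueue* CreateComputeQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type  = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // async-capable compute queue
    desc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;

    ID3D12CommandQueue* queue = nullptr;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));  // error handling omitted
    return queue;
}

Whether work on that queue actually overlaps with graphics work - and how much you gain from it - is entirely down to how the hardware schedules the two queues, which is exactly what the Wolfenstein numbers hint at.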
Posted on Reply
#30
Vayra86
efikkanNvidia is claiming ~50% performance gains, but that remains to be confirmed.
Comparing clocks and CUDA core count is useless when you know nothing about their performance.

Even if we're pessimistic, and Nvidia delivers a conservative 30% over the previous generation, that's still a significant upgrade. AMD could only dream of such improvements. Most people don't buy a new card every generation.
It's really not an upgrade if you factor in the price. A 1080 Ti is cheaper for similar or better performance - at least that's my guess right now. It's purely the RTX nonsense that's supposed to win us over.
Posted on Reply
#31
Zyll Goliat
efikkanNvidia is claiming ~50% performance gains, but that remains to be confirmed.
Comparing clocks and CUDA core count is useless when you know nothing about their performance.

Even if we're pessimistic, and Nvidia delivers a conservative 30% over the previous generation, that's still a significant upgrade. AMD could only dream of such improvements. Most people don't buy a new card every generation.
Well... I guess we will see soon. Here is an interesting comparison by UFD Tech using the FPS numbers Nvidia provided... seems like the 1080 Ti = 2080 in performance.

Posted on Reply
#32
efikkan
Vayra86It's really not an upgrade if you factor in the price. A 1080 Ti is cheaper for similar or better performance - at least that's my guess right now. It's purely the RTX nonsense that's supposed to win us over.
Even with die space spent on RTX, 2080 is going to outperform GTX 1080 Ti at a lower TDP.
Posted on Reply
#33
Prima.Vera
I thought the 2070 should have been at 1080 Ti levels...
Posted on Reply
#34
Vayra86
efikkanEven with die space spent on RTX, 2080 is going to outperform GTX 1080 Ti at a lower TDP.
I think it's going to be marginal at best (~10%).
Posted on Reply
#35
efikkan
Vayra86I think it's going to be marginal at best (~10%).
Based on which benchmarks?
Posted on Reply
#36
Vayra86
efikkanBased on which benchmarks?
Common sense. Get back to it when we have the reviews and we'll see
Posted on Reply
#37
efikkan
Vayra86Common sense. Get back to it when we have the reviews and we'll see
People are spreading way too much FUD about the Turing cards, all based on speculation on theoretical specs.
Posted on Reply
#38
Aquinus
Resident Wat-man
btarunrNVIDIA was referring to this 775 mm² chip when talking about the 18.5 billion-transistor count in its keynote.
That is a massive chip. Am I the only person thinking that maybe prices are really high not because AMD can't compete but because yields are terrible for a chip that big? If that's the size of the die, it's literally at least twice as big as an i9 7900X. Just let that sink in for a minute.
Posted on Reply
#39
Vayra86
efikkanPeople are spreading way too much FUD about the Turing cards, all based on speculation on theoretical specs.
I agree - surely you too can see that it's near impossible to see the jumps forward some people speak of...

I'm a bit more conservative with that. But the specs really aren't that theoretical, and if you or anyone else really thinks it's plausible for Nvidia to massively increase IPC without touting it in the keynote (which they haven't - it was RTX and DLSS front to back), well... it's silly.

What we have is very clear:
- overinflated and misleading Nvidia keynote statements on perf increases, placing Pascal next to Turing in tasks Pascal was never designed for
- higher TDP to cater for additional resources on the die - RT and Tensor cores
- a marginally smaller node (16 nm > 12 nm)
- subtle changes to the shaders themselves
- lower base clocks and boosts
- only slightly higher shader counts

There is just no way this is going to massively outperform Pascal per shader. If you then factor in the price and shader counts, it's easy to come to conservative numbers. And it's wishful thinking to expect 50% perf increases - unless all you look at is the 2080 Ti placed next to a non-Ti 1080. But you can buy two 1080s at that price ;)
AquinusThat is a massive chip. Am I the only person thinking that maybe prices are really high not because AMD can't compete but because yields are terrible for a chip that big? If that's the size of the die, it's literally at least twice as big as an i9 7900X. Just let that sink in for a minute.
You're not the only one. But the real question should be: what's the point of such a big die just to cater for some extra fancy effects that are completely detrimental to overall performance in its primary use case... The hands-on with a 2080 Ti is telling: minor effects cause massive performance hits. Nvidia can only do this because they own the market, which brings us back to the no-AMD-competition statement, but in a different way :)

Another massive hurdle is in fact AMD's dominance of the console space; when it comes to gaming, AMD really has more control of the market than Nvidia, because with RTX Nvidia only hits the niche of high-end PC gaming. That's not a big market, and it means Nvidia has to pony up lots of money to get its tech adopted; there is no ecosystem for it. If you look at the (semi-)pro space it's a different story: the Quadros I do understand - for non-realtime production work, RT is a step forward.

There has always been good reason in gaming to pre-bake and hand-design everything to make it efficient. The only trade-off is labor: the more work you put in, the better it can perform, and with current technology it can even be made fairly dynamic. RT is grossly inefficient in comparison, and it only really shines if it's there without you really 'noticing' it. If you have to look for your RT effects, it's just a gimmick. An expensive one.
Posted on Reply
#40
jabbadap
Vayra86I agree - surely you too can see that it's near impossible to see the jumps forward some people speak of...

I'm a bit more conservative with that. But the specs really aren't that theoretical, and if you or anyone else really thinks it's plausible for Nvidia to massively increase IPC without touting it in the keynote (which they haven't - it was RTX and DLSS front to back), well... it's silly.

What we have is very clear:
- overinflated and misleading Nvidia keynote statements on perf increases, placing Pascal next to Turing in tasks Pascal was never designed for
- higher TDP to cater for additional resources on the die - RT and Tensor cores
- a marginally smaller node (16 nm > 12 nm)
- subtle changes to the shaders themselves
- lower base clocks and boosts
- only slightly higher shader counts

There is just no way this is going to massively outperform Pascal per shader. If you then factor in the price and shader counts, it's easy to come to conservative numbers. And it's wishful thinking to expect 50% perf increases - unless all you look at is the 2080 Ti placed next to a non-Ti 1080. But you can buy two 1080s at that price ;)

You're not the only one. But the real question should be: what's the point of such a big die just to cater for some extra fancy effects that are completely detrimental to overall performance in its primary use case... The hands-on with a 2080 Ti is telling: minor effects cause massive performance hits. Nvidia can only do this because they own the market, which brings us back to the no-AMD-competition statement, but in a different way :)
Well, shader performance might increase; they have done that before (see Kepler vs. Maxwell clock for clock). So comparing FP32 FLOPS between Pascal and Turing might be like comparing AMD and Nvidia FP32 FLOPS and saying more is better for gaming. And yeah, I would not put much weight on Nvidia's marketed GPU clocks either: a) are they still using Boost 3.0 or has that changed, and b) marketed boost and maximum boost have been different, so clocks while gaming are usually higher than the marketed boost clocks...

But yeah, these Turings are very big chips; heck, TU104 is much bigger than GP102. Maybe Turing was meant to be a 7 nm GPU in the first place, with Volta (without RT or Tensor cores) battling the high-end 2018 Radeons (say a GV102 with ~4608 CC at ~600 mm² and a GV104 with ~3072 CC at ~400 mm²). But without competition, Nvidia went for Turing at 12 nm and canned the "little" Voltas. One thing is missing though: FP64. With all those different cores, I doubt Turing has full-rate FP64 compute.
Posted on Reply
#41
Crap Daddy
jabbadapWell, shader performance might increase; they have done that before (see Kepler vs. Maxwell clock for clock). So comparing FP32 FLOPS between Pascal and Turing might be like comparing AMD and Nvidia FP32 FLOPS and saying more is better for gaming. And yeah, I would not put much weight on Nvidia's marketed GPU clocks either: a) are they still using Boost 3.0 or has that changed, and b) marketed boost and maximum boost have been different, so clocks while gaming are usually higher than the marketed boost clocks...

But yeah, these Turings are very big chips; heck, TU104 is much bigger than GP102. Maybe Turing was meant to be a 7 nm GPU in the first place, with Volta (without RT or Tensor cores) battling the high-end 2018 Radeons (say a GV102 with ~4608 CC at ~600 mm² and a GV104 with ~3072 CC at ~400 mm²). But without competition, Nvidia went for Turing at 12 nm and canned the "little" Voltas. One thing is missing though: FP64. With all those different cores, I doubt Turing has full-rate FP64 compute.
Don't know if this has been posted yet:

Posted on Reply
#42
Vayra86
jabbadapWell, shader performance might increase; they have done that before (see Kepler vs. Maxwell clock for clock). So comparing FP32 FLOPS between Pascal and Turing might be like comparing AMD and Nvidia FP32 FLOPS and saying more is better for gaming. And yeah, I would not put much weight on Nvidia's marketed GPU clocks either: a) are they still using Boost 3.0 or has that changed, and b) marketed boost and maximum boost have been different, so clocks while gaming are usually higher than the marketed boost clocks...

But yeah, these Turings are very big chips; heck, TU104 is much bigger than GP102. Maybe Turing was meant to be a 7 nm GPU in the first place, with Volta (without RT or Tensor cores) battling the high-end 2018 Radeons (say a GV102 with ~4608 CC at ~600 mm² and a GV104 with ~3072 CC at ~400 mm²). But without competition, Nvidia went for Turing at 12 nm and canned the "little" Voltas. One thing is missing though: FP64. With all those different cores, I doubt Turing has full-rate FP64 compute.
You make some interesting points!
Posted on Reply
#43
jabbadap
Crap DaddyDon't know if this has been posted yet:

Well, yeah, it's been posted in so many places that I don't remember if it was posted anywhere on TPU...

But anyhow, that shader mumbo-jumbo reminded me of those advanced shading features on Turing. To my surprise, Nvidia removed things from their developer RTX platform site. There were four methods under Rasterization this morning: Mesh Shading, Variable Rate Shading, Texture-Space Shading and Multi-View Rendering. Mesh Shading has been removed completely and all the details are gone. However, they are still in the HTML code:
<h4>Mesh Shading</h4>
<p>Mesh shading is a work spawning geometry pipeline using the compute-shader programming model of cooperative threads and offers a powerful, flexible alternative to the traditional fixed multi-stage geometry pipeline. Compact meshes of triangles (meshlets) are output and passed to the Rasterizer. The increased flexibility combined with the compute-based cooperative thread programming model enables much faster geometry processing and more efficient culling. There is a particularly large benefit for games and applications dealing with high geometric complexity. The Mesh Shader is part of a continuing trend toward compute-based graphics processing.</p>
</div>
</div>

<BR>

<div class="row">
<div class="col-md-2">
<BR>
<img class="img-responsive" width="80%" src="/sites/default/files/akamai/RTX/images/advsh_vrs_1.png" />
</div>
<div class="col-md-10">
<h4>Variable Rate Shading (VRS)</h4>
<p>VRS gives the developer fine-grained control over pixel shading rate using three different techniques: Motion Adaptive Shading, Content Adaptive Shading, and Foveated Rendering. The developer can vary shading frequency between one shade per sixteen pixels and sixteen shades per one pixel. The application specifies shading rate using a combination of a shading-rate surface and a per-primitive (triangle) value. The controls allow the developer to lower shading rate in the presence of factors like fast motion (Motion Adaptive Shading), blur, lens distortion, foveation (Foveated Rendering), and content frequencies (Content Adaptive Shading). The combined shading savings deliver large performance gains.</p>
</div>
</div>

<BR>

<div class="row">
<div class="col-md-2">
<img class="img-responsive" width="80%" src="/sites/default/files/akamai/RTX/images/advsh_ts.png" />
</div>
<div class="col-md-10">
<h4>Texture-Space Shading</h4>
<p>In Turing, we included computational primitives critical to the construction of efficient texture space shading systems. Texture space shading uses an object’s texture parameterization as an alternative to the usual screen-space grid. The decoupling from the screen-space grid affords multiple benefits including dramatically reduced aliasing, fine-grained workload control, and substantial shade reuse.</p>
</div>
</div>

<BR>

<div class="row">
<div class="col-md-2">
<img class="img-responsive" width="80%" src="/sites/default/files/akamai/RTX/images/advsh_mvr.png" />
</div>
<div class="col-md-10">
<h4>Multi-View Rendering (MVR)</h4>
<p>MVR is a powerful extension of Pascal’s Single Pass Stereo. The GPU renders multiple completely independent views in a single pass with full hardware support for view-dependent attributes. Access is via a simple programming model where the compiler automatically factors out view independent code, while identifying view-dependent attributes for optimal execution.</p>
</div>
</div>
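For illustration only: the per-draw control that the Variable Rate Shading description above talks about eventually surfaced in D3D12 roughly as in the sketch below (it assumes an ID3D12GraphicsCommandList5 and a driver reporting variable-rate shading support; the coarse rates here cover the "fewer shades than pixels" end of the range NVIDIA describes):

#include <windows.h>
#include <d3d12.h>

// Drop to one shade per 2x2 pixel block for low-frequency content, then
// restore full per-pixel shading. 'cmdList5' is assumed to have been obtained
// via QueryInterface on a command list from a VRS-capable device/driver.
void DrawWithCoarseShading(ID3D12GraphicsCommandList5* cmdList5)
{
    // Passing no combiners means the per-draw rate alone decides the frequency.
    cmdList5->RSSetShadingRate(D3D12_SHADING_RATE_2X2, nullptr);
    // ... draw sky, motion-blurred or peripheral geometry here ...

    cmdList5->RSSetShadingRate(D3D12_SHADING_RATE_1X1, nullptr);
    // ... back to one shade per pixel for everything else ...
}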
Posted on Reply
#44
Aquinus
Resident Wat-man
jabbadapWell, yeah, it's been posted in so many places that I don't remember if it was posted anywhere on TPU...

But anyhow, that shader mumbo-jumbo reminded me of those advanced shading features on Turing. To my surprise, Nvidia removed things from their developer RTX platform site. There were four methods under Rasterization this morning: Mesh Shading, Variable Rate Shading, Texture-Space Shading and Multi-View Rendering. Mesh Shading has been removed completely and all the details are gone. However, they are still in the HTML code:
The usage of Bootstrap is cute. :laugh:
Posted on Reply
#45
Ruru
S.T.A.R.S.
A gimped card for over 1,000 euros/dollars - that's great. I knew the memory bus was cut down because of the memory amount (like on the 1080 Ti), but the shaders too...
Posted on Reply
#46
efikkan
Crap DaddyDon't know if this has been posted yet:

And this one:


People need to stop guessing Turing's performance based on Pascal figures.

I see the embargo date on these, is this also the review embargo?
Posted on Reply
#47
Xzibit
efikkanAnd this one:


People need to stop guessing Turing's performance based on Pascal figures.

I see the embargo date on these, is this also the review embargo?
It's the same cache and shader memory architecture as Volta. Tech sites did Titan V game benchmarks.
Posted on Reply
#48
Captain_Tom
natr0nThey are going to be milking variants of this chip for a few years, I'm sure.
This generation is a quick cash grab, just like the 700 series. They will get rid of these by December of next year, when they launch a 6144-SP card on 7 nm with 4x the ray tracing cores.
Posted on Reply
#49
Vayra86
efikkanAnd this one:


People need to stop guessing Turing's performance based on Pascal figures.

I see the embargo date on these, is this also the review embargo?
Nope. Remember how Pascal lost some performance clock-for-clock versus Maxwell? The best we will see is some of that returned. And we already know Titan V was mostly faster due to its extra shaders - and not by a whole lot either.

The low-hanging fruit is gone by now for CUDA. You can refer to Intel Core for an indicator of how much IPC can still grow.
Posted on Reply
#50
gamerman
If we're going to talk about 'milking', we should turn our attention to AMD's GPUs.
As everyone remembers, they released the same old GPU with old tech three times under new names - just different versions of Vega.
And it looks like AMD will release a fourth Vega GPU, only built on a 7 nm line... huh! That should be banned.

So forget the Nvidia 'milking' talk.
This release is brand-new Turing: as always a brand-new GPU, with new graphics features and new connectors, and it is 40-50% faster than old Pascal.

A little respect: if Nvidia stopped releasing desktop GPUs, we would all have to use lousy AMD junk that eats terrible amounts of power, because the truth is that if you look at Vega's power draw, it's a slow GPU.
The Nvidia GTX 1000 series is a much better choice.

Lastly:

Nvidia is a company, and it builds great GPUs, but it doesn't do it for fun; it wants profit, like every business.
Most of the cash in this industry flows to Nvidia - more than 3/4 of it - so show some respect.

Planning, building and testing cost a lot of money - billions!

So the new GPUs cost a little more than AMD's junk, but when you buy an Nvidia GPU you get a high-quality, fast, excellently efficient, latest-tech GPU.

It costs a lot, but when you buy an Nvidia GPU you can trust that you'll get the same kind of excellent GPU next time as well - and next time is 2019.

P.S. I'm not an Nvidia employee or anything; I just see things as they are. I used AMD products before, CPU and GPU... never again. I think AMD clearly cheats its customers... I still remember that terrible hype and marketing around the Vega GPU...

...and what did you get? Old, lousy-tech, slow, overpriced junk with a 500 W power draw. Phooey!
Posted on Reply