
Nintendo Switch 2 to Feature NVIDIA Ampere GPU with DLSS

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
Sorry, in my haste to reply whilst trying to get my daughter out of the house, I originally missed that you did mention the power limitation, so my post was framed all wrong & the questions to you weren't needed. We do generally agree on the rough performance; I was just using some numbers to explore it & came to the same conclusion as you.

Good thinking on the 3050 Max-Q, that's actually the closest thing we've got for comparison in the 30-series. I wonder if there are some 4K benchmarks of that.
I raise you the RTX 2050 Mobile. Surprise: it's Ampere.


At its non-boost clock of 735 MHz, this thing is basically as close to the GA10F @ 1 GHz as you can get. It just has way less RAM capacity, but the same bandwidth.

 
Last edited:

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.95/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W)
Storage 2TB WD SN850 NVME + 1TB Samsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Phillips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Phillips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
At its non-boost clock of 735 MHz, this thing is basically as close to the GA10F @ 1 GHz as you can get. It just has way less RAM capacity, but the same bandwidth.
You know that these have nothing in common with a device that's meant to use about 5W of power, right?

The VRAM alone on that GPU would use more power than the entire device's power budget.
 
Joined
Apr 9, 2013
Messages
289 (0.07/day)
Location
Chippenham, UK
System Name Hulk
Processor 7800X3D
Motherboard Asus ROG Strix X670E-F Gaming Wi-Fi
Cooling Custom water
Memory 32GB 3600 CL18
Video Card(s) 4090
Display(s) LG 42C2 + Gigabyte Aorus FI32U 32" 4k 120Hz IPS
Case Corsair 750D
Power Supply beQuiet Dark Power Pro 1200W
Mouse SteelSeries Rival 700
Keyboard Logitech G815 GL-Tactile
VR HMD Quest 2
You know that these have nothing in common with a device that's meant to use about 5W of power, right?

The VRAM alone on that GPU would use more power than the entire device's power budget.
Yes, we're trying to give evidence to show that 4K120 gaming on the Switch 2 is a ludicrous pipe dream. These are the closest things to the Switch 2 that we have numbers for, & they clearly show they can't get even close to 4K60, let alone 4K120, even with a vastly higher power budget!

I raise you the RTX 2050 Mobile. Surprise: it's Ampere.


At its non-boost clock of 735 MHz, this thing is basically as close to the GA10F @ 1 GHz as you can get. It just has way less RAM capacity, but the same bandwidth.

I didn't realise they released an Ampere 20-series! I guess it must have been right at the end of the 20-series cycle. Good find.
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
You know that these have nothing in common with a device that's meant to use about 5W of power, right?

The VRAM alone on that GPU would use more power than the entire device's power budget.
Yup, that's the idea. And no 4K 120 fps, surprise surprise. Well, probably not a surprise to most.

Yeah, that GDDR6 is probably taking the majority of that 30 watts.

Good thing there's LPDDR.
 
Joined
Jul 5, 2013
Messages
27,705 (6.66/day)
I'm sorry if I came off as inflammatory. I assumed you had some counterexamples in mind that would help give some evidence to your stance, & I was hoping throwing out one example that goes against your point would get you to give me a counterexample.
It did come off that way a bit, but I'm used to people giving me flak.

My real point was this: the NVIDIA SoC Nintendo is reported to be using is very capable, and while some visual effects will have to be scaled down, playable 4K30 or 4K60 is not outside the realm of possibility. Anyone making the blanket statement that it's NOT possible needs to take a step back and look at the bigger picture, for the simple reason that the Jetson platform can already do so and the Nintendo SoC is going to be a customized and enhanced version of that.
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
It did come off that way a bit, but I'm used to people giving me flak.

My real point was this: the NVIDIA SoC Nintendo is reported to be using is very capable, and while some visual effects will have to be scaled down, playable 4K30 or 4K60 is not outside the realm of possibility. Anyone making the blanket statement that it's NOT possible needs to take a step back and look at the bigger picture, for the simple reason that the Jetson platform can already do so and the Nintendo SoC is going to be a customized and enhanced version of that.

No one said 4K30 or even 4K60 (1080p input) was outside the realm of possibility.
 
Joined
Jul 5, 2013
Messages
27,705 (6.66/day)
Looking at the RTX 3050, which is probably twice that Switch iGPU, with extra memory bandwidth, no limitations on how that bandwidth is split between the GPU and CPU parts of the SoC, and, more importantly, none of the power limitations the Switch will have, it will be difficult even with the advantage of games tailored to the Switch 2's specific hardware and capabilities. Graphics will be low to mid settings at best, and DLSS Performance will probably be used at 4K. Of course, some games will have simpler graphics and lower needs by design. Those will play nicely.
It's way too early to call or make any exact conclusions.
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
It's way too early to call or make any exact conclusions.

It's really not; we've literally known the render config and tested clocks for over a year because of the Lapsus$ attack.

We know the architecture, the number of CUDA cores, the tensor cores, the TMUs, the ROPs.

We know the bus width, we know the RAM type, and we know it's a unified memory architecture.

We have literally never known this much about a Nintendo system this early.

There is precious little mystery left for your appeal to the mysterious.
 

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.95/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W)
Storage 2TB WD SN850 NVME + 1TB Samsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Phillips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Phillips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
Yes, we're trying to give evidence to show that 4K120 gaming on the Switch 2 is a ludicrous pipe dream. These are the closest things to the Switch 2 that we have numbers for, & they clearly show they can't get even close to 4K60, let alone 4K120, even with a vastly higher power budget!


I didn't realise they released an Ampere 20-series! I guess it must have been right at the end of the 20-series cycle. Good find.
They'll be lucky to do 1080p 60 at native res with these wattages
Things are just getting absurd and derailed in here.
 
Joined
Feb 1, 2019
Messages
3,580 (1.69/day)
Location
UK, Midlands
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 4080 RTX SUPER FE 16G
Storage 1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Soundblaster AE-9
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
I think we will see maybe 720p native resolution, DLSS upscaled to 1080p on Zelda games, and on top of that better draw distances, more assets on screen using saved cycles from DLSS. Less complex games should be able to run 1080p upscaled to 1440p.
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
They'll be lucky to do 1080p 60 at native res with these wattages
Things are just getting absurd and derailed in here.
1080p is all you need for Performance mode at 4K.

*Edit: Somebody reminded me that I forgot Ultra Performance is no longer 8K-only, so all you need for Ultra Performance 4K is 720p, and 360p for Ultra Performance to 1080p in portable mode.*

We know the clock speeds and the render config from the stolen NVN2 API. The wattage is a function of the lithography and feature set.

1536 Ampere CUDA cores downclocked to 1 GHz (the NVN2 test docked clock was something like 1.125 GHz btw, I've been lowballing) is 3.072 TFLOPS of FP32, and 24.576 TFLOPS of sparse FP16 on the tensor cores for DLSS.

It has a dual-channel 128-bit bus for its LPDDR5, for a standard 102 GB/s.

It will be able to handle 1080p60 native just fine if someone wants to target that, and the stated performance must be within the target TDP, or it would never have been taped out. And again, 1080p is all you need as the input res for 4K.

One of the closed-doors demos Nintendo showed off at Gamescom was BotW, literally running at 4K 60 fps with no loading times.

Nintendo probably wouldn't be showing this to the people they want making games for their system if it wasn't something feasible to do... or something they intend to do themselves.

4K120 is ridiculous; 4K60 docked is going to happen on the system, just like 1080p 60 fps happened on the Switch. Like on the Switch, it's not going to be the standard, and it's mostly going to be Switch and PS4/Xbone ports, but it's going to happen.

I think we will see maybe 720p native resolution, DLSS upscaled to 1080p on Zelda games, and on top of that better draw distances, more assets on screen using saved cycles from DLSS. Less complex games should be able to run 1080p upscaled to 1440p.

Scene complexity doesn't really matter directly for DLSS; it's a fixed render cost determined by the input and output resolutions, no matter how simple or complex the source frame is.

Quality or Balanced mode will likely never be used on this system; it will almost undoubtedly always be Performance.

If they were targeting 1440p, the input res would be 720p. As I stated earlier, this is my bet for the standard on the system.

If you have the render time to make a 1080p frame of your desired fidelity and frame rate on the CUDA cores, you can run DLSS Performance to 4K on the tensor cores, concurrently. I really can't imagine anyone hitting 1080p native and wanting to do 1440p Quality instead of 4K Performance on this thing.
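To make the arithmetic explicit, here's a minimal sketch (Python) of the numbers in this post. Everything is based on the leaked figures quoted above; the LPDDR5-6400 data rate and the DLSS per-axis scale factors are my own assumptions for illustration, not anything from the leak.

```python
# Back-of-the-envelope math from the leaked NVN2 figures quoted above.
# All inputs are rumored/leaked values from this thread, not official specs.

CUDA_CORES = 1536          # T239 render config (leaked)
DOCKED_CLOCK_GHZ = 1.0     # conservative estimate (leak suggests ~1.125 GHz docked)

# Ampere executes 2 FP32 FLOPs per CUDA core per clock
fp32_tflops = CUDA_CORES * 2 * DOCKED_CLOCK_GHZ / 1000
# Tensor FP16 with sparsity is 8x the FP32 rate on consumer Ampere
fp16_sparse_tflops = fp32_tflops * 8

# 128-bit bus, assuming LPDDR5-6400 (6.4 GT/s): bytes per transfer * transfer rate
bandwidth_gb_s = (128 / 8) * 6.4

# Assumed DLSS per-axis scale factors -> input resolution for a given output
DLSS_SCALE = {"Quality": 1.5, "Balanced": 1.72, "Performance": 2.0, "Ultra Performance": 3.0}

def dlss_input(out_w: int, out_h: int, mode: str) -> tuple[int, int]:
    s = DLSS_SCALE[mode]
    return round(out_w / s), round(out_h / s)

print(f"FP32: {fp32_tflops:.3f} TFLOPS, sparse FP16: {fp16_sparse_tflops:.3f} TFLOPS")
print(f"Bandwidth: {bandwidth_gb_s:.1f} GB/s")
print("4K Performance input:", dlss_input(3840, 2160, "Performance"))                # (1920, 1080)
print("4K Ultra Performance input:", dlss_input(3840, 2160, "Ultra Performance"))    # (1280, 720)
print("1080p Ultra Performance input:", dlss_input(1920, 1080, "Ultra Performance")) # (640, 360)
```

This reproduces the 3.072 TFLOPS FP32 / 24.576 TFLOPS sparse FP16 / ~102 GB/s figures and the 1080p, 720p, and 360p input resolutions mentioned above.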
 
Last edited:

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
Had a little time, so I decided to do a little referential power breakdown of the RTX 2050 Mobile and how that would differ in an actual mobile/hybrid device. I'm looking at docked power draw for the Switch 2.

So let's break it down:
2048 CUDA cores to power, versus the 1536 in the T239 (the T239 has 25% fewer).

4 GB GDDR6, a massive power hog compared to the LPDDR5 in the T239.

Boost clock 1.245 GHz; my downclock estimate of 1 GHz for the T239 is about 20% lower.

There are other factors I'm not accounting for because I haven't found isolated/normalized power draws for them: the aux power draw shown in the power breakdown article (I don't know how that applies to the Lenovo laptop, so I'm leaving whatever watts that may be in), the same for PCB losses, and the fact that the 2050 has to power two GPCs and the I/O crossbar between them, which is additional wattage over the T239 that won't be subtracted. So whatever we end up with is going to be higher than if I had all the data broken down.

So let's get rid of the RAM first:

Starting point: 30 watts.

GDDR6X was measured to have a power draw of 2.5 watts per GB here:

However, the 2050 Mobile does not appear to use GDDR6X but GDDR6, and GDDR6X was shown by Micron to be 15% more power-efficient than GDDR6 here:


So 15% more than 2.5 W is 2.875 W; 2.875 × 4 GB = 11.5 watts drawn for the 4 GB of GDDR6 RAM.

LPDDR4, which was used in the Switch, has been measured to draw around 2 watts for a complete unit (2× 32-bit bus × 3 GB capacity = 6 GB) at the familiar 25 GB/s bandwidth (the Switch had two 32-bit × 2 GB units for the same bus width and bandwidth), from the University of Maryland here:


LPDDR5 has been stated by Samsung to be 30% more efficient than LPDDR4X, shown here:


which was in turn stated to be 20% more power-efficient than LPDDR4 here:

So 70% of 2 watts = 1.4 watts, and 80% of that = 1.12 watts for a two-unit LPDDR5 block.

GDDR6 at 11.5 watts minus LPDDR5 at 1.12 watts = 10.38 watts dropped just by switching from GDDR to low-power RAM.

Now we are at 30 watts - 10.38 watts = 19.62 watts.

Now let's get rid of the fan difference. The Lenovo RTX 2050 laptop used in this example has a 5-watt fan, sourced by searching for replacement parts for it.

The Switch uses a 3-watt fan, which shouldn't really need to change.

So 2 more watts down, for 17.62 watts.

Now we're down to the GPCs/SMs, because I'm just leaving in the watts from the stuff I couldn't pin down.

So the 2050 Mobile has 2048 Ampere CUDA cores, or 16 SMs; the T239's 12 SMs are 25% fewer.

17.62 × 0.75 = 13.215 watts.

And peak performance/power draw is calculated at its boost clock of 1.245 GHz; 1 GHz is roughly 20% lower, so:

13.215 × 0.80 = 10.572 watts.

And that's, of course, assuming the T239 is still on Samsung 8 nm, which doesn't have to be the case.
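For reference, a minimal sketch (Python) that just replays the arithmetic above. Every figure is this post's own assumption (the 30 W baseline, the per-GB memory power numbers, the fan wattages, and linear scaling with SM count and clock), not measured data for the T239.

```python
# Replay of the power-scaling estimate above; all inputs are the post's assumptions.

baseline_w = 30.0                        # RTX 2050 Mobile total power (assumed baseline)

# RAM swap: 4 GB GDDR6 vs. a two-unit LPDDR5 block
gddr6x_w_per_gb = 2.5                    # GDDR6X figure cited above
gddr6_w_per_gb = gddr6x_w_per_gb * 1.15  # GDDR6 ~15% less efficient than GDDR6X
gddr6_total_w = gddr6_w_per_gb * 4       # 11.5 W for 4 GB

lpddr4_block_w = 2.0                           # ~2 W for a 2x 32-bit LPDDR4 block (UMD figure)
lpddr5_block_w = lpddr4_block_w * 0.70 * 0.80  # LPDDR5 vs. LPDDR4X vs. LPDDR4 savings = 1.12 W

after_ram = baseline_w - (gddr6_total_w - lpddr5_block_w)  # 19.62 W

# Fan swap: 5 W laptop fan -> 3 W Switch-style fan
after_fan = after_ram - (5.0 - 3.0)      # 17.62 W

# Scale the remainder by SM count (12/16 = 0.75) and clock (1.0 / 1.245 ~= 0.80)
after_sms = after_fan * (12 / 16)        # 13.215 W
after_clock = after_sms * 0.80           # ~10.57 W

print(f"RAM swap:    {after_ram:.2f} W")
print(f"Fan swap:    {after_fan:.2f} W")
print(f"SM scale:    {after_sms:.3f} W")
print(f"Clock scale: {after_clock:.3f} W")
```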
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
Hmm. Isn't this interesting...

I certainly hope you aren't trying to imply that's validating anything you've been trying to claim, because it brutally, mercilessly slaughters your claims. An 18 ms execution time for 4K DLSS just leaves your claims face down in a ditch. To be clear: your claim of 4K 60 fps (or good god, didn't you say 120 at one point?) ALL needs to fit within 16.66 ms, INCLUDING the time needed to do the 3D rendering. That video showed it took 18 ms JUST for DLSS ALONE. Do you understand yet? 4K 60 fps is IMPOSSIBLE according to the video the article you posted is talking about. Also, why didn't you just post Rich's video instead of some vulture trying to ride on his work?

Also, it's not a coincidence Rich used the exact performance specs I've been listing.

On the bright side, the one part of this experiment that was really off, and that Rich admitted he could not mitigate, was the severe VRAM bottleneck, which was starving the tensor cores in a way that won't happen on a device with 12 GB of unified memory. So it won't be anywhere near 18 ms. But it will still be way outside the ballpark needed for your claims.
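To spell out the frame-budget arithmetic being argued here, a small sketch: the 18 ms DLSS cost is the figure quoted from that test, and the rest is just the reciprocal of the target frame rate.

```python
# Frame budget vs. a fixed DLSS cost. The 18 ms figure is the 720p->4K DLSS
# execution time quoted from the Digital Foundry laptop test discussed here.

def frame_budget_ms(fps: float) -> float:
    """Total time available per frame at a given target frame rate."""
    return 1000.0 / fps

dlss_cost_ms = 18.0

for target_fps in (30, 60, 120):
    budget = frame_budget_ms(target_fps)
    left_for_rendering = budget - dlss_cost_ms
    print(f"{target_fps:>3} fps: {budget:6.2f} ms/frame, {left_for_rendering:+7.2f} ms left after DLSS")

# At 30 fps there are ~15.33 ms left for actual rendering; at 60 fps (16.67 ms)
# and 120 fps (8.33 ms), the 18 ms DLSS pass alone already blows the budget.
```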
 
Last edited:
Joined
Dec 12, 2012
Messages
773 (0.18/day)
Location
Poland
System Name THU
Processor Intel Core i5-13600KF
Motherboard ASUS PRIME Z790-P D4
Cooling SilentiumPC Fortis 3 v2 + Arctic Cooling MX-2
Memory Crucial Ballistix 2x16 GB DDR4-3600 CL16 (dual rank)
Video Card(s) MSI GeForce RTX 4070 Ventus 3X OC 12 GB GDDR6X (2610/21000 @ 0.91 V)
Storage Lexar NM790 2 TB + Corsair MP510 960 GB + PNY XLR8 CS3030 500 GB + Toshiba E300 3 TB
Display(s) LG OLED C8 55" + ASUS VP229Q
Case Fractal Design Define R6
Audio Device(s) Yamaha RX-V381 + Monitor Audio Bronze 6 + Bronze FX | FiiO E10K-TC + Sony MDR-7506
Power Supply Corsair RM650
Mouse Logitech M705 Marathon
Keyboard Corsair K55 RGB PRO
Software Windows 10 Home
Benchmark Scores Benchmarks in 2024?
Interesting video from Digital Foundry showing a laptop with performance comparable to what is expected from Switch 2.


What is shown here is that DLSS has a very high cost on such a low-power GPU. Reconstructing from 720p to 4K costs over 18 ms in Death Stranding, and it lowers performance by about 50% compared to native 720p.
But the chip might feature a dedicated deep learning accelerator alongside the tensor cores, which could help significantly reduce DLSS processing time.

It's all speculation, though. But even if 4K isn't viable, 720p performance (or 1080p with DLSS) looks really good. Even Cyberpunk is playable, and that's just a PC laptop, without any dedicated console optimization.


For me personally, what will make or break this console is backwards compatibility, for both physical and digital games.
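For intuition on that roughly 50% number: if DLSS is treated as a fixed per-frame cost added on top of the native render time, the drop follows directly. A small sketch below; the native 720p frame rates are made-up examples, and only the ~18 ms DLSS cost comes from the video.

```python
# How a fixed ~18 ms DLSS pass maps to a frame-rate drop. Only the 18 ms cost
# is from the video; the native frame rates are hypothetical illustrations.

def fps_with_dlss(native_fps: float, dlss_cost_ms: float = 18.0) -> float:
    native_frame_ms = 1000.0 / native_fps
    return 1000.0 / (native_frame_ms + dlss_cost_ms)

for native in (40, 55, 70):  # assumed native-720p frame rates
    upscaled = fps_with_dlss(native)
    drop = 100 * (1 - upscaled / native)
    print(f"native 720p {native} fps -> ~{upscaled:.0f} fps with 720p->4K DLSS ({drop:.0f}% drop)")

# Around ~55 fps native, an 18 ms DLSS pass roughly halves the frame rate,
# which matches the ~50% figure quoted above.
```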
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
Interesting video from Digital Foundry showing a laptop with performance comparable to what is expected from Switch 2.


What is shown here is that DLSS has a very high cost on such a low-power GPU. Reconstructing from 720p to 4K costs over 18 ms in Death Stranding, and it lowers performance by about 50% compared to native 720p.
But the chip might feature a dedicated deep learning accelerator alongside the tensor cores, which could help significantly reduce DLSS processing time.

It's all speculation, though. But even if 4K isn't viable, 720p performance (or 1080p with DLSS) looks really good. Even Cyberpunk is playable, and that's just a PC laptop, without any dedicated console optimization.


For me personally, what will make or break this console is backwards compatibility, for both physical and digital games.

Yup, it was an interesting video, with some important caveats.

1. Orin's/A100's double tensor cores/DLA are off the table.

A. You can already tell from the NVIDIA employees' T239 initialization kernels vs. the T234's that it's been removed. That space is used for RT cores on RTX architectures.

B. The... er... "source" Richard was conversing with on this topic immediately confirmed it was a miscommunication after the video went live.

2. This was not because of the tensor cores; tensor cores are massive overkill and are not the bottleneck. See the performance of the 2080 Ti in the DLSS 3.5 programming guide. It only has 68 gen 1 tensor cores = 68/4 = the equivalent of 17 gen 2 tensor cores, yet it beats the 3060 Ti and 3070 in DLSS 4K execution time.

The bottleneck was the 4 GB of VRAM capacity (not bandwidth, for once). This is actually demonstrated by Rich in the video, in particular when he went in-depth with Death Stranding, where the VRAM was causing stuttering because it was constantly swapping in and out assets it couldn't hold.

DLSS to 4K requires 200 MB of VRAM set aside to feed the tensor cores' operating memory for full performance; if it doesn't get it, your tensor cores stall. As clearly demonstrated, there was no way these tensor cores were getting that.
 
Last edited:

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
Yup, it was an interesting video, with some important caveats.

1. Orin's/A100's double tensor cores/DLA are off the table.

A. You can already tell from the NVIDIA employees' T239 initialization kernels vs. the T234's that it's been removed. That space is used for RT cores on RTX architectures.

B. The... er... "source" Richard was conversing with on this topic immediately confirmed it was a miscommunication after the video went live.

2. This was not because of the tensor cores; tensor cores are massive overkill and are not the bottleneck. See the performance of the 2080 Ti in the DLSS 3.5 programming guide. It only has 68 gen 1 tensor cores = 68/4 = the equivalent of 17 gen 2 tensor cores, yet it beats the 3060 Ti and 3070 in DLSS 4K execution time.

The bottleneck was the 4 GB of VRAM capacity (not bandwidth, for once). This is actually demonstrated by Rich in the video, in particular when he went in-depth with Death Stranding, where the VRAM was causing stuttering because it was constantly swapping in and out assets it couldn't hold.

DLSS to 4K requires 200 MB of VRAM set aside to feed the tensor cores' operating memory for full performance; if it doesn't get it, your tensor cores stall. As clearly demonstrated, there was no way these tensor cores were getting that.

Correction to this: I copied from one row too high, which was RT cores instead of tensor cores. It has the equivalent of 136 tensor cores, to the 3070's 184.
 
Joined
Aug 10, 2020
Messages
313 (0.20/day)
These specs sound like a very reasonable generational upgrade for the Switch 2. A 5 W TDP is great for a handheld, and a newer Tegra chip with A78 CPU cores and Ampere GPU cores is all very reasonable. It would have been nice to see Lovelace cores, but they were a generation behind last time too, if I recall, with Maxwell.

Here's hoping a new NVIDIA Shield TV Pro follows with this SoC too.
 