
Nintendo Switch 2 to Feature NVIDIA Ampere GPU with DLSS

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
Sorry, in my haste to reply whilst trying to get my daughter out of the house, I originally missed that you did mention the power limitation, so my post was framed all wrong & the questions to you weren't needed. We do generally agree on the rough performance; I was just using some numbers to explore it & came to the same conclusion as you.

Good thinking on the 3050 Max-Q, that's actually the closest thing we've got for comparison in the 30-series. I wonder if there are some 4K benchmarks of that.
I raise you the RTX 2050 Mobile. Surprise: it's Ampere.


At its non-boost clock of 735 MHz, this thing is basically as close to the GA10F @ 1 GHz as you can get. It just has way less RAM capacity, but the same bandwidth.

 
Last edited:

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.95/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W)
Storage 2TB WD SN850 NVME + 1TB Samsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Phillips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Phillips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
At its non-boost clock of 735 MHz, this thing is basically as close to the GA10F @ 1 GHz as you can get. It just has way less RAM capacity, but the same bandwidth.
You know that these have nothing in common with a device that's meant to use about 5W of power, right?

The VRAM alone on that GPU would use more power than the entire device's power budget.
 
Joined
Apr 9, 2013
Messages
289 (0.07/day)
Location
Chippenham, UK
System Name Hulk
Processor 7800X3D
Motherboard Asus ROG Strix X670E-F Gaming Wi-Fi
Cooling Custom water
Memory 32GB 3600 CL18
Video Card(s) 4090
Display(s) LG 42C2 + Gigabyte Aorus FI32U 32" 4k 120Hz IPS
Case Corsair 750D
Power Supply beQuiet Dark Power Pro 1200W
Mouse SteelSeries Rival 700
Keyboard Logitech G815 GL-Tactile
VR HMD Quest 2
You know that these have nothing in common with a device that's meant to use about 5W of power, right?

The VRAM alone on that GPU would use more power than the entire device's power budget.
Yes, we're trying to give evidence to show that 4K120 gaming on the Switch 2 is a ludicrous pipe dream. These are the closest things to the Switch 2 that we have numbers for, & they clearly show they can't get even close to 4K60, let alone 4K120, even with a vastly higher power budget!

I raise you the RTX 2050 Mobile. Surprise: it's Ampere.


At its non-boost clock of 735 MHz, this thing is basically as close to the GA10F @ 1 GHz as you can get. It just has way less RAM capacity, but the same bandwidth.

I didn't realise they released an Ampere 20-series! I guess it must have been right at the end of the 20-series cycle. Good find.
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
You know that these have nothing in common with a device that's meant to use about 5W of power, right?

The VRAM alone on that GPU would use more power than the entire device's power budget.
Yup, that's the idea. And no 4K 120 fps, surprise surprise. Well, probably not a surprise to most.

Yeah, that GDDR6 is probably taking the majority of that 30 watts.

Good thing there's LPDDR.
 
Joined
Jul 5, 2013
Messages
27,705 (6.66/day)
I'm sorry if I came off as inflammatory. I assumed you had some counterexamples in mind that would help give some evidence to your stance, & I was hoping throwing out one example that goes against your point would get you to give me a counterexample.
It did come off that way a bit, but I'm used to people giving me flak.

My real point was this: the NVIDIA SoC Nintendo is reported to be using is very capable, and while some visual effects will have to be scaled down, playable 4K30 or 4K60 is not outside the realm of possibility. Anyone making the blanket statement that it's NOT possible needs to take a step back and look at the bigger picture, for the simple reason that the Jetson platform can already do so and the Nintendo SoC is going to be a customized and enhanced version of that.
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
It did come off that way a bit, but I'm used to people giving me flak.

My real point was this: the NVIDIA SoC Nintendo is reported to be using is very capable, and while some visual effects will have to be scaled down, playable 4K30 or 4K60 is not outside the realm of possibility. Anyone making the blanket statement that it's NOT possible needs to take a step back and look at the bigger picture, for the simple reason that the Jetson platform can already do so and the Nintendo SoC is going to be a customized and enhanced version of that.

No one said 4K30 or even 4K60 (1080p input) was outside the realm of possibility.
 
Joined
Jul 5, 2013
Messages
27,705 (6.66/day)
Looking at the RTX 3050, which is probably twice that Switch iGPU, with extra memory bandwidth, no limitations on how that bandwidth is split between the GPU and CPU parts of the SoC, and, more importantly, none of the power limitations the Switch will have, it will be difficult even with the advantage of games tailored to the Switch 2's specific hardware and capabilities. Graphics will be low to mid settings at best, and DLSS Performance will probably be used at 4K. Of course, some games will have simpler graphics and lower needs by design. Those will play nicely.
It's way too early to call or make any exact conclusions.
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
It's way too early to call or make any exact conclusions.

It's really not; we've literally known the render config and tested clocks for over a year because of the Lapsus$ attack.

We know the architecture, the number of CUDA cores, the tensor cores, the TMUs, the ROPs.

We know the bus width, we know the RAM type, and we know it's a unified memory architecture.

We have literally never known this much about a Nintendo system this early.

There is precious little mystery left for your appeal to the mysterious.
 

Mussels

Freshwater Moderator
Joined
Oct 6, 2004
Messages
58,413 (7.95/day)
Location
Oystralia
System Name Rainbow Sparkles (Power efficient, <350W gaming load)
Processor Ryzen R7 5800x3D (Undervolted, 4.45GHz all core)
Motherboard Asus x570-F (BIOS Modded)
Cooling Alphacool Apex UV - Alphacool Eisblock XPX Aurora + EK Quantum ARGB 3090 w/ active backplate
Memory 2x32GB DDR4 3600 Corsair Vengeance RGB @3866 C18-22-22-22-42 TRFC704 (1.4V Hynix MJR - SoC 1.15V)
Video Card(s) Galax RTX 3090 SG 24GB: Underclocked to 1700Mhz 0.750v (375W down to 250W)
Storage 2TB WD SN850 NVME + 1TB Samsung 970 Pro NVME + 1TB Intel 6000P NVME USB 3.2
Display(s) Phillips 32 32M1N5800A (4k144), LG 32" (4K60) | Gigabyte G32QC (2k165) | Phillips 328m6fjrmb (2K144)
Case Fractal Design R6
Audio Device(s) Logitech G560 | Corsair Void pro RGB |Blue Yeti mic
Power Supply Fractal Ion+ 2 860W (Platinum) (This thing is God-tier. Silent and TINY)
Mouse Logitech G Pro wireless + Steelseries Prisma XL
Keyboard Razer Huntsman TE ( Sexy white keycaps)
VR HMD Oculus Rift S + Quest 2
Software Windows 11 pro x64 (Yes, it's genuinely a good OS) OpenRGB - ditch the branded bloatware!
Benchmark Scores Nyooom.
Yes, we're trying to give evidence to show that 4K120 gaming on the Switch 2 is a ludicrous pipe dream. These are the closest things to the Switch 2 that we have numbers for, & they clearly show they can't get even close to 4K60, let alone 4K120, even with a vastly higher power budget!


I didn't realise they released an Ampere 20-series! I guess it must have been right at the end of the 20-series cycle. Good find.
They'll be lucky to do 1080p 60 at native res with these wattages
Things are just getting absurd and derailed in here.
 
Joined
Feb 1, 2019
Messages
3,580 (1.69/day)
Location
UK, Midlands
System Name Main PC
Processor 13700k
Motherboard Asrock Z690 Steel Legend D4 - Bios 13.02
Cooling Noctua NH-D15S
Memory 32 Gig 3200CL14
Video Card(s) 4080 RTX SUPER FE 16G
Storage 1TB 980 PRO, 2TB SN850X, 2TB DC P4600, 1TB 860 EVO, 2x 3TB WD Red, 2x 4TB WD Red
Display(s) LG 27GL850
Case Fractal Define R4
Audio Device(s) Soundblaster AE-9
Power Supply Antec HCG 750 Gold
Software Windows 10 21H2 LTSC
I think we will see maybe 720p native resolution, DLSS upscaled to 1080p on Zelda games, and on top of that better draw distances, more assets on screen using saved cycles from DLSS. Less complex games should be able to run 1080p upscaled to 1440p.
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
They'll be lucky to do 1080p 60 at native res with these wattages
Things are just getting absurd and derailed in here.
1080p is all you need for Performance mode at 4K.

*Edit: Somebody reminded me that I forgot Ultra Performance is no longer 8K-only, so all you need for Ultra Performance 4K is 720p, and 360p for Ultra Performance to 1080p in portable mode.*

We know the clock speeds and the render config from the stolen NVN2 API. The wattage is a function of the lithography and feature set.

1536 Ampere CUDA cores downclocked to 1 GHz (the NVN2 test docked clock was something like 1.125 GHz btw, I've been lowballing) is 3.072 TFLOPS of FP32, and 24.576 TFLOPS of sparse FP16 on the tensor cores for DLSS.

It has a dual-channel 128-bit bus for its LPDDR5, for a standard 102 GB/s.

It will be able to handle 1080p60 native just fine if someone wants to target that, and the stated performance must be within the target TDP, or it would never have been taped out. And again, 1080p is all you need as the input res for 4K.

One of the closed-doors demos Nintendo showed off at Gamescom was BotW, literally running at 4K 60 fps with no loading times.

Nintendo probably wouldn't be showing this to the people they want making games for their system if it wasn't something feasible to do... or something they intend to do themselves.

4K120 is ridiculous; 4K60 docked is going to happen on the system, just like 1080p 60 fps happened on the Switch. Like on the Switch, it's not going to be the standard, and it's mostly going to be Switch and PS4/Xbone ports, but it's going to happen.

I think we will see maybe 720p native resolution, DLSS upscaled to 1080p on Zelda games, and on top of that better draw distances, more assets on screen using saved cycles from DLSS. Less complex games should be able to run 1080p upscaled to 1440p.

Scene complexity doesn't really matter directly for DLSS; it's a fixed render cost determined by the input and output resolutions, no matter how simple or complex the source frame is.

Quality or Balanced mode will likely never be used on this system; it will almost undoubtedly always be Performance.

If they were targeting 1440p, the input res would be 720p. As I stated earlier, this is my bet for the standard on the system.

If you have the render time to make a 1080p frame of your desired fidelity and frame rate on the CUDA cores, you can run DLSS Performance to 4K on the tensor cores, concurrently. I really can't imagine anyone hitting 1080p native and wanting to do 1440p Quality instead of 4K Performance on this thing.
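To make the arithmetic explicit, here's a minimal sketch (Python) of the numbers in this post. Everything is based on the leaked figures quoted above; the LPDDR5-6400 data rate and the DLSS per-axis scale factors are my own assumptions for illustration, not anything from the leak.

```python
# Back-of-the-envelope math from the leaked NVN2 figures quoted above.
# All inputs are rumored/leaked values from this thread, not official specs.

CUDA_CORES = 1536          # T239 render config (leaked)
DOCKED_CLOCK_GHZ = 1.0     # conservative estimate (leak suggests ~1.125 GHz docked)

# Ampere executes 2 FP32 FLOPs per CUDA core per clock
fp32_tflops = CUDA_CORES * 2 * DOCKED_CLOCK_GHZ / 1000
# Tensor FP16 with sparsity is 8x the FP32 rate on consumer Ampere
fp16_sparse_tflops = fp32_tflops * 8

# 128-bit bus, assuming LPDDR5-6400 (6.4 GT/s): bytes per transfer * transfer rate
bandwidth_gb_s = (128 / 8) * 6.4

# Assumed DLSS per-axis scale factors -> input resolution for a given output
DLSS_SCALE = {"Quality": 1.5, "Balanced": 1.72, "Performance": 2.0, "Ultra Performance": 3.0}

def dlss_input(out_w: int, out_h: int, mode: str) -> tuple[int, int]:
    s = DLSS_SCALE[mode]
    return round(out_w / s), round(out_h / s)

print(f"FP32: {fp32_tflops:.3f} TFLOPS, sparse FP16: {fp16_sparse_tflops:.3f} TFLOPS")
print(f"Bandwidth: {bandwidth_gb_s:.1f} GB/s")
print("4K Performance input:", dlss_input(3840, 2160, "Performance"))                # (1920, 1080)
print("4K Ultra Performance input:", dlss_input(3840, 2160, "Ultra Performance"))    # (1280, 720)
print("1080p Ultra Performance input:", dlss_input(1920, 1080, "Ultra Performance")) # (640, 360)
```

This reproduces the 3.072 TFLOPS FP32 / 24.576 TFLOPS sparse FP16 / ~102 GB/s figures and the 1080p, 720p, and 360p input resolutions mentioned above.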
 
Last edited:

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
Had a little time, so I decided to do a little referential power breakdown of the RTX 2050 Mobile and how that would differ in an actual mobile/hybrid device. I'm looking at docked power draw for the Switch 2.

So let's break it down:
2048 CUDA cores to power, versus the 1536 in the T239 (the T239 has 25% fewer).

4 GB GDDR6, a massive power hog compared to the LPDDR5 in the T239.

Boost clock 1.245 GHz; my downclock estimate of 1 GHz for the T239 is about 20% lower.

There are other factors I'm not accounting for because I haven't found isolated/normalized power draws for them: the aux power draw shown in the power breakdown article (I don't know how that applies to the Lenovo laptop, so I'm leaving whatever watts that may be in), the same for PCB losses, and the fact that the 2050 has to power two GPCs and the I/O crossbar between them, which is additional wattage over the T239 that won't be subtracted. So whatever we end up with is going to be higher than if I had all the data broken down.

So let's get rid of the RAM first:

Starting point: 30 watts.

GDDR6X was measured to have a power draw of 2.5 watts per GB here:

However, the 2050 Mobile does not appear to use GDDR6X but GDDR6, and GDDR6X was shown by Micron to be 15% more power-efficient than GDDR6 here:


So 15% more than 2.5 W is 2.875 W; 2.875 × 4 GB = 11.5 watts drawn for the 4 GB of GDDR6 RAM.

LPDDR4, which was used in the Switch, has been measured to draw around 2 watts for a complete unit (2× 32-bit bus × 3 GB capacity = 6 GB) at the familiar 25 GB/s bandwidth (the Switch had two 32-bit × 2 GB units for the same bus width and bandwidth), from the University of Maryland here:


LPDDR5 has been stated by Samsung to be 30% more efficient than LPDDR4X, shown here:


which was in turn stated to be 20% more power-efficient than LPDDR4 here:

So 70% of 2 watts = 1.4 watts, and 80% of that = 1.12 watts for a two-unit LPDDR5 block.

GDDR6 at 11.5 watts minus LPDDR5 at 1.12 watts = 10.38 watts dropped just by switching from GDDR to low-power RAM.

Now we are at 30 watts - 10.38 watts = 19.62 watts.

Now let's get rid of the fan difference. The Lenovo RTX 2050 laptop used in this example has a 5-watt fan, sourced by searching for replacement parts for it.

The Switch uses a 3-watt fan, which shouldn't really need to change.

So 2 more watts down, for 17.62 watts.

Now we're down to the GPCs/SMs, because I'm just leaving in the watts from the stuff I couldn't pin down.

So the 2050 Mobile has 2048 Ampere CUDA cores, or 16 SMs; the T239's 12 SMs are 25% fewer.

17.62 × 0.75 = 13.215 watts.

And peak performance/power draw is calculated at its boost clock of 1.245 GHz; 1 GHz is roughly 20% lower, so:

13.215 × 0.80 = 10.572 watts.

And that's, of course, assuming the T239 is still on Samsung 8 nm, which doesn't have to be the case.
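For reference, a minimal sketch (Python) that just replays the arithmetic above. Every figure is this post's own assumption (the 30 W baseline, the per-GB memory power numbers, the fan wattages, and linear scaling with SM count and clock), not measured data for the T239.

```python
# Replay of the power-scaling estimate above; all inputs are the post's assumptions.

baseline_w = 30.0                        # RTX 2050 Mobile total power (assumed baseline)

# RAM swap: 4 GB GDDR6 vs. a two-unit LPDDR5 block
gddr6x_w_per_gb = 2.5                    # GDDR6X figure cited above
gddr6_w_per_gb = gddr6x_w_per_gb * 1.15  # GDDR6 ~15% less efficient than GDDR6X
gddr6_total_w = gddr6_w_per_gb * 4       # 11.5 W for 4 GB

lpddr4_block_w = 2.0                           # ~2 W for a 2x 32-bit LPDDR4 block (UMD figure)
lpddr5_block_w = lpddr4_block_w * 0.70 * 0.80  # LPDDR5 vs. LPDDR4X vs. LPDDR4 savings = 1.12 W

after_ram = baseline_w - (gddr6_total_w - lpddr5_block_w)  # 19.62 W

# Fan swap: 5 W laptop fan -> 3 W Switch-style fan
after_fan = after_ram - (5.0 - 3.0)      # 17.62 W

# Scale the remainder by SM count (12/16 = 0.75) and clock (1.0 / 1.245 ~= 0.80)
after_sms = after_fan * (12 / 16)        # 13.215 W
after_clock = after_sms * 0.80           # ~10.57 W

print(f"RAM swap:    {after_ram:.2f} W")
print(f"Fan swap:    {after_fan:.2f} W")
print(f"SM scale:    {after_sms:.3f} W")
print(f"Clock scale: {after_clock:.3f} W")
```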
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
Hmm. Isn't this interesting...

I certainly hope you aren't trying to imply that's validating anything you've been trying to claim, because it brutally, mercilessly slaughters your claims. An 18 ms execution time for 4K DLSS just leaves your claims face down in a ditch. To be clear: your claim of 4K 60 fps (or good god, didn't you say 120 at one point?) ALL needs to fit within 16.66 ms, INCLUDING the time needed to do the 3D rendering. That video showed it took 18 ms JUST for DLSS ALONE. Do you understand yet? 4K 60 fps is IMPOSSIBLE according to the video the article you posted is talking about. Also, why didn't you just post Rich's video instead of some vulture trying to ride on his work?

Also, it's not a coincidence Rich used the exact performance specs I've been listing.

On the bright side, the one part of this experiment that was really off, and that Rich admitted he could not mitigate, was the severe VRAM bottleneck, which was starving the tensor cores in a way that won't happen on a device with 12 GB of unified memory. So it won't be anywhere near 18 ms. But it will still be way outside the ballpark needed for your claims.
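To spell out the frame-budget arithmetic being argued here, a small sketch: the 18 ms DLSS cost is the figure quoted from that test, and the rest is just the reciprocal of the target frame rate.

```python
# Frame budget vs. a fixed DLSS cost. The 18 ms figure is the 720p->4K DLSS
# execution time quoted from the Digital Foundry laptop test discussed here.

def frame_budget_ms(fps: float) -> float:
    """Total time available per frame at a given target frame rate."""
    return 1000.0 / fps

dlss_cost_ms = 18.0

for target_fps in (30, 60, 120):
    budget = frame_budget_ms(target_fps)
    left_for_rendering = budget - dlss_cost_ms
    print(f"{target_fps:>3} fps: {budget:6.2f} ms/frame, {left_for_rendering:+7.2f} ms left after DLSS")

# At 30 fps there are ~15.33 ms left for actual rendering; at 60 fps (16.67 ms)
# and 120 fps (8.33 ms), the 18 ms DLSS pass alone already blows the budget.
```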
 
Last edited:
Joined
Dec 12, 2012
Messages
773 (0.18/day)
Location
Poland
System Name THU
Processor Intel Core i5-13600KF
Motherboard ASUS PRIME Z790-P D4
Cooling SilentiumPC Fortis 3 v2 + Arctic Cooling MX-2
Memory Crucial Ballistix 2x16 GB DDR4-3600 CL16 (dual rank)
Video Card(s) MSI GeForce RTX 4070 Ventus 3X OC 12 GB GDDR6X (2610/21000 @ 0.91 V)
Storage Lexar NM790 2 TB + Corsair MP510 960 GB + PNY XLR8 CS3030 500 GB + Toshiba E300 3 TB
Display(s) LG OLED C8 55" + ASUS VP229Q
Case Fractal Design Define R6
Audio Device(s) Yamaha RX-V381 + Monitor Audio Bronze 6 + Bronze FX | FiiO E10K-TC + Sony MDR-7506
Power Supply Corsair RM650
Mouse Logitech M705 Marathon
Keyboard Corsair K55 RGB PRO
Software Windows 10 Home
Benchmark Scores Benchmarks in 2024?
Interesting video from Digital Foundry showing a laptop with performance comparable to what is expected from Switch 2.


What is shown here is that DLSS has a very high cost on such a low-power GPU. Reconstructing from 720p to 4K costs over 18 ms in Death Stranding, and it lowers performance by about 50% compared to native 720p.
But the chip might feature a dedicated deep learning accelerator alongside the tensor cores, which could help significantly reduce DLSS processing time.

It's all speculation, though. But even if 4K isn't viable, 720p performance (or 1080p with DLSS) looks really good. Even Cyberpunk is playable, and that's just a PC laptop, without any dedicated console optimization.


For me personally, what will make or break this console is backwards compatibility, for both physical and digital games.
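For intuition on that roughly 50% number: if DLSS is treated as a fixed per-frame cost added on top of the native render time, the drop follows directly. A small sketch below; the native 720p frame rates are made-up examples, and only the ~18 ms DLSS cost comes from the video.

```python
# How a fixed ~18 ms DLSS pass maps to a frame-rate drop. Only the 18 ms cost
# is from the video; the native frame rates are hypothetical illustrations.

def fps_with_dlss(native_fps: float, dlss_cost_ms: float = 18.0) -> float:
    native_frame_ms = 1000.0 / native_fps
    return 1000.0 / (native_frame_ms + dlss_cost_ms)

for native in (40, 55, 70):  # assumed native-720p frame rates
    upscaled = fps_with_dlss(native)
    drop = 100 * (1 - upscaled / native)
    print(f"native 720p {native} fps -> ~{upscaled:.0f} fps with 720p->4K DLSS ({drop:.0f}% drop)")

# Around ~55 fps native, an 18 ms DLSS pass roughly halves the frame rate,
# which matches the ~50% figure quoted above.
```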
 

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
Interesting video from Digital Foundry showing a laptop with performance comparable to what is expected from Switch 2.


What is shown here is that DLSS has a very high cost on such a low-power GPU. Reconstructing from 720p to 4K costs over 18 ms in Death Stranding, and it lowers performance by about 50% compared to native 720p.
But the chip might feature a dedicated deep learning accelerator alongside the tensor cores, which could help significantly reduce DLSS processing time.

It's all speculation, though. But even if 4K isn't viable, 720p performance (or 1080p with DLSS) looks really good. Even Cyberpunk is playable, and that's just a PC laptop, without any dedicated console optimization.


For me personally, what will make or break this console is backwards compatibility, for both physical and digital games.

Yup, it was an interesting video, with some important caveats.

1. Orin's/A100's double tensor cores/DLA are off the table.

A. You can already tell from the NVIDIA employees' T239 initialization kernels vs. the T234's that it's been removed. That space is used for RT cores on RTX architectures.

B. The... er... "source" Richard was conversing with on this topic immediately confirmed it was a miscommunication after the video went live.

2. This was not because of the tensor cores; tensor cores are massive overkill and are not the bottleneck. See the performance of the 2080 Ti in the DLSS 3.5 programming guide. It only has 68 gen 1 tensor cores = 68/4 = the equivalent of 17 gen 2 tensor cores, yet it beats the 3060 Ti and 3070 in DLSS 4K execution time.

The bottleneck was the 4 GB of VRAM capacity (not bandwidth, for once). This is actually demonstrated by Rich in the video, in particular when he went in-depth with Death Stranding, where the VRAM was causing stuttering because it was constantly swapping in and out assets it couldn't hold.

DLSS to 4K requires 200 MB of VRAM set aside to feed the tensor cores' operating memory for full performance; if it doesn't get it, your tensor cores stall. As clearly demonstrated, there was no way these tensor cores were getting that.
 
Last edited:

Soupsammich

New Member
Joined
Nov 18, 2021
Messages
29 (0.03/day)
Yup, it was an interesting video, with some important caveats.

1. Orin's/A100's double tensor cores/DLA are off the table.

A. You can already tell from the NVIDIA employees' T239 initialization kernels vs. the T234's that it's been removed. That space is used for RT cores on RTX architectures.

B. The... er... "source" Richard was conversing with on this topic immediately confirmed it was a miscommunication after the video went live.

2. This was not because of the tensor cores; tensor cores are massive overkill and are not the bottleneck. See the performance of the 2080 Ti in the DLSS 3.5 programming guide. It only has 68 gen 1 tensor cores = 68/4 = the equivalent of 17 gen 2 tensor cores, yet it beats the 3060 Ti and 3070 in DLSS 4K execution time.

The bottleneck was the 4 GB of VRAM capacity (not bandwidth, for once). This is actually demonstrated by Rich in the video, in particular when he went in-depth with Death Stranding, where the VRAM was causing stuttering because it was constantly swapping in and out assets it couldn't hold.

DLSS to 4K requires 200 MB of VRAM set aside to feed the tensor cores' operating memory for full performance; if it doesn't get it, your tensor cores stall. As clearly demonstrated, there was no way these tensor cores were getting that.

Correction to this: I copied from one row too high, which was RT cores instead of tensor cores. It has the equivalent of 136 tensor cores, to the 3070's 184.
 
Joined
Aug 10, 2020
Messages
313 (0.20/day)
These specs sound like a very reasonable generational upgrade for the Switch 2. A 5 W TDP is great for a handheld, and a newer Tegra chip with A78 CPU cores and Ampere GPU cores is all very reasonable. It would have been nice to see Lovelace cores, but they were a generation behind last time too, if I recall, with Maxwell.

Here's hoping a new NVIDIA Shield TV Pro follows with this SoC too.
 