Wednesday, September 13th 2023
Nintendo Switch 2 to Feature NVIDIA Ampere GPU with DLSS
The rumors of Nintendo's next-generation Switch handheld gaming console have been piling up ever since competition in the handheld market intensified. Since the release of the original Switch, Valve has released the Steam Deck, ASUS has made the ROG Ally, and others are also exploring the market. Now the next-generation Nintendo Switch 2 appears closer than ever, as we have information about the chipset that will power the device. Thanks to Kepler_L2 on Twitter/X, we have the codenames of the upcoming processors. The first-generation Switch came with NVIDIA's Tegra X1 SoC built on a 20 nm node. Later on, NVIDIA supplied Nintendo with a Tegra X1+ SoC made on a 16 nm node; no performance increases were recorded, just improved power efficiency. Both used four Cortex-A57 and four Cortex-A53 cores with a GM20B Maxwell GPU.
For the Nintendo Switch 2, NVIDIA is said to utilize a customized variant of its Jetson Orin SoC for automotive applications. The reference Orin SoC carries the codename T234, while this alleged adaptation carries the codename T239; the custom version is most likely optimized for power efficiency. The reference Orin design is a considerable uplift over the Tegra X1, boasting 12 Cortex-A78AE cores, LPDDR5 memory, and the Ampere GPU microarchitecture. Built on Samsung's 8 nm node, its efficiency would likely yield better battery life and position the second-generation Switch well in the now-crowded handheld gaming console market. Including the Ampere architecture would also bring technologies like DLSS, which would benefit the low-power SoC.
Sources:
@Kepler_L2, GitHub, via Tom's Hardware
118 Comments on Nintendo Switch 2 to Feature NVIDIA Ampere GPU with DLSS
Also, lots of people game on their phones
This was probably designed years ago, which is why they went with Ampere. A pity they didn't wait a year or two for Ada, but I guess the manufacturing cost will be extremely low on Samsung's node. Will that affect the price of the console, though?
Half a year ago people were daydreaming about how the next-generation Switch 2 would be 4 nm Ada-based, DLSS 3 frame generation would murder every other console, 4K 120 Hz docked mode would be a thing, and it would be ultra efficient.
Aaaaand Nintendo cheaped out like they always do, so now we'll get the awful Samsung 8 nm node lmao
You keep mentioning Switch sales numbers as some sort of ultimate flex stat, but unless you're a bean counter at Nintendo or a shareholder, how is that relevant for consumers at all? Mobile gaming is waaay more popular, but I'm not even touching it with a fishing rod.
I actually find 30 fps on a small screen mostly OK, but those dips pull me out of it immediately :( And panning the camera looks awful at 30 fps, of course.
At this point it's not wise to quote specs from an existing part as if it were etched in concrete. There could easily be another unreleased part that this device will be based on. All of these rumors are based on vague and sketchy reports from early engineering devkit viewings in a whisper suite after signing a hefty NDA.
Without a doubt, Nintendo has hundreds of prototypes in their labs with a vast array of hardware component combinations, not to mention the software that runs on these test units. This is not specific to Nintendo, all of the consumer electronics companies do lots of prototyping. It's not like they picked the BOM five years ago.
Remember that an automotive SoC isn't really optimized for performance-per-watt considerations that a handheld gaming device favors.
I have a strong suspicion that it will be some sort of variant. With Nintendo having sold 125+ million Switches, Nvidia probably has some incentive to listen to Nintendo about the latter's wishes.
My guess is DLSS is a given; it might actually be mandatory for docked mode which would be 4K@60Hz.
DLSS and other super-sampling techniques don't work so well starting from a low-resolution image, so the handheld mode might render at 1440p native rather than rendering at 1080p and upscaling to 1440p.
- no availability of the PS5 & Xbox during the chip crisis, plus scalping
- horrible pricing (see point #1)
- console exclusives (most PS & Xbox games you can now get on PC; for Nintendo, zero)
- very unique games
- child/family-friendly games
But I think the chip shortage did the most damage. If you look back (List of best-selling game consoles), the previous PS & Xbox generations were doing way better.

That's the sort of win Nintendo needs - 720p portable, 1080p docked, 4K via upscaling.
The first-party Nintendo games (Mario, Zelda, etc.) would easily be possible at 4K60, and possibly 4K120, under 15 watts with current mobile tech and DLSS/FSR - and unlike Sony and MS with their consoles, these first-party games would have the scaling settings dialed in correctly long before launch to avoid any artifacting or issues.
720p rendering with 1080p output obviously saves battery life, and that's key for a mobile device.
Even a single 12-SM Ampere GPU with 48 tensor cores would overtake the 36 CUs in a PS5 at 2.23 GHz... for ML tasks.
But it still needs to render the scene with the CUDA cores, which will only be about a third of a PS5's compute when docked.
It still needs to map textures, and its texture fill rate will only be 48 Gtexel/s compared to the PS5's 321 Gtexel/s. You can't DLSS textures into existence.
It will still need to render polygons, and it will only get 3 Gtri/s compared to the PS5's 7.2 Gtri/s. You can't DLSS polygons.
(NVIDIA: 1 PolyMorph engine per TPC, 6 TPCs in a 12-SM GPC, for 6 PolyMorph engines at 0.5 tri per clock: 6 × 0.5 × 1 GHz = 3 Gtri/s.
My PolyMorph knowledge may be outdated - the last version I really dug into was PolyMorph Engine 2.0, so if anyone has an update, send it my way.
AMD: 1 geometry engine per shader engine, PS5 GPU ≈ 3.6 SEs. A geometry engine gets 1 tri per clock: 3.6 × 2.23 GHz ≈ 8 Gtri/s, so the 7.2 figure above is, if anything, slightly conservative.)
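The geometry arithmetic above can be sanity-checked in a few lines. Note that all the inputs here (engine counts, tris per clock, and clocks, including the unusual "3.6 SE-equivalents" figure for the PS5) are the commenter's assumptions, not confirmed specs:

```python
# Peak triangle-rate comparison, using the figures quoted in the comment.

def gtris_per_s(engines, tris_per_clock, clock_ghz):
    """Peak geometry throughput in Gtri/s: engines * tris/clock * clock."""
    return engines * tris_per_clock * clock_ghz

# NVIDIA: 1 PolyMorph engine per TPC, 6 TPCs in a 12-SM GPC,
# 0.5 tri/clock per engine, assumed ~1 GHz clock.
t239_gtris = gtris_per_s(engines=6, tris_per_clock=0.5, clock_ghz=1.0)

# AMD: 1 geometry engine per shader engine, treated as 3.6 SE-equivalents,
# 1 tri/clock, 2.23 GHz.
ps5_gtris = gtris_per_s(engines=3.6, tris_per_clock=1.0, clock_ghz=2.23)

print(f"T239 (est.): {t239_gtris:.1f} Gtri/s")
# Multiplying out 3.6 x 2.23 actually gives ~8.0, a bit above the 7.2 quoted.
print(f"PS5  (est.): {ps5_gtris:.2f} Gtri/s")
```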
Most casual gamers haven't even heard of it, and I have yet to play an RT game myself.
The DLSS/FSR stuff is far more important.
It's a completely different Tegra with a completely different GPU and a completely different CPU.
The T234 Orin's GA10B GPU is an automotive Tegra version of the A100 tensor GPU -
down to the two absolutely massive banks of extra tensor cores in the DLA. That's why the Orin's GPU looks like it has 4 GPCs instead of 2... and has no ray-trace cores.
It's also chock full of automotive hardware that's been removed from the T239's initialization file.
The T239 Drake's GA10F GPU is a Tegra version of desktop RTX Ampere (GA102), with 12 SMs per GPC. That's a single GPC with 12 SMs, for 1,536 CUDA cores, 48 third-gen tensor cores, 48 TMUs, and 12 second-gen ray-trace cores.
Downclocked to 1 GHz, that's:
3.072 TFLOPS FP32, or 1.5 TFLOPS FP32 plus 1.5 TOPS INT32, on the CUDA cores;
3.072 TFLOPS FP16, 6.144 TOPS INT8, and 12.288 TOPS INT4 in non-tensor ops;
24.576 TFLOPS FP16, 49.152 TOPS INT8, and 98.304 TOPS INT4 in tensor ops with sparsity.
If the T239 had the doubled tensor cores like Orin, it would be:
6 TFLOPS FP16, 12 TOPS INT8, and 24 TOPS INT4 in non-tensor ops, and
48 TFLOPS FP16, 98 TOPS INT8, and 196 TOPS INT4 in tensor ops with sparsity. Either way, 4K120 is very clearly not possible.
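The throughput figures above follow from a simple doubling ladder. Here is a small sketch that reproduces them; the core count, 1 GHz clock, and the 8x tensor multiplier are taken from the comment itself and should be treated as rumored, not confirmed:

```python
# Back-of-envelope throughput for the rumored T239 config:
# 1 GPC, 12 SMs, 1,536 CUDA cores, 48 tensor cores, assumed 1 GHz.

CUDA_CORES = 1536
CLOCK_GHZ = 1.0

# CUDA-core rates: 2 FLOPs (one FMA) per core per clock for FP32,
# with each narrower precision doubling throughput.
fp32_tflops = CUDA_CORES * 2 * CLOCK_GHZ / 1000   # 3.072 TFLOPS
fp16_tflops = fp32_tflops                         # Ampere CUDA FP16 == FP32 rate
int8_tops = 2 * fp16_tflops
int4_tops = 2 * int8_tops

# Tensor-core rates with sparsity, per the comment: 8x the CUDA FP32
# rate for FP16, again doubling at each precision step.
tensor_fp16_tflops = 8 * fp32_tflops
tensor_int8_tops = 2 * tensor_fp16_tflops
tensor_int4_tops = 2 * tensor_int8_tops

print(f"CUDA:   {fp32_tflops:.3f} TFLOPS FP32, {fp16_tflops:.3f} TFLOPS FP16, "
      f"{int8_tops:.3f} TOPS INT8, {int4_tops:.3f} TOPS INT4")
print(f"Tensor: {tensor_fp16_tflops:.3f} TFLOPS FP16, "
      f"{tensor_int8_tops:.3f} TOPS INT8, {tensor_int4_tops:.3f} TOPS INT4")
```

Doubling the tensor banks, as on Orin, simply scales the last three numbers by two, giving the 48/98/196 figures quoted above.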
NVIDIA Nsight was used to bench the tensor cores running DLSS at Ultra Performance on a 4090 - a GPU 26.6 times more powerful than one GPC of Ampere at 1 GHz.
It benched at between 100 and 200 microseconds to complete DLSS; call it 0.2 milliseconds. On a platform with tensor cores 26.6 times weaker, you're looking at
5.32 ms to perform DLSS. That's pretty dang fantastic for 30 fps, which has 33 ms to render a frame.
At 60 fps it's maybe doable: with, say, a fast forward renderer, you have 16 ms to render a frame.
It's just not possible at 120 fps, which has only 8 ms per frame - subtract the DLSS pass and you have just 3 ms left. Even with DLSS running concurrently, that's not enough time to render anything worth upscaling. Also, I don't think the CPU would be able to keep up.
Maybe super-minimalist indie games, but definitely not Nintendo's tentpole titles like Mario and Zelda.
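The frame-budget argument above is easy to check numerically. This sketch scales the measured ~0.2 ms DLSS pass on a 4090 by the 26.6x deficit claimed in the comment (an estimate, not a measured figure) and subtracts it from each frame budget:

```python
# DLSS frame-budget check from the comment's figures.

dlss_4090_ms = 0.2      # measured DLSS pass on an RTX 4090 (~100-200 us)
tensor_ratio = 26.6     # commenter's estimate: 4090 vs one Ampere GPC @ 1 GHz

dlss_t239_ms = dlss_4090_ms * tensor_ratio  # ~5.32 ms per upscale pass

budgets = {}
for fps in (30, 60, 120):
    frame_budget_ms = 1000 / fps
    render_left_ms = frame_budget_ms - dlss_t239_ms
    budgets[fps] = render_left_ms
    print(f"{fps:3d} fps: {frame_budget_ms:5.2f} ms frame budget, "
          f"{render_left_ms:5.2f} ms left for rendering")
```

At 30 fps roughly 28 ms remain for rendering, at 60 fps about 11 ms, but at 120 fps only ~3 ms survive the DLSS pass, which is the crux of the "4K120 is not possible" claim.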
As for upscalers, some hate them and would never consider them important. But even there, we get videos with 10x magnification and 10x slowdown of individual frames to "prove" that DLSS's superiority is so great that using anything else will ruin gaming. The fun part is that the same thing is now being said about DLSS 3.5 with ray reconstruction versus plain DLSS 3. What was good yesterday (DLSS 3.0) is subpar today because a newer version exists.
It's not me who's overplaying anything. It's the marketing departments, the press, the YouTubers, and individuals who overplay these features, and many consumers will believe that, yeah, DLSS specifically and ray tracing are everything today.
Let's look at this in detail:
A VideoCore VII is an extension of the VideoCore VI: it has an extra slice with 4 more QPUs, and it's clocked 300 MHz faster.
VideoCore VII = 3 slices, 4 QPUs per slice, quad-cycle rate, @ 800 MHz.
3 slices × 4 QPUs × 4 cycles × 2 (FMA) × 800 MHz = 76.8 GFLOPS. Over twice the VideoCore VI in the Pi 4, as advertised... but this thing is not even a Switch GPU, man. It's not even close. It's the equivalent of roughly a 48-CUDA-core NVIDIA product; the Switch had 256 CUDA cores.
The Pi 5's CPU is decent enough, but not its GPU. The CPU does let it run emulators - it can even run a Switch emulator, since it has a better CPU than the Switch, and emulators' main bottleneck has been the CPU, not the GPU. But you need to overclock it to get playable performance out of GameCube and Wii games, which are nowhere near 4K, and it can't even hit 2x native resolution in PSP games. Any Switch game that isn't a retro 2D platformer like Hollow Knight - an actual 3D game like Link's Awakening - is going to run at like 2-4 fps. A Switch runs PS4 games like GoW (the PC port, via Linux, through three translation layers) better (about 9 fps) than the Pi runs Switch games. That's because the Pi 5's GPU is nowhere near a Switch's; a docked Switch GPU is nearly 400 GFLOPS.
Pi architecture reference PDF:
https://www.cs.ucr.edu/~mchow009/teaching/cs193/spring2021/slides/Raspberry_Pi_QPU.pdf
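The GFLOPS comparison above can be reproduced directly. The VideoCore VII figures come from the comment itself; the Switch docked figures (256 CUDA cores at roughly 768 MHz) are commonly cited but should likewise be treated as estimates:

```python
# FP32 GFLOPS comparison: VideoCore VII (Pi 5) vs Switch GM20B docked.

# VideoCore VII: 3 slices x 4 QPUs x 4 SIMD lanes (quad-cycle rate)
# x 2 FLOPs (one FMA) x 0.8 GHz
vc7_gflops = 3 * 4 * 4 * 2 * 0.8

# Switch GM20B docked: 256 CUDA cores x 2 FLOPs (FMA) x ~0.768 GHz
switch_gflops = 256 * 2 * 0.768

print(f"VideoCore VII: {vc7_gflops:.1f} GFLOPS")
print(f"Switch docked: {switch_gflops:.1f} GFLOPS")
print(f"Switch advantage: ~{switch_gflops / vc7_gflops:.1f}x")
```

That works out to 76.8 GFLOPS vs roughly 393 GFLOPS, i.e. the docked Switch GPU has about a 5x raw-compute lead over the Pi 5, which is the gap the comment is pointing at.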
The X1 Maxwell could output 4K 60 fps video too. It wasn't hooked up in the Switch (only 2 of 4 DisplayPort outputs were wired), and it was fully wired in the Switch OLED but disabled; 4K 60 fps AI video upscaling was the most popular use for the Shield TV.
Being able to output a 4k video signal is not the same thing as being able to render a high fidelity game at 4k.
This is goofy.
So, since actually running games at 4K 60 fps is what any normal person would assume you mean when you say it can run games "4K 60 fps no sweat" - especially when you responded to a post about render times and DLSS time ratios - why don't you lay out, in detail, your abnormal reasoning for what you actually meant by "hell, the Pi 5 can do 4K60 without breaking a sweat"? Only you know what that coded message means, so spell it out and I can actually respond to it.