Tuesday, May 9th 2023

NVIDIA GeForce RTX 4060 Ti Available as 8 GB and 16 GB, This Month. RTX 4060 in July

In what could explain the greater attention from leaky taps on the GeForce RTX 4060 Ti compared to its sibling, the RTX 4060, NVIDIA is preparing a staggered launch for its RTX 4060 series. We're also learning that there are as many as three SKUs in the series: the RTX 4060 Ti 8 GB, the RTX 4060 Ti 16 GB, and the RTX 4060. All three will be announced later this month; however, only the RTX 4060 Ti 8 GB will be available to purchase at that time. The RTX 4060 Ti 16 GB and RTX 4060 will be available from July.

At this point, little is known about what differentiates the 8 GB and 16 GB variants of the RTX 4060 Ti besides memory size. The RTX 4060 Ti 8 GB is rumored to feature 34 out of 36 streaming multiprocessors (SM) physically present on the 5 nm "AD106" silicon, which gives NVIDIA some theoretical headroom to enable a few more shaders. These 34 work out to 4,352 CUDA cores, while a fully unlocked AD106 has 4,608. The RTX 4060 is a significantly different SKU that's based on a maxed-out "AD107" silicon, with 30 SM, or 3,840 CUDA cores, although it should be possible for some RTX 4060 cards to be based on a heavily cut-down AD106.
Sources: MEGAsizeGPU (Twitter), VideoCardz
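As a quick check on the rumored figures above, the SM counts can be converted into CUDA core counts. This is only a minimal sketch: it assumes 128 CUDA cores per SM, which is the ratio implied by the article's own numbers, and the function name is made up for illustration.

```python
# Assumption: 128 CUDA cores per SM, the ratio implied by 4,352 cores / 34 SM.
CUDA_CORES_PER_SM = 128


def cuda_cores(sm_count: int) -> int:
    """Return the CUDA core count for a given number of enabled SMs."""
    return sm_count * CUDA_CORES_PER_SM


# SM counts as rumored in the article; none of these are confirmed specs.
rumored_configs = [
    ("RTX 4060 Ti (34 of 36 SM on AD106)", 34),
    ("Fully unlocked AD106 (36 SM)", 36),
    ("RTX 4060 (30 SM on AD107)", 30),
]

for label, sm_count in rumored_configs:
    print(f"{label}: {cuda_cores(sm_count):,} CUDA cores")
```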

120 Comments on NVIDIA GeForce RTX 4060 Ti Available as 8 GB and 16 GB, This Month. RTX 4060 in July

#103
Paranoir andando
Mahboi said: Ofc not WTF lol???
Adding 4 GB of VRAM means redrawing the bus. That's redrawing the I/O, which means redrawing the chip. That's something they never ever do. They would rather double the VRAM on the bus (3060 12 GB) than ever remake a chip. And it's the same for AMD. We'll never have any 4070s with 16 GB. We may have 4060 Tis with 16 GB or 4070 Tis with 24 GB (highly doubtful on the latter) though.
You take a mid-low card as 4070 (32% cudas) with 192 bit, eliminate 64 leaving only 128 bits as low end card and then you put the 16GB. It's so easy, XD

Something like: 2060 Super 8GB (TU106) vs 2060 12GB (TU106).

But 2060 Super is a midrange card with 47% of cudas and 256bit, not as the mid-low 4070.
Posted on Reply
#104
Mahboi
Paranoir andando said: You take a mid-low card as 4070 (32% cudas) with 192 bit, eliminate 64 leaving only 128 bits as low end card and then you put the 16GB. It's so easy, XD
.......
Posted on Reply
#105
GamerNerves
Since prices are so high, but availability good, and now we are getting RTX 4060 Ti with so much memory, this card vs RX 7700 XT is going to be the battle of the ages. In the last two gens RX x700 matched the RTX xx70 tier, so I'm not sure the RX 7700 is the direct competitor, but either way, this match-up fighting for customers is going to be a historically epic one in the GPU scene, since market demand is so high. Refreshes are likely too, at least from AMD, and hopefully from Nvidia as a countermeasure, since the RX 7000 series did not completely hit its goals, as was discussed in the tech news before. So when they get a stable stream of great chips or do some further redesign, RX 7x50 models will up the game yet again.
The rest of the year is going to be the exciting period for value-oriented buyers, on the CPU front too, because Ryzen 7000 chips simply need to be priced down to move them anywhere. Mobo manufacturers' actions are unknown though, but I bet they could quite easily lower margins, at least a bit. Even if the components on many AM5 boards are certainly high quality, we have just seen how Asus has put pretty low effort into refining even their most expensive boards (which Gamers Nexus reported about), which doesn't point to increased costs in supporting the products, though who knows if this is a case of some fresh engineers in the business or something similar. Either way, I dare to wish the end of the year is finally the time of upgrading for a significant majority.
Posted on Reply
#106
cbb
huh. Latest news on videocardz.com shows the 4060Ti 16GB version has roughly half the bandwidth (and a half sized bus) of my 2018 2070?? that seems a bizarre bit of "progress"? Roughly double the cores, and same power tho, so that is definitely progress, but curious about the bandwidth. Ofc it makes sense it has the same bus/bandwidth as the 8GB, otherwise that'd be a big change and arguably another card entirely, just that I hadn't compared that spec to my old card until now. It might not matter too much if it's using the extra vram to store textures & assets for quick re-use, rather than whole new scenes (which, presumably, would lean more on the bandwidth to the rest of the system?)? idk, I haven't done real engineering since the z80 was current, so I'll admit to handwaving a bit (!) here. And it wouldn't require a new psu, which was putting me off the discounted radeon 69xxXTs.
Normally, I kinda expect each gen to roughly equal the prior gen's next card up. And, in processing this probably exceeds that. But half the bandwidth is a surprise, although (as noted above) idk if it'll be an issue for users. My impression is a lot of the value of the larger vram is swap space/storage space so it doesn't have to fetch from the rest of the system (which is slow) as often, so maybe just fine? Guess we'll see!
Posted on Reply
#107
Punkenjoy
cbb said: huh. Latest news on videocardz.com shows the 4060Ti 16GB version has roughly half the bandwidth (and a half sized bus) of my 2018 2070?? that seems a bizarre bit of "progress"? Roughly double the cores, and same power tho, so that is definitely progress, but curious about the bandwidth. Ofc it makes sense it has the same bus/bandwidth as the 8GB, otherwise that'd be a big change and arguably another card entirely, just that I hadn't compared that spec to my old card until now. It might not matter too much if it's using the extra vram to store textures & assets for quick re-use, rather than whole new scenes (which, presumably, would lean more on the bandwidth to the rest of the system?)? idk, I haven't done real engineering since the z80 was current, so I'll admit to handwaving a bit (!) here. And it wouldn't require a new psu, which was putting me off the discounted radeon 69xxXTs.
Normally, I kinda expect each gen to roughly equal the prior gen's next card up. And, in processing this probably exceeds that. But half the bandwidth is a surprise, although (as noted above) idk if it'll be an issue for users. My impression is a lot of the value of the larger vram is swap space/storage space so it doesn't have to fetch from the rest of the system (which is slow) as often, so maybe just fine? Guess we'll see!
The 4060 Ti has way more L2 cache than the 2070. The effective bandwidth of the VRAM + cache subsystem is probably around the same as or higher than the 2070's.
Posted on Reply
#108
wheresmycar
cbb said: huh. Latest news on videocardz.com shows the 4060Ti 16GB version has roughly half the bandwidth (and a half sized bus) of my 2018 2070?? that seems a bizarre bit of "progress"? Roughly double the cores, and same power tho, so that is definitely progress, but curious about the bandwidth. Ofc it makes sense it has the same bus/bandwidth as the 8GB, otherwise that'd be a big change and arguably another card entirely, just that I hadn't compared that spec to my old card until now. It might not matter too much if it's using the extra vram to store textures & assets for quick re-use, rather than whole new scenes (which, presumably, would lean more on the bandwidth to the rest of the system?)? idk, I haven't done real engineering since the z80 was current, so I'll admit to handwaving a bit (!) here. And it wouldn't require a new psu, which was putting me off the discounted radeon 69xxXTs.
Normally, I kinda expect each gen to roughly equal the prior gen's next card up. And, in processing this probably exceeds that. But half the bandwidth is a surprise, although (as noted above) idk if it'll be an issue for users. My impression is a lot of the value of the larger vram is swap space/storage space so it doesn't have to fetch from the rest of the system (which is slow) as often, so maybe just fine? Guess we'll see!
This is definitely a concern. Bandwidths are crucial for snappier data access. More VRAM and higher bandwidths usually go hand in hand otherwise increased latency or lack of real-time VRAM utilisation will end up with adverse performance. Or, for the laymen, we end with the illusion more VRAM wasn't necessary in the first place.

The idea being: more VRAM "to store" rendering elements, textures or other visual effects, alongside faster memory bandwidth "to access/transfer" real-time graphical data quickly. Nowadays smart game engines more often rely on faster and wider buses for real-time swapping of dynamic assets and effects, so compromising on memory speed (or transfer rates) will most likely end up with reduced performance, frame drops and simply poorer sustained visual fidelity. Breaking the balance is a kick in the teeth, and as usual the easy way out is scapegoating developer optimisation as the primary culprit. Developers usually share the blame, but hardware limitations are sometimes overlooked and present unappealing challenges which devs are probably not bothered to entertain.

Obviously the same doesn't apply to everyone. The balance between VRAM and memory bandwidth will depend on the user's specific needs and use case, and it's important to consider both factors when selecting a graphics card (for the less informed, benchmarks and reviews often help to stay on top of all the riff-raff).
Posted on Reply
#109
BoboOOZ
On this generation, Nvidia did what AMD did on the previous generation, added a large L2 cache to diminish the need for bandwidth. It worked fine for AMD, so I don't see any reason why it won't work for Nvidia. Bandwidth won't be a problem, but relatively low core counts and VRAM will.
Posted on Reply
#110
ValenOne
Punkenjoy said: The 4060 Ti has way more L2 cache than the 2070. The effective bandwidth of the VRAM + cache subsystem is probably around the same as or higher than the 2070's.
4060 Ti's 32 MB L2 cache with delta color compression can hold the entire 1920x1080p frame buffers.

Xbox One's split render with on-chip 32 MB ESRAM (70% of 1080p render) and system memory example. NVIDIA has superior delta color compression (DCC).




NVIDIA's delta color compression conserves bandwidth and data storage.
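To put the 32 MB claim in numbers, here is a rough back-of-the-envelope sketch. It assumes 4 bytes per pixel (an RGBA8-style render target), no compression at all, and that the whole L2 is free for render targets, which it would not be in practice. The function name is illustrative only.

```python
def framebuffer_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size of one uncompressed render target in bytes."""
    return width * height * bytes_per_pixel


# 32 MB L2 figure taken from the comment above.
L2_CACHE_BYTES = 32 * 1024 * 1024

# Assumption: 4 bytes per pixel; HDR or depth formats would differ.
one_target = framebuffer_bytes(1920, 1080)
targets_that_fit = L2_CACHE_BYTES // one_target

print(f"One 1080p target: {one_target / 2**20:.2f} MiB")
print(f"Uncompressed 1080p targets that fit in 32 MiB: {targets_that_fit}")
```

Delta color compression would shrink each target further, so the uncompressed count is the pessimistic case.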
Posted on Reply
#111
Paranoir andando
cbb said: huh. Latest news on videocardz.com shows the 4060Ti 16GB version has roughly half the bandwidth (and a half sized bus) of my 2018 2070?? that seems a bizarre bit of "progress"? . . . .
RTX 3050 8GB (2560 cudas) (128bit) has more cudas than RTX 2070 (2304cudas) (256bit). Yes, it has more cudas and fewer bits. (They are different SPs.)

There is nothing new, except a name change in the RTX 4000series. Look in my signature.
BoboOOZ said: On this generation, Nvidia did what AMD did on the previous generation, added a large L2 cache to diminish the need for bandwidth. It worked fine for AMD, so I don't see any reason why it won't work for Nvidia. Bandwidth won't be a problem, but relatively low core counts and VRAM will.
This is not true. This generation has exactly the same bus as previous generations.

100% - 67% cudas --> 384 bit
66% - 45% cudas --> 256 bit
44% - 30% cudas --> 192 bit
29% - 15% cudas --> 128 bit
14% - 0% cudas --> 64 bit

Everything is exactly the same as always.

100% cudas | 2080Ti12 - 384 bits | 3090Ti - 384 bits | 4090Ti - RTX6000 - 384 bit
55% cudas | 2070 Super (55%) - 256 bit | 3070 (55%) - 256 bit | 4080 (53%) - 256 bit
33% cudas | 1660Ti - 192 bit | 3060 - 192 bit | 4070 - 192 bit
25% cudas | 1650 Super (28%) - 128 bit | 3050 8GB (24%) - 128 bit | 4060Ti (24%) - 128 bit
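The banding above can be written as a small lookup. This is just a sketch of the poster's rule of thumb, not an NVIDIA specification: the thresholds are the lower bounds of each band as listed, the function name is hypothetical, and the core counts are the ones quoted in this thread and the article (18432 for full AD102).

```python
def bus_width_for_share(cuda_share_pct: float) -> int:
    """Map a share of the full die's CUDA cores (in %) to the listed bus-width band."""
    # (lower bound in %, bus width in bits), taken from the bands in the comment.
    bands = [(67, 384), (45, 256), (30, 192), (15, 128)]
    for lower_bound, bus_bits in bands:
        if cuda_share_pct >= lower_bound:
            return bus_bits
    return 64


FULL_AD102_CUDA = 18432  # full AD102 count as quoted in this thread

for name, cores in [("RTX 4090", 16384), ("RTX 4080", 9728), ("RTX 4060 Ti", 4352)]:
    share = 100 * cores / FULL_AD102_CUDA
    print(f"{name}: {share:.0f}% of full AD102 -> {bus_width_for_share(share)}-bit band")
```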
Posted on Reply
#112
RainingTacco
ixi said: Rtx 4060 ti with 16GB, what kind of miracle is this?
It's GDDR6.
Posted on Reply
#113
AusWolf
Paranoir andando said: RTX 3050 8GB (2560 cudas) (128bit) has more cudas than RTX 2070 (2304cudas) (256bit). Yes, it has more cudas and fewer bits. (They are different SPs.)








You cannot compare CUDA cores across generations, especially since Nvidia changed what the term means (cheeky move, imo).

The 2070 has 2304 FP and 2304 INT cores. A pair counts as a CUDA core, that's why it has 2304.

The 3050 has 1280 FP and 1280 INT cores. Each one counts as a CUDA core, that's why it has 2560 (while it technically does not).

If there was a direct comparison, then the 3050 would be faster than the 2070, which it is clearly not.

Generally speaking, 1 Ampere/Ada core = ~0.5-0.75 Turing core in performance.
Posted on Reply
#114
BoboOOZ
Paranoir andando said: This is not true. This generation has exactly the same bus as previous generations.

100% - 67% cudas --> 384 bit
66% - 45% cudas --> 256 bit
44% - 30% cudas --> 192 bit
29% - 15% cudas --> 128 bit
14% - 0% cudas --> 64 bit

Everything is exactly the same as always.








Your % cuda indicator is a decent relative indicator for analysing Nvidia's market segmentation. However, that's where the utility stops.

Memory bus sizes allow balancing the power of the core with the output of the memory. Basically you have an 82 TFLOP 4090 with 1 TB/s of memory bandwidth being balanced, compared to a 35 TFLOP 3090 balanced with 0.9 TB/s. How? Because of the extra cache. The % of maximum CUDA has nothing to do with it.
Posted on Reply
#115
ValenOne
AusWolf said: You cannot compare CUDA cores across generations, especially since Nvidia changed what the term means (cheeky move, imo).

The 2070 has 2304 FP and 2304 INT cores. A pair counts as a CUDA core, that's why it has 2304.

The 3050 has 1280 FP and 1280 INT cores. Each one counts as a CUDA core, that's why it has 2560 (while it technically does not).

If there was a direct comparison, then the 3050 would be faster than the 2070, which it is clearly not.

Generally speaking, 1 Ampere/Ada core = ~0.5-0.75 Turing core in performance.
RTX 3050 has 2560 CUDA cores with half of them being able to execute integer datatypes i.e. 1,280 CUDA FP and 1,280 CUDA FP/INT.



From www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf

GA10x SM evolved from TU10x SM when integer units gained floating-point support.

RTX 3070 has 184 TMUs, 96 ROPS, 2,944 CUDA FP32 cores and 2,944 CUDA FP32/INT32 cores.

RTX 3050 has 80 TMUs, 32 ROPS, 1,280 CUDA FP32 cores and 1,280 CUDA FP32/INT32 cores.

RTX 2070 has 144 TMUs, 64 ROPS, 2304 CUDA FP32 cores and 2304 CUDA INT32 cores.
Posted on Reply
#116
Paranoir andando
AusWolf said: You cannot compare CUDA cores across generations, especially since Nvidia changed what the term means (cheeky move, imo).
...
...
I know, I wrote "(are different SP)". That is just what I meant to say: you can't compare different things. ValenOne has explained it well: they are 1280 + 1280 shared, but it doesn't matter now.

You have to compare inside the family and then look at what piece of the cake, what piece of chocolate, Nvidia is trying to sell us, and at what price.
1/2 of Lovelace, 1/3 of Lovelace, 1/4 of Lovelace.

Medium cake is mid-range? A quarter of cake is mid-range? Is mid-range a quarter of Lovelace?
4060 Ti is a quarter-range GPU.
BoboOOZ said: Your % cuda indicator is a decent relative indicator for analysing Nvidia's market segmentation. However, that's where the utility stops.

Memory bus sizes allow balancing the power of the core with the output of the memory. Basically you have an 82 TFLOP 4090 with 1 TB/s of memory bandwidth being balanced, compared to a 35 TFLOP 3090 balanced with 0.9 TB/s. How? Because of the extra cache. The % of maximum CUDA has nothing to do with it.
The increase of cache is a characteristic of all Lovelace, so it becomes irrelevant for comparison. I only compare within the same family.

What he meant before is:
Lovelace has the same memory bus as the previous generations, Lovelace needs the same memory bus as the previous generations.
In Lovelace the speed of memory is higher (or the same) than the previous generations. Lovelace needs faster ram (or the same) than previous generations.
- Memory speed x bus = Bandwidth

Lovelace has higher bandwidth than previous generations, Lovelace needs higher bandwidth than previous generations. (or the same in the worst case)
To say the opposite is to lie.
I hope no one says Lovelace has or needs less bandwidth than previous generations.

Full Lovelace has and needs more bandwidth than previous
1/2 of Lovelace has and needs more bandwidth than previous
1/3 of Lovelace has and needs more bandwidth than previous
1/4 of Lovelace has and needs more bandwidth than previous
Repetitive but easy to understand
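The "memory speed x bus = bandwidth" relation above can be sketched as follows. The per-pin data rates and bus widths are not from this thread; they are assumptions based on commonly listed specs (and, for the 4060 Ti, on the rumored configuration), and the function name is made up for illustration.

```python
def bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin data rate (Gbps) x bus width (bits) / 8."""
    return data_rate_gbps * bus_width_bits / 8


# Assumed memory configurations: (card, Gbps per pin, bus width in bits).
cards = [
    ("RTX 3090", 19.5, 384),
    ("RTX 4090", 21.0, 384),
    ("RTX 2070", 14.0, 256),
    ("RTX 4060 Ti (rumored)", 18.0, 128),
]

for name, rate, bus in cards:
    print(f"{name}: {bandwidth_gb_s(rate, bus):.0f} GB/s")
```

With these assumed inputs the same-bus comparison (3090 vs 4090) goes up, while the cross-tier comparison that started this discussion (2070 vs 4060 Ti) goes down, which is why the two sides of the argument reach different conclusions.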
Posted on Reply
#117
ValenOne
Paranoir andando said: I know, I wrote "(are different SP)". That is just what I meant to say: you can't compare different things. ValenOne has explained it well: they are 1280 + 1280 shared, but it doesn't matter now.

You have to compare inside the family and then look at what piece of the cake, what piece of chocolate, Nvidia is trying to sell us, and at what price.
1/2 of Lovelace, 1/3 of Lovelace, 1/4 of Lovelace.

Medium cake is mid-range? A quarter of cake is mid-range? Is mid-range a quarter of Lovelace?
4060 Ti is a quarter-range GPU.


The increase of cache is a characteristic of all Lovelace, so it becomes irrelevant for comparison. I only compare within the same family.

What he meant before is:
Lovelace has the same memory bus as the previous generations, Lovelace needs the same memory bus as the previous generations.
In Lovelace the speed of memory is higher (or the same) than the previous generations. Lovelace needs faster ram (or the same) than previous generations.
- Memory speed x bus = Bandwidth

Lovelace has higher bandwidth than previous generations, Lovelace needs higher bandwidth than previous generations. (or the same in the worst case)
To say the opposite is to lie. I hope no one says Lovelace has or needs less bandwidth than previous generations.

Full Lovelace has and needs more bandwidth than previous
1/2 of Lovelace has and needs more bandwidth than previous
1/3 of Lovelace has and needs more bandwidth than previous
1/4 of Lovelace has and needs more bandwidth than previous
Repetitive but easy to understand
For texture-mapped 3D games, the mid-range ADA SKU should be around the middle TMU (texture mapping unit) count relative to the current flagship ADA SKU.

Pure TFLOPS debate is meaningless for texture-mapped 3D accelerated games.

RTX 4090 has 16384 CUDA cores with 82.58 TFLOPS and 512 TMUs (1,290 GTexel/s). AIB OC is higher e.g. 86.5 TFLOPS and 1,352 GTexel/s.

RTX 4080 has 9728 CUDA cores with 48.74 TFLOPS and 304 TMUs (761.5 GTexel/s). AIB OC is higher e.g. 51.36 TFLOPS and 802.6 GTexel/s. It can be higher with AIB's single-button auto OC e.g. 55 TFLOPS.

RTX 4070 Ti has 7680 CUDA cores with 40.09 TFLOPS and 240 TMUs (626.4 GTexel/s). AIB OC is higher e.g. 42 TFLOPS and 658.8 GTexel/s.

The mid-range textured 3D ADA is about RTX 4070 / 4070 Ti level.
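The TFLOPS and GTexel/s figures above follow from core counts and clock speed. A minimal sketch, assuming reference boost clocks of 2.52, 2.505 and 2.61 GHz (these clocks are not stated in the post), 2 FP32 operations per CUDA core per clock, and one texel per TMU per clock; the function names are illustrative.

```python
def fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    """Peak FP32 throughput: cores x 2 ops per clock (fused multiply-add) x clock."""
    return cuda_cores * 2 * boost_ghz / 1000


def gtexel_per_s(tmus: int, boost_ghz: float) -> float:
    """Peak texture fill rate: one texel per TMU per clock."""
    return tmus * boost_ghz


# (name, CUDA cores, TMUs, assumed reference boost clock in GHz)
skus = [
    ("RTX 4090", 16384, 512, 2.52),
    ("RTX 4080", 9728, 304, 2.505),
    ("RTX 4070 Ti", 7680, 240, 2.61),
]

for name, cores, tmus, clock in skus:
    print(f"{name}: {fp32_tflops(cores, clock):.2f} TFLOPS, "
          f"{gtexel_per_s(tmus, clock):.1f} GTexel/s")
```

Because both figures scale with the same clock, the ratio between SKUs is the same whether you count TFLOPS per core or GTexel/s per TMU, as long as the cores-to-TMU ratio is fixed.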
Posted on Reply
#118
Paranoir andando
ValenOne said: For texture-mapped 3D games, the mid-range ADA SKU should be around the middle TMU (texture mapping unit) count relative to the current flagship ADA SKU.

Pure TFLOPS debate is meaningless for texture-mapped 3D accelerated games.

RTX 4090 has 16384 CUDA cores...
...
It's the same!! Do you prefer TMUs? Has the same %. It's OK
1/2 of cake, 1/3 of cake, 1/4 of cake, ...

First: No, the full Lovelace is not 16000 cudas...

RTX 6000 (and future 4090Ti) has: 18176 cudas & 568 TMUs (and it is not really the full Lovelace. It has 18432 cudas & 576 TMUs)
If you do not take the full cake you will have wrong percentages and you will not be able to compare with previous generations
1/2 of Lovelace are 284 TMUs
1/3 of Lovelace are 189 TMUs
1/4 of Lovelace are 142 TMUs

RTX 4090 has a 90% of TMUs
RTX 4080Ti has a 77% of TMUs
RTX 4080 has a 1/2 of TMUs
RTX 4070 has a 1/3 of TMUs
RTX 4060Ti has a 1/4 of TMUs

You can see in my signature.

Edit: TMUs are cudas/32 in Lovelace, therefore the RTX 4060 Ti has 4352 cudas/32 = 136 TMUs (24%). The website says 128 TMUs ("may change in the future"), but that is wrong: www.techpowerup.com/gpu-specs/geforce-rtx-4060-ti.c3890
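The "piece of the cake" percentages can be reproduced with a few lines. This sketch takes the poster's rule of one TMU per 32 CUDA cores and the full AD102 count of 18432 cores as given; the helper names are hypothetical.

```python
FULL_AD102_CUDA = 18432  # full AD102 count as stated in the comment
CUDA_PER_TMU = 32        # the poster's rule for Lovelace


def tmus_from_cuda(cuda_cores: int) -> int:
    """TMU count implied by the one-TMU-per-32-cores rule."""
    return cuda_cores // CUDA_PER_TMU


def tmu_share_pct(cuda_cores: int) -> float:
    """Share of the full AD102 TMU count, in percent."""
    return 100 * tmus_from_cuda(cuda_cores) / tmus_from_cuda(FULL_AD102_CUDA)


for name, cores in [("RTX 4090", 16384), ("RTX 4080", 9728), ("RTX 4060 Ti", 4352)]:
    print(f"{name}: {tmus_from_cuda(cores)} TMUs, {tmu_share_pct(cores):.1f}% of full AD102")
```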
Posted on Reply
#119
wheresmycar
Paranoir andando said: 1/2 of cake, 1/3 of cake, 1/4 of cake, ...
i'll have half pls. Can i get a cup of tea to go with it?
Posted on Reply
#120
ValenOne
Paranoir andando said: It's the same!! Do you prefer TMUs? Has the same %. It's OK
1/2 of cake, 1/3 of cake, 1/4 of cake, ...

First: No, the full Lovelace is not 16000 cudas...

RTX 6000 (and future 4090Ti) has: 18176 cudas & 568 TMUs (and it is not really the full Lovelace. It has 18432 cudas & 576 TMUs)
If you do not take the full cake you will have wrong percentages and you will not be able to compare with previous generations
1/2 of Lovelace are 284 TMUs
1/3 of Lovelace are 189 TMUs
1/4 of Lovelace are 142 TMUs

RTX 4090 has a 90% of TMUs
RTX 4080Ti has a 77% of TMUs
RTX 4080 has a 1/2 of TMUs
RTX 4070 has a 1/3 of TMUs
RTX 4060Ti has a 1/4 of TMUs

You can see in my signature.

Edit: TMUs are cudas/32 in Lovelace, therefore the RTX 4060 Ti has 4352 cudas/32 = 136 TMUs (24%). The website says 128 TMUs ("may change in the future"), but that is wrong: www.techpowerup.com/gpu-specs/geforce-rtx-4060-ti.c3890
I stated the current flagship gaming ADA SKU. I am aware of the full AD102 cuda count.

RTX 4090 (16384 cuda) is not the full AD102 (18432 cuda).
RTX 4080 (9728 cuda) is not the full AD103 (10240 cuda).

NVIDIA is reserving the fully enabled AD102 and AD103 for the future product stack refresh which is useless for the current product stack.

GPU clock speed is part of the SKU characteristics, hence my use of GTexel/s scaling.

RTX 6000 ADA has 568 TMUs and 1,423 GTexel/s, 96 MB L2 cache. No AIB OC variants. Not a gaming SKU.

RTX 4090 has 512 TMUs and 1,290 GTexel/s, 72 MB L2 cache. AIB OC can reach 1,352 GTexel/s.

RTX 4080 has 304 TMUs and 761.5 GTexel/s, 64 MB L2 cache. AIB OC can reach 802.6 GTexel/s. 59% of RTX 4090's GTexel/s. My Gigabyte RTX 4080 Gaming OC's heatsink was designed for the RTX 4090 SKU, hence it's overkill for the RTX 4080, i.e. an AIB one-button ~2.9 GHz OC is easy.

RTX 4070 Ti has 240 TMUs, 626.4 GTexel/s, 42 MB L2 cache. AIB OC can reach 666.0 GTexel/s. ~49% of RTX 4090's GTexel/s.
Posted on Reply