Tuesday, May 9th 2023
NVIDIA GeForce RTX 4060 Ti Available as 8 GB and 16 GB, This Month. RTX 4060 in July
In what could explain the greater attention from leaky taps on the GeForce RTX 4060 Ti compared to its sibling, the RTX 4060, NVIDIA is preparing a staggered launch for its RTX 4060 series. We're also learning that there are as many as three SKUs in the series: the RTX 4060 Ti 8 GB, the RTX 4060 Ti 16 GB, and the RTX 4060. All three will be announced later this month; however, only the RTX 4060 Ti 8 GB will be available to purchase at that time. The RTX 4060 Ti 16 GB and RTX 4060 will be available from July.
At this point, little is known about what separates the 8 GB and 16 GB variants of the RTX 4060 Ti besides memory size. The RTX 4060 Ti 8 GB is rumored to feature 34 of the 36 streaming multiprocessors (SM) physically present on the 5 nm "AD106" silicon, which gives NVIDIA some theoretical headroom to enable a few more shaders. Those 34 SM work out to 4,352 CUDA cores, while a fully unlocked AD106 has 4,608. The RTX 4060 is a significantly different SKU that's based on a maxed-out "AD107" silicon, with 30 SM, or 3,840 CUDA cores, although it should be possible for some RTX 4060 cards to be based on a heavily cut-down AD106.
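For reference, those CUDA-core figures follow directly from Ada's 128 FP32 cores per SM; a minimal sketch in Python, using the rumored SM counts quoted above:

```python
# Quick sanity check of the CUDA-core counts quoted above,
# assuming Ada Lovelace's 128 FP32 CUDA cores per SM.
CORES_PER_SM = 128

configs = {
    "RTX 4060 Ti (rumored)": 34,            # enabled SM
    "Fully unlocked AD106": 36,
    "RTX 4060 / full AD107 (rumored)": 30,
}

for name, sm in configs.items():
    print(f"{name}: {sm} SM x {CORES_PER_SM} = {sm * CORES_PER_SM} CUDA cores")
# 34 -> 4,352; 36 -> 4,608; 30 -> 3,840 -- matching the figures in the article.
```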
Sources:
MEGAsizeGPU (Twitter), VideoCardz
120 Comments on NVIDIA GeForce RTX 4060 Ti Available as 8 GB and 16 GB, This Month. RTX 4060 in July
Something like: 2060 Super 8GB (TU106) vs 2060 12GB (TU106).
But the 2060 Super was a midrange card with 47% of the CUDA cores and a 256-bit bus, unlike the mid-low 4070.
The rest of the year is going to be an exciting period for value-oriented buyers, on the CPU front too, because Ryzen 7000 chips simply need to be priced down to move at all. Motherboard manufacturers' moves are unknown, but I bet they could quite easily lower margins, at least a bit. Even if the components on many AM5 boards are certainly high quality, we have just seen how little effort Asus put into refining even their most expensive boards (which Gamers Nexus reported on), which doesn't point to increased costs in supporting the products, though who knows if this is a case of some fresh engineers in the business or something similar. Either way, I dare to hope the end of the year is finally upgrade time for a significant majority.
Normally I kinda expect each gen to roughly equal the prior gen's next card up, and in raw processing power this probably exceeds that. But half the bandwidth is a surprise, although (as noted above) idk if it'll be an issue for users. My impression is that a lot of the value of the larger VRAM is as swap/storage space, so the card doesn't have to fetch from the rest of the system (which is slow) as often, so maybe it's just fine? Guess we'll see!
The idea being: more VRAM to store rendering elements, textures and other visual effects, alongside higher memory bandwidth to access/transfer real-time graphical data quickly. Modern game engines increasingly rely on fast, wide bandwidth for real-time dynamic asset/effect streaming, so compromising on memory speed (or transfer rates) will most likely end in reduced performance, frame drops, and poorer sustained visual fidelity. Breaking that balance is a kick in the teeth, and the easy way out, as usual, is scapegoating developer optimisation as the primary culprit. Developers usually share the blame, but hardware limitations are sometimes overlooked, and they present unappealing challenges that devs probably can't be bothered to work around.
Obviously the same doesn't apply to everyone; the balance between VRAM and memory bandwidth will depend on the user's specific needs and use case, and it's important to consider both factors when selecting a graphics card (for the less informed, benchmarks and reviews often help to stay on top of all the riff raff).
See the Xbox One's split render as an example: on-chip 32 MB ESRAM (covering roughly 70% of a 1080p render) plus system memory. NVIDIA also has superior delta color compression (DCC), which conserves bandwidth and data storage.
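As a rough illustration of why delta compression saves bandwidth (a toy sketch only, not NVIDIA's actual DCC, which is a proprietary per-tile hardware scheme): neighbouring pixels tend to differ by small values, so storing deltas takes fewer bits than storing raw colors.

```python
# Toy illustration of delta encoding on one scanline of 8-bit gray pixels.
# This is NOT NVIDIA's actual DCC algorithm (which works per tile in hardware),
# just the underlying idea: neighbouring pixels are similar, so their
# differences are small and need far fewer bits than the raw values.
pixels = [120, 121, 121, 123, 122, 122, 124, 125]

base = pixels[0]
deltas = [pixels[i] - pixels[i - 1] for i in range(1, len(pixels))]
print("base:", base, "deltas:", deltas)  # deltas: [1, 0, 2, -1, 0, 2, 1]

# Small deltas fit in ~2-3 bits each instead of 8 bits per raw pixel,
# which is where the bandwidth and storage saving comes from.
reconstructed = [base]
for d in deltas:
    reconstructed.append(reconstructed[-1] + d)
assert reconstructed == pixels
```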
There is nothing new except a name change in the RTX 4000 series. Look in my signature. This is not true. This generation has exactly the same bus widths as previous generations.
100%-67% of CUDA cores --> 384-bit
66%-45% of CUDA cores --> 256-bit
44%-30% of CUDA cores --> 192-bit
29%-15% of CUDA cores --> 128-bit
14%-0% of CUDA cores --> 64-bit
Everything is exactly the same as always.
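A minimal sketch of that bus-width tiering as the poster lays it out; the cutoff percentages are the poster's observation, not an official NVIDIA rule, and the 18,432-core full AD102 is used as the 100% reference:

```python
# Sketch of the bus-width tiers the poster describes, keyed on a SKU's share
# of CUDA cores relative to the full flagship die (18,432 on AD102).
# The cutoffs are the poster's observation, not an official NVIDIA rule.
TIERS = [
    (0.67, 384),  # >= 67% of the CUDA cores -> 384-bit
    (0.45, 256),
    (0.30, 192),
    (0.15, 128),
    (0.00, 64),
]

def expected_bus_width(cuda_cores: int, full_die_cores: int = 18432) -> int:
    """Return the expected memory bus width (bits) for a given core count."""
    share = cuda_cores / full_die_cores
    for cutoff, bus_bits in TIERS:
        if share >= cutoff:
            return bus_bits
    return 64

print(expected_bus_width(4352))   # RTX 4060 Ti: ~24% of the full die -> 128-bit
print(expected_bus_width(16384))  # RTX 4090: ~89% of the full die -> 384-bit
```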
The 2070 has 2,304 FP32 and 2,304 INT32 cores. A pair counts as one CUDA core, which is why it's listed as 2,304.
The 3050 has 1,280 FP32 and 1,280 FP32/INT32 cores. Each one counts as a CUDA core, which is why it's listed as 2,560 (even though technically it isn't that simple).
If it were a direct comparison, the 3050 would be faster than the 2070, which it clearly is not.
Generally speaking, 1 Ampere/Ada core = ~0.5-0.75 Turing core in performance.
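Taking that rough 0.5-0.75x equivalence at face value, a quick back-of-the-envelope check of the 3050-vs-2070 point (the scaling factor is the poster's estimate, not a measured number):

```python
# Back-of-the-envelope check of the 3050-vs-2070 argument above, using the
# poster's rough "1 Ampere core ~= 0.5-0.75 Turing core" estimate.
rtx_3050_listed_cores = 2560  # 1,280 FP32 + 1,280 FP32/INT32, all counted
rtx_2070_listed_cores = 2304  # FP32 only; the INT32 units are not counted

for factor in (0.5, 0.75):
    turing_equivalent = rtx_3050_listed_cores * factor
    print(f"factor {factor}: 3050 ~ {turing_equivalent:.0f} Turing-equivalent cores "
          f"vs the 2070's {rtx_2070_listed_cores}")
# At 0.5x the 3050 lands around 1,280 "Turing-equivalent" cores, at 0.75x around
# 1,920 -- both below the 2070, consistent with it being the slower card.
```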
Memory bus sizes allow balancing the throughput of the core against the output of the memory. Basically, an 82 TFLOP 4090 with ~1 TB/s of memory bandwidth is just as balanced as a 35 TFLOP 3090 with ~0.9 TB/s. How? Because of the extra cache. The percentage of the maximum CUDA count has nothing to do with it.
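To put numbers on that balance argument, here is the compute-to-bandwidth ratio for both cards; a rough sketch that deliberately ignores L2 hit rates, which is exactly the factor the poster says closes the gap:

```python
# Rough compute-to-bandwidth ratio for the two cards mentioned above.
# The L2 cache is ignored on purpose: the widening ratio is precisely what the
# much larger Ada L2 (72 MB on AD102 vs 6 MB on GA102) is meant to absorb.
cards = {
    "RTX 3090": {"tflops": 35.0, "bandwidth_tbps": 0.936},
    "RTX 4090": {"tflops": 82.0, "bandwidth_tbps": 1.008},
}

for name, c in cards.items():
    flops_per_byte = (c["tflops"] * 1e12) / (c["bandwidth_tbps"] * 1e12)
    print(f"{name}: {flops_per_byte:.0f} FLOPS per byte of DRAM bandwidth")
# ~37 FLOPS/byte on the 3090 vs ~81 on the 4090: much more math per byte fetched,
# which only stays balanced if the big L2 keeps most accesses on-chip.
```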
From www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf
GA10x SM evolved from TU10x SM when integer units gained floating-point support.
RTX 3070 has 184 TMUs, 96 ROPs, 2,944 CUDA FP32 cores and 2,944 CUDA FP32/INT32 cores.
RTX 3050 has 80 TMUs, 32 ROPs, 1,280 CUDA FP32 cores and 1,280 CUDA FP32/INT32 cores.
RTX 2080 has 184 TMUs, 64 ROPs, 2,944 CUDA FP32 cores and 2,944 CUDA INT32 cores.
You have to compare within the family and then look at what slice of the cake (what piece of the chocolate) NVIDIA is trying to sell us, and at what price.
1/2 of Lovelace, 1/3 of Lovelace, 1/4 of Lovelace.
Is half the cake mid-range? Is a quarter of the cake mid-range? Is a quarter of Lovelace mid-range?
The 4060 Ti is a quarter-of-Lovelace GPU. The increase in cache is a characteristic of all of Lovelace, so it becomes irrelevant for the comparison. I only compare within the same family.
What he meant before is:
Lovelace has the same memory bus as the previous generations, Lovelace needs the same memory bus as the previous generations.
In Lovelace the memory speed is higher than (or the same as) previous generations; Lovelace needs faster RAM than (or the same as) previous generations.
- Memory speed × bus width = bandwidth (see the sketch after this list)
Lovelace has higher bandwidth than previous generations; Lovelace needs higher bandwidth than previous generations (or the same in the worst case).
To say the opposite is to lie. I hope no one says Lovelace has or needs less bandwidth than previous generations.
Full Lovelace has and needs more bandwidth than previous
1/2 of Lovelace has and needs more bandwidth than previous
1/3 of Lovelace has and needs more bandwidth than previous
1/4 of Lovelace has and needs more bandwidth than previous
Repetitive but easy to understand
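Plugging numbers into that memory speed × bus width formula (the data rates below are the published per-pin rates for each card; the RTX 4060 Ti row uses the rumored 18 Gbps GDDR6 on a 128-bit bus):

```python
# Bandwidth = effective per-pin data rate (Gbps) x bus width (bits) / 8.
# Data rates and bus widths are the published specs; the 4060 Ti row uses the
# rumored 18 Gbps GDDR6 on a 128-bit bus.
def bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    return data_rate_gbps * bus_width_bits / 8

print(bandwidth_gb_s(21.0, 384))  # RTX 4090: 21 Gbps GDDR6X x 384-bit -> 1008 GB/s
print(bandwidth_gb_s(14.0, 256))  # RTX 3060 Ti: 14 Gbps GDDR6 x 256-bit -> 448 GB/s
print(bandwidth_gb_s(18.0, 128))  # RTX 4060 Ti (rumored): 18 Gbps x 128-bit -> 288 GB/s
```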
Pure TFLOPS debate is meaningless for texture-mapped 3D accelerated games.
RTX 4090 has 16384 CUDA cores with 82.58 TFLOPS and 512 TMUs (1,290 GTexel/s). AIB OC is higher e.g. 86.5 TFLOPS and 1,352 GTexel/s.
RTX 4080 has 9728 CUDA cores with 48.74 TFLOPS and 304 TMUs (761.5 GTexel/s). AIB OC is higher e.g. 51.36 TFLOPS and 802.6 GTexel/s. It can be higher with AIB's single-button auto OC e.g. 55 TFLOPS.
RTX 4070 Ti has 7680 CUDA cores with 40.09 TFLOPS and 240 TMUs (626.4 GTexel/s). AIB OC is higher e.g. 42 TFLOPS and 658.8 GTexel/s.
The mid-range textured 3D ADA is about RTX 4070 / 4070 Ti level.
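Those GTexel/s figures are simply TMU count × boost clock; a quick check against the reference boost clocks (AIB OC cards land proportionally higher):

```python
# Texture fill rate = TMU count x boost clock (GHz) -> GTexel/s.
# The clocks below are the reference boost clocks; AIB OC cards land higher.
skus = {
    "RTX 4090":    {"tmus": 512, "boost_ghz": 2.520},
    "RTX 4080":    {"tmus": 304, "boost_ghz": 2.505},
    "RTX 4070 Ti": {"tmus": 240, "boost_ghz": 2.610},
}

for name, s in skus.items():
    print(f"{name}: {s['tmus'] * s['boost_ghz']:.1f} GTexel/s")
# ~1290.2, ~761.5 and ~626.4 GTexel/s respectively, matching the figures above.
```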
1/2 of cake, 1/3 of cake, 1/4 of cake, ...
First: no, the full Lovelace is not 16,000 CUDA cores...
The RTX 6000 (and a future 4090 Ti) has 18,176 CUDA cores & 568 TMUs (and even that is not really the full Lovelace, which has 18,432 CUDA cores & 576 TMUs).
If you do not take the full cake, you will get the wrong percentages and you will not be able to compare with previous generations.
1/2 of Lovelace are 284 TMUs
1/3 of Lovelace are 189 TMUs
1/4 of Lovelace are 142 TMUs
RTX 4090 has a 90% of TMUs
RTX 4080Ti has a 77% of TMUs
RTX 4080 has a 1/2 of TMUs
RTX 4070 has a 1/3 of TMUs
RTX 4060Ti has a 1/4 of TMUs
You can see in my signature.
Edit: TMUs are CUDA cores / 32 in Lovelace, therefore the RTX 4060 Ti has 4,352 / 32 = 136 TMUs (24%). The website says 128 TMUs ("may change in the future"), but that is wrong: www.techpowerup.com/gpu-specs/geforce-rtx-4060-ti.c3890
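A quick sketch of that rule: each Ada SM carries 128 CUDA cores and 4 TMUs, i.e. one TMU per 32 cores; the percentages use the RTX 6000 Ada's 568 TMUs as the baseline, as in the list above:

```python
# In Ada, each SM carries 128 CUDA cores and 4 TMUs, i.e. one TMU per 32 cores.
# Percentages use the RTX 6000 Ada's 568 TMUs as the baseline, as in the list
# above (the physically complete AD102 would be 576).
CORES_PER_TMU = 32
BASELINE_TMUS = 568

def tmus_from_cuda(cuda_cores: int) -> int:
    return cuda_cores // CORES_PER_TMU

for name, cores in [("RTX 4090", 16384), ("RTX 4080", 9728), ("RTX 4060 Ti", 4352)]:
    tmus = tmus_from_cuda(cores)
    print(f"{name}: {tmus} TMUs ({tmus / BASELINE_TMUS:.0%} of 568)")
# RTX 4090: 512 TMUs (90%), RTX 4080: 304 TMUs (54%), RTX 4060 Ti: 136 TMUs (24%).
```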
RTX 4090 (16384 cuda) is not the full AD102 (18432 cuda).
RTX 4080 (9728 cuda) is not the full AD103 (10240 cuda).
NVIDIA is reserving the fully enabled AD102 and AD103 for a future product stack refresh, which does nothing for the current product stack.
GPU clock speed is part of the SKU characteristics, hence my use of GTexel/s scaling.
RTX 6000 ADA has 568 TMUs and 1,423 GTexel/s, 96 MB L2 cache. No AIB OC variants. Not a gaming SKU.
RTX 4090 has 512 TMUs and 1,290 GTexel/s, 72 MB L2 cache. AIB OC can reach 1,352 GTexel/s.
RTX 4080 has 304 TMUs and 761.5 GTexel/s, 64 MB L2 cache. AIB OC can reach 802.6 GTexel/s. That's 59% of the RTX 4090's GTexel/s. My Gigabyte RTX 4080 Gaming OC's heatsink was designed for the RTX 4090 SKU, hence it's overkill for the RTX 4080, i.e. the AIB one-button ~2.9 GHz OC is easy.
RTX 4070 Ti has 240 TMUs, 626.4 GTexel/s, 48 MB L2 cache. AIB OC can reach 666.0 GTexel/s. That's ~49% of the RTX 4090's GTexel/s.
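For what it's worth, those relative fill-rate percentages check out against the reference figures quoted above (a trivial division, shown for completeness):

```python
# Relative texture fill rate vs the RTX 4090, using the reference figures above.
reference_gtexel = {"RTX 4090": 1290.2, "RTX 4080": 761.5, "RTX 4070 Ti": 626.4}

top = reference_gtexel["RTX 4090"]
for name, rate in reference_gtexel.items():
    print(f"{name}: {rate / top:.0%} of the RTX 4090")
# RTX 4080 ~59%, RTX 4070 Ti ~49%, matching the percentages quoted above.
```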