Yeah...I'm pretty confused by all this. Let me explain.
I can understand delaying launch a bit to get FSR4 out of
Alpha Beta Research Project (WTF?) phase, and putting MFG into the drivers.
People kid about that; I'm not. I think that could be a real thing that's happening so they have feature parity at launch. Not a bad idea, but it certainly doesn't look great on its face to be playing catch-up.
The thing that befuddles me about this launch is AMD genuinely trying to fake people out about the design's clock potential.
Like I've said before, N31 was equalized to clock/bw at ~2720/20000. This design is literally 2/3 N31 with other beefed up attributes.
When you look at N32, it's essentially a ~200mm² 5nm base chip with ~150mm² of 7nm cache.
It also was almost-certainly aimed at 2.93ghz (with overclocking to ~3.24ghz), just like Apple's first "efficient" 5nm A-series design. (8192/7680*2720=2900mhz). nVIDIA also followed/follows this philosophy.
If you bear out the standard memory overclock to 2700mhz (tight timings), that's equal to 7680sp @ ~3133mhz-capable according to my math.
Weird how the 7700xt appears locked to a 3133mhz max in the MFing BIOS...except not really. The 7800xt is wasted potential, presumably so this series looks better...and that max is a remnant of what might have been for N32.
7800xt is power-limited so it can't reach that potential. If one could hit a 2800mhz mem clock (as Navi3 is capable in many instances), 7800xt is 3.25ghz-capable, which supports my 3.24ghz max theory.
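All of the above is just constant shader-throughput scaling; here's a quick sketch of my napkin math (baseline figures are my assumptions, not AMD spec):

```python
# Napkin math: hold (shader count x clock) constant against the assumed
# N31 equalization point, then scale the capable clock with memory speed.

def equal_flops_clock(base_sp, base_mhz, target_sp):
    """Clock a target_sp part needs to match base_sp @ base_mhz throughput."""
    return base_sp * base_mhz / target_sp

# 7680sp design target from the assumed 8192sp/2720MHz baseline:
n32_target = equal_flops_clock(8192, 2720, 7680)   # ~2901 MHz (the "2.93 GHz" aim)

def bw_scaled_clock(clock_mhz, base_gbps, oc_gbps):
    """Bandwidth-limited clock scales linearly with memory data rate."""
    return clock_mhz * oc_gbps / base_gbps

# 20 Gbps stock -> 21.6 Gbps (2700 MHz) memory overclock:
oc_capable = bw_scaled_clock(2901, 20.0, 21.6)     # ~3133 MHz for 7680sp
print(round(n32_target), round(oc_capable))
```

Run it and you land on the same ~2900/~3133mhz pair as above.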
So, IOW, I'm fairly sure this was the goal...but
whatever happened happened™, and somewhere between FUBAR design and marketing we got what we got so we could get Navi4 and have it look better.
Also have a cheap card similar to the PS5 pro (that itself is absurdly bw limited and clocked into the dirt), but whatever. Bring on the (hopefully 11264sp/6SM/96 ROP...bc perfect ratio) PS6.
Anyway...shrinking N32 down to 4nm should've left a die size no larger than ~200mm² + ~120mm² of cache (0.021µm²/bit 5nm SRAM vs 0.027µm²/bit 7nm SRAM).
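That shrink estimate is just SRAM cell-size scaling; a quick sketch of the arithmetic (cell sizes are approximate public figures, and the logic area carrying over as-is is my simplifying assumption):

```python
# Rough "N32 shrunk to 4/5nm" die estimate. Cell sizes in um^2/bit.
logic_mm2 = 200          # ~200 mm^2 of 5nm logic, assumed to carry over as-is
cache_7nm_mm2 = 150      # ~150 mm^2 of 7nm MCD cache
sram_5nm, sram_7nm = 0.021, 0.027

# Cache area scales with SRAM cell size:
cache_shrunk = cache_7nm_mm2 * (sram_5nm / sram_7nm)   # ~117 mm^2
print(round(cache_shrunk), round(logic_mm2 + cache_shrunk))
```

That gives ~117mm² of cache and ~317mm² total, i.e. nowhere near ~390mm² without extra clock/bus headroom baked in.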
Why on earth would the die size be ~390mm² if not for a boosted clock/mem bus potential?
It wouldn't be...Also Apple's later 'tuned' design on the process was 3.46ghz. In my estimation, Navi3's design targeted 2.93-3.24ghz (just like Apple), while Navi4 targets 3.46ghz (just like Apple).
IMO, RTX50 is still targeting a goal of ~3.24ghz (probably max of ~3300).
While this isn't readily apparent in the 5080/5090 stock clocks (~2840-2855mhz?), it will be with the 5070...which can be estimated to have a clock of ~3165mhz. That's how it gets away with 6144sp.
Also, you know, 7168*2700 (9070 clock) = 6144sp @ 3150mhz. There, AMD, I did your homework for you.
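The homework checks out exactly; same parity math as before (9070 spec is the rumored one, not confirmed):

```python
# Throughput parity: 9070 (7168sp @ 2700 MHz, rumored) vs a 6144sp GB205-class part.
flops_9070 = 7168 * 2700          # sp x MHz, proportional to TFLOPS
clock_6144 = flops_9070 / 6144    # clock a 6144sp part needs to match it
print(clock_6144)                 # 3150.0 exactly
```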
Once again, the bandwidth on N48 shakes out in a couple of interesting ways if similar L3 (∞) cache.
8192@2720mhz = 7168sp @ >3100mhz/20000. Figure that average overclock to 2700mhz ram on many Navi3 cards and that shoots to >3350mhz.
2800mhz (something W1z often achieved on Navi3) ram = 3.48ghz capable.
Now comes the interesting part...
With a ram clockspeed of 24000, 8192sp should be capable of a clock of ~3264mhz. If it can hit 25600 (same as a 2700/21600 on 20ghz ram), that's the same 3.48ghz as a tapped-out 2800mhz 7168sp part.
Coincidence? I think not!
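Those clock/bandwidth pairings all fall out of one formula; here it is in one place (the 8192sp @ 2720mhz / 20Gbps equalization point is my assumption, as above):

```python
# Bandwidth-limited "capable clock" for N48 configs, scaled from the assumed
# equalization point: 8192sp-equivalent @ 2720 MHz on 20 Gbps GDDR6.
BASE_CLOCK, BASE_GBPS, BASE_SP = 2720, 20.0, 8192

def capable_clock(sp, gbps):
    # Clock scales up with memory data rate, and up again as shader count shrinks.
    return BASE_CLOCK * (gbps / BASE_GBPS) * (BASE_SP / sp)

for sp, gbps in [(7168, 20.0), (7168, 21.6), (7168, 22.4), (8192, 24.0), (8192, 25.6)]:
    print(f"{sp}sp @ {gbps} Gbps -> ~{round(capable_clock(sp, gbps))} MHz")
```

A 7168sp part on 22.4Gbps (2800mhz) ram and an 8192sp part on 25.6Gbps both land on ~3482mhz, which is the "coincidence" above.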
Now...THOSE...Those would've been good parts. This stack...this is (at stock)...not ok.
In my humble estimation, this line-up should have been 7168sp @ 3.1ghz/20gbps and 8192sp @ 3.264ghz/24gbps. There is absolutely no way these cards shouldn't mostly be able to hit 3.3ghz with enough juice.
Similarly, RTX 50 will likely hit 3300mhz with enough juice in at least some instances. That's the point of libraries...where one ends, the other begins.
The 2.93-3.24ghz library/design likely spans 2900-3300mhz, aimed at higher density and a lower voltage curve, and nVIDIA will likely split the difference (~3.15-3.165ghz...more on that in a sec).
The 3.46ghz design/library/decap spans ~3.3-~3.6/3.7ghz (Apple's later 5nm A-series was 3.46ghz; M-series 3.5-3.7ghz). Before you say they aren't comparable...they are. GPUs clock a little lower, but not much.
Instead we have the 9070 listed at a bottom-dollar yield of 2700mhz core (just like the worst overclocks on Navi3, or the stock clock of RTX 40), and the weird part...
9070XT @ 2970mhz. The reason for this clock is two-fold. First of all is (20gbps?) memory bw, the second is because of general rop/shader calculations.
According to my age-old math, 16 ROPs pair well with ~1877.7777sp for general raster. This is why 7680sp makes sense as a design. While N32 has 96 ROPs, it likely generally runs as 64 ROPs @ ~7511.111sp.
If you use my math, 20gbps would be equal to 7511.1111sp @ ~2.967ghz. IOW, this is likely why the 9070xt is clocked where it is: a fixed/non-crippled Navi32 in practice, if not in appearance.
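For the skeptical, the ROP math above in code form (the ~1877.7777sp-per-16-ROPs figure is my own heuristic, not anything official):

```python
# My ROP-to-shader utilization heuristic: ~1877.7777sp of raster work
# pairs with each 16-ROP cluster.
SP_PER_CLUSTER = 1877.7777

# A 96-ROP N48 running as a 64-ROP split:
effective_sp = (64 / 16) * SP_PER_CLUSTER        # ~7511.11sp

# Bandwidth-parity clock for that effective width, vs the assumed
# 8192sp @ 2720 MHz / 20 Gbps equalization point:
clock = 8192 * 2720 / effective_sp               # ~2967 MHz
print(round(effective_sp, 2), round(clock))
```

~2967mhz is within rounding of the 9070xt's 2970mhz boost, which is the whole point.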
N32/7800xt (potential) died for this; it's some bullshit to see it regurgitated...and if it's priced higher with the same RAM that's laughable.
Sooooo........One really has to start to wonder some things about this release....
Was Samsung unable to make enough 24gbps ram, which is conceivable? Did AMD deem it too expensive (which is a bad decision if true)? Is Navi4 a failed design wrt high clocks like Navi 3?
Or...as is my suspicion, is AMD going to lock these cards down to <3.4ghz and bin the ever-living crap out of chips for what one can only assume will be a 9070 XTi (bc name-stealing) @ ~3264(+?)/24000?
The reason why this is important is again multi-faceted.
While on one hand 8192sp @ 3264mhz is the same flops as 9728sp @ 2748mhz (4080 stock clock is ~2715-2775mhz, avg 2730)...~3.46(3.48?)ghz/25600 equals a 4080 Super.
These are important metrics of parity, regardless if you think raw flops are important. It's also pretty clearly the ENTIRE POINT OF THE DESIGN. If they don't do this they fucked up.
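Checking that parity claim (using the usual FLOPS = 2 × sp × clock; the N48 clocks are my speculated ones, not stock):

```python
# FLOPS parity vs a 4080-class part (dual-issue: 2 ops/sp/clock).
def tflops(sp, mhz):
    return 2 * sp * mhz / 1e6

n48_oc  = tflops(8192, 3264)   # speculated binned/OC N48
rtx4080 = tflops(9728, 2748)   # 4080 at a typical boost clock
print(round(n48_oc, 1), round(rtx4080, 1))   # both ~53.5 TF
```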
Not only that, it's likely N48 is divided into 4 engines, just like GB205/5070 (2048sp @ 4, whereas the 5070 could be 1536sp @ 4 like Ada). 4x1536 = 6144sp.
AD104 (4070ti) was 5 (5x1536 = 7680). So, you know, 5/4=1.25. 2715-2775 x 1.25 = 3394-3469mhz. Are you seeing a theme?
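The engine-count-vs-clock trade spelled out (clock range is my rough 4080-class figure from above):

```python
# Fewer engines made up by clock: 5 engines at Ada clocks vs 4 engines clocked 5/4 higher.
ada_clocks = (2715, 2775)                 # rough Ada stock boost range, MHz
scaled = [c * 5 / 4 for c in ada_clocks]  # what 4 engines need to match 5
print(scaled)                             # [3393.75, 3468.75]
```

That ~3394-3469mhz band is exactly the ~3.4-3.47ghz "theme".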
Hopefully you're beginning to see why this (high-clock/mem speed) is important.
Even MORE importantly, the RTX 5070 Ti could be 5 @ 1792. It could be more...I haven't checked the numbers...but GB203/5080 is likely 1792 @ 6 (to save die space) while AD103 was actually 7...6@1536/1@1024.
So Blackwell (in only the case of GB203) may actually REGRESS in RT per clock, made up by higher clockspeed. Say 3.165ghz vs 2715mhz (as a rough parity). Now you know where I got that clock.
It doesn't matter because AMD isn't competing on that level. While N31 has 6 engines, it isn't clocked that high. When it is clocked that high, it doesn't have enough bw.
This is a way nVIDIA could 'get' people....by limiting RT perf on 5070Ti (
even though 1792sp is actually a much-better ratio for RT as a feature and not a show-piece/bullshit ploy, but I digress).
IMO optimal is 1920sp per SM, but one could make an argument for 1792 (as has always been the case for 16-ROP clusters). It goes back to the ~1878sp utilization rate and the 2x clock power of shader logic.
So 1920sp wins by a hair, imo, plus those flops can be used for other stuff than raster (as is the AMD way of 2048sp per 16 ROPs in many designs)...IF YOU HAVE ENOUGH MEMORY BANDWIDTH.
Point is, if the 5070 is 4 @ ~3150-3165mhz, as I think it will be, it will beat the 9070xt in RT, because with 20gbps RAM the latter is limited to ~2.72/2.97ghz. That's just fucking sad when it has 1/3 more shader potential.
Even with overclocking the ram to 2800mhz, that's only ~3046mhz at full shader utilization...likely 3060mhz by AMD math (given that's the OC model clock)...
You can almost smell the wasted clock potential from here. All bc bandwidth. If 24gbps wasn't an option they should've overhauled the cache to be L2 like nVIDIA.
If using the 64-ROP/7511.1111sp split, a higher clock could help it win with a mem overclock, but 5070 overclock potential may make it equal-ish in RT.
TLDR, N32 is actually a better design wrt shaders/rops (could have been 64 ROPs TBH), but 8192sp makes sense wrt high clock potential and if using 24gbps memory (or better cache).
At the end of the day, AMD has fucked this up badly...and I've spent a good 25 years defending them (and ATi).
I know they can't control yields (even if they've supposedly been sitting on this chip a while and could've binned them) nor Samsung (if they can't produce 24gbps GDDR6 in quantity)...but shit.
FSR4 could be ready. MFG could be ready (even if only 3x, given that's a lot easier to do than 4x...which I'm still not convinced nVIDIA can both make work and make perform substantially better than 3x).
This stack could have been pretty kick-ass...but whoever came up with it was no Dave Baumann.
Instead we have this wonky-ass 9070 xt with 20gbps ram (unless we've been fooled for some reason and Hynix made some special 21gbps+ batch) that I doubt will benefit from dual power connectors.
We'll probably get a locked-down 9070 just so it doesn't perform exactly the same (given the clock potential of the chip + limited bw).
The 9070 could have been a bomb card in its own right (and it might still be, depending on how AMD handles OC/power limits) if allowed up to 375w. Watch them lock it way lower for no reason other than a failed upsell.
The 9070xt could have been a tight 375w card. A super-fun-to-play-with 375w+ card with 24gbps.
This stack exists for what? So AMD could compete favorably in price with 5070? That's nonsense. N48 is a good design if it does the things I mention. The stock configs we've seen do not.
So much of this is absurd. The ~20gbps (clocked) ram on XT. The fear to price it over 5070. Lack of 7900xtx with similar clock potential/RDNA4 benefits with 24GB/24gbps that would've been wonderful.
I think the potential is still there for these to be nice cards given the price, but I wouldn't blame anyone that waits for a 24GB 3nm card (say something like 6x1920=11520sp/256-bit 40-42gbps/24GB).
I think that's the play, imo. Plus, it'll likely be cheaper than a 5080 or a theoretical 7900xtx-like RDNA4 chip on 4nm.
They could've made a perfect 16GB card with 24gbps ram and high clocks, even if between 5070 and 5070ti in price. Maybe they still will. I wonder when they'll launch it? With this in March, later, or never?
They could have made a nice long-lasting 24GB card if 7900xtx w/ 24gbps/high-clocks. Perhaps there is a reason for that tho (nobody wants to sell a card that's 'perfect'/cheap as then you'll never upgrade.)
FWIW, that's why the gap between 5080/5090, and why 5080 is 16GB. Between that will get you through the next console gen. I figure a little over 80TF for a nice PC experience for literally 10 years.
Interesting that a theoretical 7900xtx using N48 principles would've filled this niche...or is it? Why sell it once if N48/GB203 users will need to eventually buy another card? So nice, they'll sell you stuff twice.
Instead, we get this. And a delay. And FSR4 in Alpha. Honestly...WTF is going on over there?
Where is the AMD/ATi I grew up loving for their engineering prowess...even in face of marketing adversity? I will forever champion their better core design decisions (and OC ability) vs nVIDIA's marketing.
Instead, it feels like they are 100% followers (in this instance), down to waiting on pricing, doing after-launch performance/clock opposition research, and even copying the naming scheme. Pretty damn sad.
Like I said, though, 9070XTi when? How much? That's what I want to know. If
THAT was coming in March, I'd understand. When it's called that, btw, I will laugh through my tears.