Edit: After I wrote this I realized 8MB L2 + 64MB L3 really is essentially 40MB of L2...so it's 3.75x instead of the 4x interpreted below...so these numbers are off a bit (depending on how much difference the shader cache makes).
Please forgive that error in the message below; depending on other changes, '4' may still be more-or-less correct, but hopefully the rough speculative point stands, even if not *perfect*.
I'll figure out the *real* limitations when there's actual info out there, but this should still quell some fears.
All it means is that clocks would equalize a *little* lower, but it should still be close-ish; clocks/bw still match pretty well. Core clock limitations (at 375W) and bw matching w/ 20gbps still make sense.
Here's your nugget for today:
The interesting thing I noticed is that they buffed up the L2 cache from 4MB to 8MB. If you want to understand why this matters, I'll give you a quick bw crash course.
nVIDIA's L2 cache essentially adds 6MHz to the effective memory clock for every 1MHz of core clock; AMD's similarly-sized L3 cache adds 3MHz.
(FWIW, this is how we know nVIDIA will probably do 6144sp/9216sp/12288sp designs on 3nm @ ~3700-3780MHz core / 36000MHz (36gbps) memory. High-end [90] probably 40gbps, but that's a conversation for another day.)
So, for instance, when balanced, something like N31 (7900 XTX) at 2720MHz is similar to having 20000MHz + 8160MHz (2720 × 3) = ~28160MHz effective over the 256-bit bus.
By increasing the L2, this multiplier goes from 3 to ~4 (or ~2/3 of nVIDIA's).
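If you want to play with the numbers yourself, here's that rule of thumb as a rough Python sketch. The function name and the exact factors are just my shorthand/estimates, nothing official:

```python
# Rough sketch of the rule of thumb above: effective memory clock =
# actual memory clock + (cache factor x core clock). The factors are
# my estimates: ~6 for nVIDIA's L2, ~3 for AMD's old 4MB L2 + 64MB L3
# setup, ~4 for N48's doubled L2 (maybe closer to 3.75, per the edit up top).

def effective_mem_clock(mem_mhz, core_mhz, cache_factor):
    """Effective memory clock (MHz) once cache hits are counted."""
    return mem_mhz + cache_factor * core_mhz

print(effective_mem_clock(20000, 2720, 3))  # N31: ~28160 over 256-bit
print(effective_mem_clock(20000, 2720, 4))  # same clocks w/ N48's bigger L2
```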
I see they also note the shader cache is 2MB, but I don't know what it used to be; the L3 is the same.
I'm going to assume the L2 is the most-meaningful increase, but I need to look into that more (and how the shader cache affects FSR, etc.). If anyone can shed light on that, please do chime in.
Point is, whereas before I was kind of worried about bandwidth limitations, and I still am considering how high it *may* be able to clock, this bodes well up to ~375W.
That is, instead of being bandwidth-limited at 2720MHz @ 20gbps, it would instead be equalized at ~3148MHz.
My math surely isn't *perfect* (because it uses a lineage of past configs stacked), but it's very close...perhaps ~3150MHz.
Ofc, even weirder is that some parts have memory at 2518MHz (20144 effective), which is odd when the stock spec is 20gbps, but we shall see.
So, if we add that little bit... ~3170MHz?
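Here's how I'm getting those equalized numbers, as a sketch; the ~10.35 ratio is just N31's balance point (28160/2720) carried forward, which is exactly the "lineage of past configs" caveat above:

```python
# Equalized core clock sketch: keep N31's effective-bandwidth-to-core-clock
# ratio (28160/2720 ~= 10.35) and solve mem + 4*core = ratio*core for the
# core clock that 20gbps plus the doubled L2 can feed.

N31_RATIO = 28160 / 2720  # ~10.35 effective-MHz per core-MHz (my assumption)

def equalized_core_clock(mem_mhz, cache_factor, ratio=N31_RATIO):
    # mem + factor*core = ratio*core  ->  core = mem / (ratio - factor)
    return mem_mhz / (ratio - cache_factor)

print(round(equalized_core_clock(20000, 4)))  # ~3148MHz at stock 20gbps
print(round(equalized_core_clock(20144, 4)))  # ~3171MHz w/ 2518MHz memory
```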
Why is this interesting? Well, for two reasons. First, it allows factory-overclocked cards (up to ~3150MHz?) w/o having to screw with the memory clock. Why is that clock important? Well, you figure it out.
This is the type of thing I would make (maxing out equalized units/cache at the yield of a high stock clock, and the core fab process/memory chips' OC potential [at the end of the low-leakage curve] at the limits of a power connector).
If it can clock higher, it would likely be diminishing returns (or at least proportionally higher power-draw), but people that buy 3x8-pin are likely ready to accept that reality. I'm all for this methodology.
I still think the product segmentation is weird, barring people being able to flash 9070 XT BIOSes to 9070s or something, but that's a marketing thing, not a fault of the engineers. It makes the same point, I suppose.
I thought someone else out there (that likes equalized designs per market) might appreciate it as well.
Knowing a couple of things (like 329W getting us ~3244MHz), we can extrapolate that 375W *could* give us up to ~3470MHz (clock scaling at roughly half the rate of power, i.e. ~square root).
We really don't know how well chips will scale wrt leakage past 3154MHz (per nVIDIA).
This, ofc, is the range of the HDL SRAM (~3154-3390?), which we've seen nVIDIA use on all Blackwells so far.
HPL is 3460-3700MHz for Apple, but that may translate to slightly lower on GPUs. Which did AMD use? I dunno; I always assumed HPL, bc they usually do, but we'll see.
It's worth noting AMD has said there are designs up to 340W, so you can do that math yourself. Okay, I'll do it for you: it's 3300MHz *exactly*.
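To show my work on the wattage numbers (this is just the half-rate/square-root scaling assumption anchored to that 329W point, nothing more rigorous):

```python
# Clock-vs-power sketch: assume clock scales at ~half the rate power does,
# i.e. roughly clock ~ sqrt(power), anchored to the 329W ~= 3244MHz point.
def clock_at_power(power_w, ref_power_w=329.0, ref_clock_mhz=3244.0):
    return ref_clock_mhz * (power_w / ref_power_w) ** 0.5

print(round(clock_at_power(340)))  # ~3298MHz (the ~3300MHz figure above)
print(round(clock_at_power(375)))  # ~3463MHz (the ~3470MHz figure above)
```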
The typical overclock for 20gbps memory on AMD cards (using AMD's control panel) is 2700MHz (21600 effective). Equalized, that bandwidth is good for ~3400MHz on the core; perhaps a more realistic expectation for 375W.
As some of you are aware, some cards w/ 20gbps RAM do operate higher, up to ~2800MHz, which could translate to ~3500MHz (more than enough for whatever 375W could generally give, but not overdone).
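Plugging those typical memory OCs into the same equalized-clock math from before (same assumed ~10.35 ratio and factor-of-4 cache):

```python
# Same equalized-clock sketch, fed with typical 20gbps memory OCs:
# 2700MHz -> 21600 effective, 2800MHz -> 22400 effective.
N31_RATIO = 28160 / 2720  # ~10.35, as above
for eff_mem in (21600, 22400):
    print(round(eff_mem / (N31_RATIO - 4)))  # ~3400MHz and ~3526MHz
```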
Obviously some SKUs have voltage limits that keep the RAM clock lower, or just get unlucky with slightly lower-clocking memory chips, but I would not expect that to be the case here.
I imagine you'll generally be able to match core/bw.
So, one more worry about this thing gone. I'm anticipating this launch and what people will be able to do with it. If not at stock, it would appear an OC may clear many hurdles...
We shall see, and you can do your own calculations (or just wait for what power limits are given/clocks are achieved at launch), but I thought I'd give a heads up...because I thought it was cool/interesting.
I take it back fwiw, this is my AMD/ATi...if it does this approximate thing.
_________________________________________________________________________________________________________
Different subject:
An interesting thing to note about this part is that it truly is preempting a low-end Rubin part.
Since 12GB of RAM becomes a limitation with 6144sp @ 3700MHz, and 3780MHz becomes a bandwidth limitation for 36gbps/128-bit GDDR7 with nVIDIA's cache structure, AMD is playing a very delicious game of chicken.
I'm sure nVIDIA would love to offer such a part with 12GB, but...
Either nVIDIA will be dragged kicking and screaming into using 16GB at the low end, they will have to use 40gbps and clock the chip higher, or they will lose to this chip; thereby, 9070 XT FineWine™.
And they sure as hell won't be able to sell it for $1000 (er, $750) at that point, not even $600 like AMD currently will. No, this level of perf *should* firmly enter the LOW END (a new bracket), where it should be on 3nm.
Budget gamers deserve 45-60TF/16GB too, even if they don't understand. They deserve to be able to do the traced rays and the up-scales and the what-nots; N48 helps push this along whereas nVIDIA stalled it.
This, in case you are wondering, is why we need AMD: this exact thing I am explaining right now. If we didn't have them, the worse future for consumers would occur, without a doubt.
Another thing to note: a 340W power limit...3300MHz for 8192sp...that equals 6144sp @ 4400MHz. A potential hint at the maximum clocks of N3E (at least wrt GPUs), perhaps? My guess is yes.
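That equivalence is just shaders × clock held constant (an assumption that perf scales linearly with both), e.g.:

```python
# Throughput equivalence sketch: hold shaders x clock constant, so an
# 8192sp part at some clock matches a 6144sp part at a higher one.
def equivalent_clock(sp_a, clock_a_mhz, sp_b):
    return sp_a * clock_a_mhz / sp_b

print(equivalent_clock(8192, 2835, 6144))  # 3780MHz (the stock-clock case further down)
print(equivalent_clock(8192, 3300, 6144))  # 4400MHz, the possible N3E hint
```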
Given this is the clock of the M4, and nVIDIA usually uses HDL (like the 0.0199µm² SRAM on N3B, vs 0.021/0.023 on N3E/P), it may be AMD capitalizing on showing the weakness of nVIDIA using the most-dense/lowest-clocking process. It might also be an insight into what AMD themselves are doing (but perhaps not), as something like that (6144sp/128-bit) which *does* have that clock potential could potentially replace N48.
For instance, maybe nVIDIA uses a '3N' fake-named tweak of 3nm that capitalizes on the density of N3B (but with the other general improvements of N3E, similar to what they did on 5nm/4nm).
They did this with '4N', and did it again with Blackwell. They used the most-dense version of each process, whereas AMD used the full clock potential (which arguably kinda-sorta didn't work out for Navi 3).
N3B clocks at 3780-4050MHz, whereas N3E is 4050-4510MHz. If nVIDIA does as they have before, small Rubin (and potentially others) would be in bad shape, especially if AMD uses a similar design/high clocks.
Likewise, AMD could even use something like 5760sp (1920sp/16-ROP chiplets × 3) clocked in that higher bracket. It would be more efficient (architecturally speaking, not clock/power) AND win.
AMD could very well keep a template of 2048 themselves, though (for FSR/RT)...I'm just saying. That would work (until nVIDIA decides to clock their chips higher), if not preempt that eventuality from nVIDIA.
There are conceivably about three different ways nVIDIA's tried-and-true formula could fail, and potentially already has, even while still pushing the 'minor, incremental improvements' some love to boast about.
I truly do believe they would have loved to give people just over 45TF, which would help in some situations, but still with 12GB, which is an incomplete/artificially-limited package, just like so many others.
Now they may be forced to actually make a decent card right away. That is a good thing. The 5070 will suffer for it, but whatever; that card is literally being sold on the "5070 is a 4090" (bc it has to upscale) BS.
Marketing-speak like that can die in a fire right-quick, replaced with actually decent hardware ASAP.
That said, it is possible nVIDIA uses HPL on the 5070, and that could continue into this Rubin part, but I doubt it. I think it's more likely that if any Rubin chip uses HPL instead of HDL, it will be the largest (384-bit?) chip.
Why is this, class? Say it with me: so they can replace their old chips with something cheaper and sell it again as better, when it really isn't (much), even though they could have made that perf on the former process. Like AMD did.
The only exception is the highest end. You DO need to understand how it works, contrary to Huang's opinion preceding his perhaps most-quoted line of all time.
My goal is that one day you'll all get it. I'm sure some will when whatever limitations of Blackwell (max clocks vs the stock clock of whatever Rubin part replaces it) are exploited to penalize those cards come Rubin.
And the 9070 XT is still potentially faster vs the competition. On an old process. With a price set, likely long beforehand, to be competitive with where they want to place that part. I mean, $550 (now) would be better, but whatever.
I get why they did it... 6144sp @ 4050MHz / 8192sp = 3037.5MHz. That might be a little high given how GPUs clock relative to other chips; it might not even hit 4GHz (saved for a later gen). If similar to this gen, maybe ~3968MHz.
So if that part, at its maximum expected potential, were $550...to justify $600, AMD would need to clock to at least ~3247-3314MHz...Hmm, wonder what it'll do (oh wait, we know...approximately that). Making sense yet?
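Showing my work on that ~3247-3314MHz range (same shaders × clock assumption as before, normalized by a hypothetical $550 vs $600 pricing):

```python
# Price-normalized clock target sketch: match a hypothetical $550 6144sp
# part's perf-per-dollar with an 8192sp part sold at $600.
def required_clock(rival_sp, rival_mhz, rival_price, own_sp, own_price):
    return rival_sp * rival_mhz / own_sp * (own_price / rival_price)

print(round(required_clock(6144, 3968, 550, 8192, 600)))  # ~3247MHz
print(round(required_clock(6144, 4050, 550, 8192, 600)))  # ~3314MHz
```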
I'm kinda starting to like this AMD, they're starting to get how the game is played, and planning accordingly. Bravo.
Even the stock clock of the 9070 XT tells us something: 6144sp @ 3780MHz / 8192sp = 2835MHz. Even accounting for a 64-ROP limitation (or perhaps using FSR), that's still less than 3100MHz, if barely.
So it should essentially beat a chip like that, well, all the time.
Which is one of the reasons why there are so many 9070 XTs clocked at 3100MHz (another being that they obviously expect this to be the general level of performance games are tailored around in the future).
If you ever wanted to know how 3D chess is played, there ya go.
You can buy a 9070 XT knowing that the 3nm 128-bit 5060-class/5070 replacement (which may not suck, but could have and still might) probably won't beat it (in any meaningful way).
That chip might now be a bigger jump over the 5070 (because of RAM amount, if nothing else), making it not suck (like the 5070 does), and/or cheaper because of N48. This is another reason not to buy the 5070/12GB cards.
All of this is good, because it gives N48 longevity going into the new (RT) era, even if at the low end, and it also theoretically means GeForce users could get a cheap GPU that isn't awful in one way or another.
Win-win, but mostly...
FineWine™ win. Common-sense design and affordability win. Moving the market forward win. Cheaper prices for this level of performance next-gen win. Better quality for all people if most have 16GB win.
Highest density/low clock potential/slow cheap RAM (potentially Micron)/the slowest improvement possible to edge consumers along w/ minimal and fake progress bc it can be made CHEAPER: FAIL.
Mayyybe I'm wrong, but what if I'm not? All I ask for is a fair market with decent prices/products and progress...so BOLO for what they do, and think about what it would have looked like had it gone the other way, positive or negative.
Hopefully you think about this before buying any part, especially from nVIDIA, and you look at and/or think about this exact thing wrt your current part when considering an upgrade.
Especially those, probably like me, who will be looking at the literal 7th iteration of the 3090 or 3rd version of AD103 (with a larger cache to cut RAM cost), now with that last shader cluster enabled, that they could have made all along...just as AMD did with the 7900 XTX, even if RT/FSR were a lower priority. Next time it won't be. That one will likely be a fight for the ages.
What will nVIDIA do then? Sell more G-Sync modules? Optical Flow Engines? Lesser ops as a feature? '6060 is a 4090' snake oil? The answer is probably yes.
Even better DLSS! Suddenly promoting using more RAM (maybe showing FG frame rates)! Things that make your current cards look worse...surely someone is finding a way to deprecate the ways a 5080 passes 60fps as we speak.
Just...wait for it. And pay attention. That's all I'm asking. You do you after that.