Monday, October 31st 2022
AMD Radeon RX 7900 XTX RDNA3 Prototype Leaked, Confirms Reference Cooler Design
Here's what is possibly the very first picture of an AMD Radeon RX 7900 XTX RDNA3 graphics card. AMD engineering samples and prototypes tend to use red PCBs, and this card has one. It reveals what could be the final design of the card's reference cooling solution, and it appears to match the teasers the company put out at its Ryzen 7000-series launch event.
The RX 7900 XTX cooling solution builds on that of its predecessor. The card itself is 3 slots thick and slightly longer than the RX 6900 XT. The aluminium fin-stack heatsink is bulkier than the one on the RX 6900 XT cooler, appears to be bursting out of the vents, and stretches out to the edges of the cooler shroud. The bulge toward the tail end could be housing the tips of the heat pipes. The prototype card has two 8-pin PCIe power inputs. There's no backplate, as the PCB has several headers in place for diagnostics and development use by AIBs and OEMs.
Source:
HXL (Twitter)
75 Comments on AMD Radeon RX 7900 XTX RDNA3 Prototype Leaked, Confirms Reference Cooler Design
The XTX may have three 8-pins or at least two 8-pins and a 6-pin.
www.techpowerup.com/gpu-specs/radeon-9700-pro.c50
A fun story about the previous owner, and an example of the average consumer.
15+ years ago, he wanted a system to use his PCI satellite card. So I built him a system with a good enough CPU and RAM, and chose a simple HD 2400 Pro for the graphics card, thinking he wouldn't be playing games. Huge mistake: it wasn't capable of 1080p satellite channel playback. So I told him to go and ask the shop to replace that card with an HD 2600, thinking that card would be enough. He instead went and bought the strongest AMD card there was, the HD 3870, just to play back video streams from the satellite card. Maybe one of the few HD 3870s that didn't run a 3D game for at least a decade.
Can't you mount the radiator on top as exhaust and add 2 fans in front as intake instead? My bad, after a closer inspection I saw the size of that rad.
Wonder how long until a case maker thinks to integrate a radiator into a case design...
10752 is marketing BS; the real number is 5376 (yes, they can do FP+FP, but that doesn't double their number) vs 5120 from AMD.
Especially with die size in mind, AMD going with +20% shaders is 10+ times more probable than AMD going with a magical +140%. The latter is simply technically impossible: how the heck would you cram that into that chip?
+20% shaders also aligns well with only two 8-pin connectors (so 375 W max).
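For what it's worth, the connector math behind that 375 W ceiling is simple; a minimal Python sketch, assuming the PCIe spec ratings of 75 W for the slot, 150 W per 8-pin and 75 W per 6-pin (the configurations below are illustrative, not confirmed specs):

```python
# PCIe spec power ratings: 75 W from the slot, 150 W per 8-pin,
# 75 W per 6-pin connector.
PCIE_SLOT_W = 75
CONNECTOR_W = {"8pin": 150, "6pin": 75}

def max_board_power(connectors):
    """Spec-rated maximum draw for a card with the given power inputs."""
    return PCIE_SLOT_W + sum(CONNECTOR_W[c] for c in connectors)

print(max_board_power(["8pin", "8pin"]))          # 375 W -- the prototype
print(max_board_power(["8pin", "8pin", "6pin"]))  # 450 W
print(max_board_power(["8pin", "8pin", "8pin"]))  # 525 W
```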
That's some pretty narrow tunnel vision there.
("pulling less power" per TPU just shows how lacking TPUs "power pulling" testing is, but that's beside the point)
NV went from 5376 cores to 8192 cores, a ~50% increase, and created a new bazinga connector to feed the beast.
AMD rolling out only two 8-pin connectors perfectly aligns with a 20% increase in shaders over the 6950 XT.
The 5700 XT was 2560 cores, a card that was merely high-range. The 6950 XT is only 2 times that.
A 140% (2.4 times) increase gen over gen while sitting on two 8-pins is not even remotely imaginable.

No, it doesn't. Starting with the 3000 series, Huang decided to claim twice the number of shaders a card actually had.
Not that Huang's claims needed anything of substance to support them (e.g. the 4000 series being "2-4 times faster"), but the formal reason was that the new shaders could do FP+FP, so they "should be counted as 2". It is still one shader (a dumb mini CPU).
FE 2080 Ti, 3080s, 4080 (anything below xx90) and 3090 Turbo (asus/gigabyte) xD
Where, on this SM diagram, is a shader? Those listed cores don't have any scheduling logic like a CPU has; that's up in the warp scheduler. No instruction or data cache either; that's in the register file and instruction buffer (and L1, obviously).
From a CPU perspective, Nvidia has 4 cores per SM on Maxwell, Pascal (bar GP100), Turing, Ampere, and Lovelace.
So, OK, we've established that shaders are not similar to CPU cores. On Maxwell here, a shader is a unit that can do FP32 math; it's an FPU. How do we know it's just FP32 math? Well, the load/store units there would be akin to an AGU; they request and store data, which does involve math, but not the same type of math. And the Special Function Units handle more complex math like sine transforms, which use things other than standard FP32.
So Nvidia uses FP32 units for its shader counts, great. Now let's address your claim that Ampere uses 2 paired FPUs to reach the 128-shaders-per-SM number (note that this is the same number Maxwell and gaming Pascal reach).
As you can see, the general structure remains the same, though there are changes carried over from Turing (fewer SFUs, LD/ST units closer to the cache, the cache moved locations along with the SFUs, etc.).
As you can also see, there are 2 datapaths in each SM subsection, 8 in total. One of these datapaths only does FP32 math, while the other can do both; it does both by having INT32 and FP32 units (ALUs and FPUs) on the same scheduling system, pairing them together while keeping the actual hardware units separate (likely for ease of design or for lower power draw, possibly to ease the next issue I'll talk about). Prior to Turing, all shaders were set up like this, which caused some issues with context switching: the SM would take a few cycles (upwards of 10) to swap to an integer instruction, and then take another few cycles to swap back to FP instructions.
With Ampere, they decided to keep the split structures from Turing but added in more FP-only units; as only around 30% of instructions are integer, it makes sense to have only 1/3 of the SM be ALUs rather than the 1/2 Turing has. Ideally, what they'd do is split INT off into its own datapath again, but it's possible that the extra die size incurred by such a move makes the economics infeasible, or those INT units are primarily for tasks the FPUs stall on, so it wouldn't gain much performance.
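A rough way to see why the shared datapath doesn't just double FP32 throughput: a toy Python issue model, assuming 64 FP32-only lanes plus 64 shared FP32/INT32 lanes per Ampere SM (the whitepaper unit counts) and that every INT instruction must go to the shared path. The steady-state model itself is my simplification, not anything Nvidia publishes:

```python
# Toy steady-state issue model for one SM, ignoring memory stalls,
# occupancy and scheduling overhead. Dedicated FP32 lanes always run
# FP; INT32 work can only issue on the shared lanes, stealing FP
# slots there. Valid while int_fraction <= shared / (fp_only + shared).

def ampere_fp_per_clock(fp_only=64, shared=64, int_fraction=0.30):
    """FP32 ops/clock when INT ops must issue on the shared lanes."""
    total = fp_only + shared
    int_issued = min(total * int_fraction, shared)  # INT capped by shared lanes
    return total - int_issued

turing_fp = 64.0                   # Turing: 64 FP32 lanes, INT has its own path
ampere_fp = ampere_fp_per_clock()  # ~89.6 FP32 ops/clock at a 30% INT mix

print(f"Ampere vs Turing per-SM FP32: {ampere_fp / turing_fp:.2f}x")  # ~1.40x
```

So at the ~30% integer mix mentioned above, the doubled "shader" count works out to roughly 1.4x the per-SM FP32 throughput of Turing, not 2x.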
AMD, meanwhile, defines them similarly:
This is a Vega 20 compute unit, or CU, as found in the MI50 and MI60 datacenter cards.
They specifically call out Vector ALUs, or groups of FPUs and ALUs all paired together on the same datapath. Vega splits them up into 4 subunits of 16 lanes each and 2 major units of 32 each. This was changed in RDNA: they merged 2 of the 16-wide units into a single 32-wide one, then paired those into groups of 4 (the WGPs).
But, still, the definitions are the same: 1 shader = 1 32-bit FPU.
Also: all shaders can do 2 floating-point operations per clock cycle (a fused multiply-add counts as two). All of them. That's why the formula to calculate TFLOPS is shaders × 2 × clock speed.
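To make that formula concrete, a quick Python sketch; the factor of 2 assumes one fused multiply-add per lane per clock, and the clocks are the boost clocks TPU's spec pages list:

```python
# FP32 TFLOPS = shaders x 2 (one FMA = 2 FLOPs) x clock (MHz) / 1e6,
# using TPU-listed boost clocks.

def tflops(shaders, boost_mhz):
    return shaders * 2 * boost_mhz / 1e6

for name, shaders, mhz in [
    ("RTX 2080 Ti", 4352, 1545),
    ("RTX 3080",    8704, 1710),
    ("RX 6950 XT",  5120, 2310),
]:
    print(f"{name}: {tflops(shaders, mhz):.2f} TFLOPS")
# -> ~13.45, ~29.77 and ~23.65 TFLOPS, matching the spec-page figures

# AMD sanity check: 2 SIMD32 x 32 lanes = 64 shaders per CU,
# so Navi 21's 80 CUs give 80 * 64 = 5120 shaders.
```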
2080Ti, 4352 CUs
3080, 8704 "CUs" <= faux ones, apparently
So, 3080 had:
- "twice as many" CUs, that also "have higher IPC"
- was clocked higher
- had faster memory
But actual GPU performance of the 3080 vs the 2080 Ti went up by just 25-30%: www.techpowerup.com/review/nvidia-geforce-rtx-3080-founders-edition/34.html
What does this performance gain align well with? Oh yeah: the same number of CUs with a small IPC bump (nowhere near the 1.4 times claimed by NV, though) and a higher clock.
The performance per shader went down with Ampere, which is entirely normal and expected behavior, and something Nvidia has had before with Kepler. They even tell you what a shader core is in Kepler's white paper.

If you look at the specs, a GTX 580 should be a lot slower than the GTX 760... but it isn't; it's only 6% slower according to TPU. Even looking at the TFLOPS, it should be slower than that: 1.58 vs 2.378.
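Plugging those two cards into the same formula shows the point; the 6% figure is the TPU number quoted above, and the clocks are assumed from TPU's spec pages:

```python
# GTX 580 (Fermi): 512 shaders at a 1544 MHz shader ("hot") clock.
# GTX 760 (Kepler): 1152 shaders at a ~1033 MHz boost clock.
fermi_tflops  = 512  * 2 * 1544 / 1e6   # ~1.58 TFLOPS
kepler_tflops = 1152 * 2 * 1033 / 1e6   # ~2.38 TFLOPS

rel_perf  = 1 / (1 - 0.06)              # GTX 760 is ~6% faster, per TPU
rel_flops = kepler_tflops / fermi_tflops
print(f"Kepler perf per FLOP vs Fermi: {rel_perf / rel_flops:.2f}x")  # ~0.71x
```

In other words, by this back-of-envelope math Kepler delivered only about 70% of Fermi's performance per theoretical FLOP.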
Going down in per-shader performance almost always correlates with adding more shaders, because nothing scales perfectly. I'd expect a 30-40% gain in performance from a naive doubling of shaders; more if you double a lot of the supporting structures with it.
If you did it perfectly, you'd scale up to what your front end allows, but this isn't always possible, because you'd need to sacrifice a lot of area for routing logic; you wouldn't be able to double shaders overall, making the scaling gains moot.
I wonder, if AMD launches RDNA3 GPUs today, will we also see reviews of them?
I hope so... and if so, TechPowerUp, for example, already knows a lot by now; all of it.
They've had both cards for at least a few days now, and the reviews are done.