Saturday, March 23rd 2024

AMD 24.3.1 Drivers Unlock RX 7900 GRE Memory OC Limits, Additional Performance Boost Tested

Without making much noise, AMD lifted the memory overclocking limits of the Radeon RX 7900 GRE graphics card with its latest Adrenalin 24.3.1 WHQL drivers, TechPowerUp found. The changelog is a bit vague, stating only that "The maximum memory tuning limit may be incorrectly reported on AMD Radeon RX 7900 GRE graphics products," so we tested it. The RX 7900 GRE has been around since mid-2023, but gained prominence as the company gave it a global launch in February 2024, to help AMD better compete with the NVIDIA GeForce RTX 4070 Super. Before this, the RX 7900 GRE had started out its lifecycle as a special edition product confined to China, and its designers had ensured that it came with just the right performance positioning so that it didn't disrupt other products in the AMD stack. One of the resulting limitations had to do with the memory overclocking potential, which was probably capped to ensure that the RX 7900 GRE has a total board power nearly identical to that of the RX 7800 XT.

Shortly after the global launch of the RX 7900 GRE, and responding to drama online, AMD declared the limited memory overclocking range a bug and promised a fix. The overclocking limits are defined in the graphics card VBIOS, so increasing those limits would mean shipping BIOS updates for over a dozen SKUs from all the major vendors, and requiring users to flash the update themselves. Such a solution isn't very practical, so AMD implemented a clock limit override in their new drivers, which reprograms the power limits on the GPU during boot-up. Nicely done, good job AMD!

During the course of our testing of the PowerColor RX 7900 GRE Hellhound graphics card, we were playing around with overclocking using the latest 24.3.1 WHQL drivers, and found that it increased the memory overclocking slider limit in AMD Software, which can be pushed all the way up to 3000 MHz now (24 Gbps GDDR6-effective). Previously the highest possible setting was 2316 MHz. This doesn't necessarily mean that the memory will overclock all the way up to 24 Gbps; you're still limited by what the GDDR6 chips are capable of. Our PowerColor Hellhound ships with Samsung K4ZAF325BC-SC20 memory chips that are rated for 20 Gbps. With our review drivers for the RX 7900 GRE, we had managed a memory overclock of 2316 MHz (18.5 Gbps GDDR6-effective); but with the new drivers, we scored a spectacular 2604 MHz (about 20.8 Gbps), which beats the 19.5 Gbps speed that the RX 7800 XT ships with.
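
As a rough illustration of how the slider values relate to the quoted data rates, here is a minimal Python sketch. It assumes a factor of 8 between the clock shown in AMD Software and the effective per-pin rate; that factor is inferred from the figures quoted above (3000 MHz for 24 Gbps, 2316 MHz for 18.5 Gbps) and is not taken from AMD documentation. The function name is hypothetical.

```python
# Assumption: the slider clock maps to the effective per-pin data rate by a
# factor of 8, inferred from the article's pairs (3000 MHz -> 24 Gbps,
# 2316 MHz -> 18.5 Gbps). This is not an official AMD formula.
TRANSFERS_PER_CLOCK = 8


def effective_gbps(slider_mhz: float) -> float:
    """Convert a memory clock in MHz to an effective data rate in Gbps per pin."""
    return slider_mhz * TRANSFERS_PER_CLOCK / 1000.0


if __name__ == "__main__":
    # Old slider ceiling, the new overclock, and the new slider ceiling.
    for mhz in (2316, 2604, 3000):
        print(f"{mhz} MHz -> {effective_gbps(mhz):.1f} Gbps")
```

Under that assumption, the 2604 MHz result comes to roughly 20.8 Gbps, a little past the 20 Gbps rating of the Samsung chips.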

The increased memory speed sees our 3DMark Time Spy GT1 overclocked frame rate jump from 72.6 FPS to 77.1 FPS (GPU frequency was constant between the two runs at 2803 MHz). This brings the card's total overclocking potential to an impressive 15% real-life performance gain. It remains a mystery why AMD chose to go with a slower memory sub-system than the RX 7800 XT for the RX 7900 GRE. It may have to do with achieving an almost identical board power number to the RX 7800 XT, so that board partners could end up with the same cooler noise figures as their RX 7800 XT products; or it was just a product segmentation decision—we'll never know. With the 20 Gbps overclock, the RX 7900 GRE has a hearty 640 GB/s of memory bandwidth at its disposal, which should come in handy to keep the 80 RDNA 3 compute units better-fed.
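
The bandwidth and frame-rate figures above can be checked with a few lines of Python. This is a sketch under one stated assumption: a 256-bit memory bus, which the article does not mention but which matches its 640 GB/s figure at 20 Gbps. The function names are hypothetical.

```python
# Assumption: a 256-bit memory bus. The article does not state the bus width;
# 256 bits is the value consistent with its "640 GB/s at 20 Gbps" figure.
BUS_WIDTH_BITS = 256


def bandwidth_gb_per_s(gbps_per_pin: float, bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate times bus width, bits to bytes."""
    return gbps_per_pin * bus_width_bits / 8


def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Previous overclocking ceiling versus the 20 Gbps figure from the article.
    print(f"18.5 Gbps -> {bandwidth_gb_per_s(18.5):.0f} GB/s")
    print(f"20 Gbps   -> {bandwidth_gb_per_s(20.0):.0f} GB/s")
    # Time Spy GT1 frame rates from the two overclocked runs.
    print(f"Time Spy GT1: {percent_gain(72.6, 77.1):.1f}% faster")
```

The frame-rate comparison covers only the two overclocked runs, so it isolates the contribution of the higher memory clock (about 6%); the 15% figure is the total gain from overclocking and is not recomputed here.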

Thanks to @Dragokar for letting us know of the driver change.

32 Comments on AMD 24.3.1 Drivers Unlock RX 7900 GRE Memory OC Limits, Additional Performance Boost Tested

#26
Minus Infinity
This is interesting. I would normally say just get the 7900XT, but I'm seeing a $250 price difference between the PowerColor Hellhound versions of the GRE and XT. Let's say the worst case is a 2600MHz OC: is that enough of a jump over the 6800XT to bother, or do I go the XT route and save for an extra few weeks? I would rather get the XTX, but the jump in price is too much IMO.

Nvidia's pricing is insane with 4070 Ti Super way dearer than 7900XT and 4080 Super starting at $1900 way dearer than XTX.
#27
RTX4070TI

Insane, in some games almost RX 7900XT performance. We got a new value king here.
#28
FordGT90Concept
"I go fast!1!11!1!"
Feelas said: This is however fascinating, since I guess I just hit an ASRock which doesn't want to go over 2400MHz, which would go really close to the reported instability issues around 2500MHz for some parts.

Which would mean, that some of the worldwide stock is repurposed 7900GRE for China which didn't sell (like mine). It looks like a very confusing case, because I feel that should be refundable... It's not even silicon lottery, it's a hidden-revision-that-is-not-in-specs at all. Either purposefully locked based on BW or binned based on max stable memory controller speed.
The GDDR modules installed on the card are up to the AIB partner's discretion so long as they meet or exceed reference spec. In this case, PowerColor (see OP) has much more capable Samsung chips installed compared to your ASRock model. In short, ASRock skimped on the GDDR and PowerColor did not (likely used the same modules the 7900 XT and XTX have). All AMD did was enable overclocking of the VRAM so you can unlock whatever capability the GDDR modules have from the factory that was previously limited by the VBIOS.

That said, people shouldn't go into GRE thinking every card's VRAM is going to be able to overclock up to 7900 XT spec. AIBs like changing cards up based on GDDR chip availability/pricing without public notice. Stock clocks are all that is guaranteed/warrantied.


It's still not clear to me why AMD is doing this unless consumer overclocks are exempt from the 600 GB/s regulation.
alwayssts said: Lots of really fascinating info/takes in this thread, thanks all for your contributions (and for helping inform people/make the whole ecosystem better).

I, for one, had forgotten the limits to specs (other than flops) wrt exports to China (which may impact some 'stock' product decisions beyond 4090D, perhaps even how certain chips are designed with that limitation in mind in the future) in ways I haven't personally looked into on a deep-enough level to comment. Thanks for the info/reminder.

That said, GDDR would be volatile memory, correct? I imagine (but am not well-versed on the current definition of I/O bw) this is in relation to external bandwidth (links/protocol) and if anything may perhaps only relate to their (capabilities of) external cache structure? While I don't know the actual bw (or limitations of the link), using its observational impact on performance I've generally equated AMD's L3 structure to something like adding 3mhz(/gbps) over the bus in relation to core speed (as the cache is likely clocked in relation to core; haven't looked into it). So, for instance, if something like a 7800xt/7900gre were running at 2800mhz core clock, the L3 would be contributing what equates to 268.8Gbps for DRAM bw. Obviously it's faster as the size is smaller, but in terms of perf does appear to shake out if you look at it wrt bw limitations. The same is true of nVIDIA's L2 cache and roughly double at same size (similar to core clock*6mhz over the bus), which nVIDIA all-but-confirmed with one product release not so long ago (something in the 4060/4070 series iirc) to excuse the more narrow bus wrt the competition.

In the case of 4090 (with 72MB L2), for instance, that translates to something like ~2730*4.5*384/8 = ~589.68Gbps, which is indeed oddly close to the number you stated. That's internal though, so I don't know if that was part of that decision-making process of the chip and/or affected by those export rules (or rather the rules were built around them after it was completed and/or released)? Now you've got me curious, but I always had a feeling these 'rules' were constructed as a compromise between the government and nVIDIA as for them to still be able to use that chip in China, but for them to be conscious of using restraint moving forward.

I'm sure someone more studied could read those last couple paragraphs and declare me a simpleton, and that's fair-enough! I respect there are people that know much more in the intricate areas of these things (and their definitions/classifications) than I do, have kept up on it better, and/or have information most don't have. I generally focus on the real-world impact to most users, and where limitations are/how they can be improved. I don't always know the correct specific language/terminology regarding how each area is classified. I'm sure there are people around here that can shed much more light on it than myself. I don't *think* what you are implying would apply, but I absolutely could not state that as a fact as I just don't have the inherent knowledge or haven't done the research to declare that as fact.
It's possible Hack-A-Day misinterpreted the regulation but what we do know is that the regulation mostly targeted AI/high performance computing products, which often ship with HBM. The regulation effectively made exporting HBM products to China illegal. The language might be fuzzy but the real world application is the 600 GB/s limit on GDDR/HBM performance; products exceeding it have almost entirely vanished from the Chinese market.
#29
Dragokar
alwayssts said: Keep banging that drum, buddy. I'd love for you and any other people that have similar issues to have them completely resolved. Compatibility with old titles can sometimes be a tricky thing (depending on how something was programmed), and sometimes not (it's just a matter of getting ahold of the right person and them having the time to fix it/implement a change). All I can hope is that if there is an easily-enough implementable fix on AMD's side that you are able to make that connection to them and it can/will be sorted to the best of their capability and your appeasement, even if it takes them a few tries to address your (or others) particular issue(s). I wish you much luck, and appreciate you for trying to get that sorted for your community of players. That's what it's all about, right? Just trying to make things the best they can be if it's possible; calling out issues/perceptions and hoping it reaches the right ears so that both they know about it and getting someone capable to help with a solution that is applicable/agreeable to the most people, if able/possible...and doing one's best at going about those things without being offensive/negative as possible.

(The last part is a personal reminder [I mean well, but know I can sometimes come across unintentionally harsh towards people when trying to make a point], your attitude towards the issue appears good-natured and on-point, and once-again I applaud your persistence. :))
Yeah, I just want it to work like it did over a year ago. Sometimes I tend to get a bit overly engaged and might be misunderstood, but since GSC Gameworld does not care about the game anymore the only company that can help is AMD and I try my very best to get it solved.
#30
Feelas
FordGT90Concept said: The GDDR modules installed on the card are up to the AIB partner's discretion so long as they meet or exceed reference spec. In this case, PowerColor (see OP) has much more capable Samsung chips installed compared to your ASRock model. In short, ASRock skimped on the GDDR and PowerColor did not (likely used the same modules the 7900 XT and XTX have). All AMD did was enable overclocking of the VRAM so you can unlock whatever capability the GDDR modules have from the factory that was previously limited by the VBIOS.
I would agree, but we are talking about clocking at 2500MHz and here's the review for Steel Legend.

Quoting:
"The GDDR6 memory chips are made by Samsung and carry the model number K4ZAF325BC-SC20. They are specified to run at 2500 MHz (20 Gbps effective)."

So you are suggesting that I have received a different memory PN or Samsung is selling overspecced ICs? Given that the TPU BIOS dump for the card is correct, the BIOS supports either Hynix H56G42AS8DX014 or Samsung K4ZAF325BC. Judging by Google, both chip models are possible to find on GRE/XT/XTX all alike and the Samsung part doesn't come in a variant specced for 18Gbps: it is available as SC16 for 16Gbps or SC20 specced for 20Gbps. Looks like an issue with the memory controller instead, which of course means that it is "in the reference spec", given they underclock to 2250MHz. It is completely baffling they would put 20Gbps VRAMs into a card capable of only 18Gbps...

Perhaps there is a bigger issue for AMD, given that many people on XT/XTX are suffering from black screen issues - did AMD badly bin RDNA3 with modules incapable of >2400MHz VRAM clocks into XT/XTX and those are suffering from the problems? Overclocking on GRE >2400MHz very similarly causes black screen & system hard reset. Maybe there is a bigger RDNA3 failure at play, not a GRE-related one...

LabRat 891 asked for GPU-Z screenshot, here it is.
#31
FordGT90Concept
"I go fast!1!11!1!"
Feelas said: So you are suggesting that I have received a different memory PN or Samsung is selling overspecced ICs? Given that the TPU BIOS dump for the card is correct, the BIOS supports either Hynix H56G42AS8DX014 or Samsung K4ZAF325BC. Judging by Google, both chip models are possible to find on GRE/XT/XTX all alike and the Samsung part doesn't come in a variant specced for 18Gbps: it is available as SC16 for 16Gbps or SC20 specced for 20Gbps. Looks like an issue with the memory controller instead, which of course means that it is "in the reference spec", given they underclock to 2250MHz. It is completely baffling they would put 20Gbps VRAMs into a card capable of only 18Gbps...
It's entirely possible that your specific chip was binned for GRE because the memory controller couldn't handle 7900 XT spec.



I did a lot more digging into what happened. Short version is this:
1) November 2022, the performance export ban was issued that mentions the 600 GB/s rule on non-volatile memory: hackaday.com/2022/11/09/chinese-chips-are-being-artificially-slowed-to-dodge-us-export-regulations/
2) July 2023, 7900 GRE launches: www.tomshardware.com/news/amd-radeon-rx-7900-gre-launch
3) October 2023, Department of Commerce announces change in the export ban that strongly targets tensor cores (TFLOP calculation): www.tomshardware.com/news/no-nvidia-isnt-breaking-gpu-sanctions-analyst
4) November 2023, the rule goes into effect, 4090 disappears because it's not compliant, and 7900 series sales surge in China to fill the demand: hothardware.com/news/amds-flagship-radeon-rx-7900-xtx-and-xt-gpus-flourish-in-china

I believe GRE was created in case the rule was modified to apply the 600 GB/s rule to volatile memory which would limit Chinese sales of 7900 XT and 7900 XTX. Dell apparently misinterpreted the rule and proactively applied it in November 2023: www.techpowerup.com/316044/dell-allegedly-prohibits-sales-of-high-end-radeon-and-instinct-mi-gpus-in-china

AMD wasn't the only company to release a 600 GB/s product preemptively; Biren Technology (a Chinese company) did the same (see hackaday article above).

Because the 2023 rule replaced and invalidated the 2022 rule, which mentioned 600 GB/s, AMD felt they were no longer threatened by it so they enabled overclocking in the driver for the card.

So, all good now, except that Chinese can't get 4090 anymore (hence 4090D www.tomshardware.com/pc-components/gpus/nvidia-launches-china-specific-rtx-4090d-dragon-gpu-sanctions-compliant-model-has-fewer-cores-and-lower-power-draw).
#32
alwayssts
FordGT90Concept said: I did a lot more digging into what happened. Short version is this:
1) November 2022, the performance export ban was issued that mentions the 600 GB/s rule on non-volatile memory: hackaday.com/2022/11/09/chinese-chips-are-being-artificially-slowed-to-dodge-us-export-regulations/
2) July 2023, 7900 GRE launches: www.tomshardware.com/news/amd-radeon-rx-7900-gre-launch
3) October 2023, Department of Commerce announces change in the export ban that strongly targets tensor cores (TFLOP calculation): www.tomshardware.com/news/no-nvidia-isnt-breaking-gpu-sanctions-analyst
4) November 2023, the rule goes into effect, 4090 disappears because it's not compliant, and 7900 series sales surge in China to fill the demand: hothardware.com/news/amds-flagship-radeon-rx-7900-xtx-and-xt-gpus-flourish-in-china

I believe GRE was created in case the rule was modified to apply the 600 GB/s rule to volatile memory which would limit Chinese sales of 7900 XT and 7900 XTX. Dell apparently misinterpreted the rule and proactively applied it in November 2023: www.techpowerup.com/316044/dell-allegedly-prohibits-sales-of-high-end-radeon-and-instinct-mi-gpus-in-china

AMD wasn't the only company to release a 600 GB/s product preemptively; Biren Technology (a Chinese company) did the same (see hackaday article above).

Because the 2023 rule replaced and invalidated the 2022 rule, which mentioned 600 GB/s, AMD felt they were no longer threatened by it so they enabled overclocking in the driver for the card.

So, all good now, except that Chinese can't get 4090 anymore (hence 4090D www.tomshardware.com/pc-components/gpus/nvidia-launches-china-specific-rtx-4090d-dragon-gpu-sanctions-compliant-model-has-fewer-cores-and-lower-power-draw).
Appreciate your time on the research; I figured it was likely links (chip-to-chip communication; interconnects: ex infinity fabric et al) that could be used for things other than just volatile memory.

Yeah, 4090 was a no-no due to FP32 (Tflops), I knew that part. Didn't know/remember there was an 8-bit (TOP) performance(/density) update to the law, but I can certainly believe that part (I won't get into it).

I think we used to live in a world that largely relied on full/double precision (FP32/64), whereas now that has morphed into larger use of FP16 and 8-bit calculations (which can be more densely packed).

I really should read the actual current guidelines, as obviously and as you can see, things can be easily misinterpreted and incorrect information passed along (even by the press and/or major companies).

It's actually quite fascinating imho: It used to be these companies were so far ahead of the governments they would circumvent whatever (eventually) archaic rule by implementing different design decisions. In this case the government actually appeared to very much understand what they were doing (at least in the updated law, which makes sense) while perhaps some in the supply chain did not (or keep current).

That said, I quite dislike talking about this stuff, tbh. Although it is important to understand the what/why, I personally very much prefer to be a unifier rather than discussing limitations in international trade.

Edit: TMW you look at your post later and realize you meant double-precision (64-bit floating point), single precision (FP32), and half-precision (FP16)...but worded it incorrectly.