Those people bought PS3s for supercomputing too. They had to do some hacking to make it work. Buying a ready-to-use GPU is nothing for them. The thing is, they just didn't do that.
... so you're admitting that delimiting this feature to enterprise products only was effective? Thanks! That's what I've been saying all along.
After all, the only difference between the two is the relative difficulty of running custom software on a PS3 vs. flashing a custom BIOS or driver to unlock the disabled FP64 capabilities on the GPUs in question. So ... this demonstrates the effectiveness of the GPU makers' strategy.
That certainly wouldn't be as complex as you say, nor as expensive. Intel literally added AVX-512, which even fewer people used and which came with a genuinely large die-space cost; it didn't take off, but it didn't impact the cost of the chips much either.
... and we're seeing exactly the same movement of it having a test run of "open" availability, where use cases are explored and identified, before the hardware is then disabled (and likely removed in the future) from implementations where it doesn't make much sense. And, of course, AVX-512 has seen far more widespread usage in consumer facing applications than FP64 compute ever did, yet it's still being disabled. Of course that is largely attributable to that much higher die area requirement you mention, which significantly raises the threshold for relative usefulness vs. keeping it around. So while AVX-512 has far more consumer utility than FP64, it is still moving towards only being included in enterprise hardware where someone is willing to pay for the cost of keeping it around.
Potential further software development that can utilize FP64. I read more about FP64 and it may be useful in physics simulations, which are becoming more of a thing in games and already sort of were (PhysX). There is also deep learning, which could utilize FP64 capabilities; considering the push toward DL cores on Nvidia's side, it might become relevant (DL ASICs already support FP64, but I mean integration into the GPU die itself, without an ASIC).
"Potential" - but that didn't come to pass in a decade, with half that time having widespread availability of high performance implementations? Yeah, sorry, that is plenty of time to explore whether this has merit. And the games industry has concluded that the
vast,
overwhelming majority of games do not at all benefit from that degree of precision in its simulations, and will do just fine with FP32 in the same scenarios. The precision just isn't necessary, which makes programming for it wasteful and inefficient.
Also, deep learning is moving the exact opposite way: utilizing
lower precision calculations - FP16, INT8, and various packed versions of those operations. FP64 does have some deep learning-related use, mainly in training models. But there's a crucial difference: training models isn't something any consumer is likely to do with any frequency at a scale where such acceleration is really necessary.
Running those models is what end users are likely to do, in which case FP64 is completely useless.
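To put rough numbers on that precision gap, here's a minimal sketch of my own using NumPy (purely illustrative, not taken from any game engine or DL framework): FP64 keeps roughly 15-16 significant decimal digits, FP32 about 7, and FP16 - the direction inference workloads are heading - about 3-4. The kind of tiny update a long-running scientific simulation must preserve simply vanishes at lower precision:

```python
import numpy as np

# FP32 keeps ~7 significant decimal digits. At a magnitude of 1e8 the spacing
# between representable values is already 8, so adding 1.0 is simply lost.
big32 = np.float32(1e8)
print((big32 + np.float32(1.0)) - big32)      # 0.0

# FP64 keeps ~15-16 digits, so the same update survives with room to spare.
big64 = np.float64(1e8)
print((big64 + np.float64(1.0)) - big64)      # 1.0

# FP16 (typical for inference): at 4096 the spacing between representable
# values is 4, so even small integer increments disappear.
big16 = np.float16(4096)
print((big16 + np.float16(1.0)) - big16)      # 0.0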
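```

The point being: a game physics step that only needs a few digits of per-frame accuracy sits comfortably inside FP32 (or lower), while workloads that accumulate tiny increments over millions of steps are the ones that actually pay for FP64 - and those live almost entirely on the enterprise side.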
It's literally the same as crippling the hash rate of current cards.
That's actually a good example, yes. LHR cards purposely delimit a specific subset of functionality that is of very little use to most gamers in order to avoid gamer-facing products being gobbled up by far wealthier entities seeking to use them for other purposes. Is this supposed to be an argument for this somehow being bad? It's not been even close to as effective (in part because of crypto being
only about profit, unlike scientific endeavors, which generally don't care about profitability and instead focus on quality, reliability, and reproducible results), but that is entirely beside the point. All this does is exemplify why this type of market segmentation is generally beneficial to end users.
(And no, cryptomining is not beneficial to the average gamer - not whatsoever. It's a means for the already wealthy to entrench their wealth, with a tiny roster of exceptions that give it a veneer of "democratization" that is easily disproven.)
It's not even a die space issue. It's all about interconnecting already existing FP cores. Perhaps the benefits are small, but if you are already making a GPU, it's stupidly easy to add.
Ah, so you're an ASIC design engineer? Fascinating! I'd love to hear you explain how you create those zero-area interconnect structures between features. Do they not use ... wires? Logic? 'Cause last I checked, those things take up die area.
I'm arguing that it's way lower than 1%, and that it's just wasteful not to do it, as it takes nearly no effort for GPU makers to implement. Perhaps it's not often used or whatever, but it's pointless to cut it off too.
It is the exact opposite of pointless - it has a clear point: design efficiency and simplicity. Bringing forward a feature that nobody uses into a new die design makes that design needlessly more complex, which drives up costs, likely harms energy efficiency, increases die sizes - and to no benefit whatsoever, as the feature isn't used. Why would anyone do that?
Putting it another way: allowing for paired FP32 cores to work as a single FP64 core requires specific structures and interconnects between these cores, and thus reduces design freedom - it puts constraints on how you design your FP32 cores, it puts constraints on how you place them relative to each other, it puts constraints on how these connect to intra-core fabrics, caches, and more. Removing this requirement - keeping FP64 at a high level of acceleration - thus increases design freedom and flexibility, allowing for a broader range of design creativity and thus a higher likelihood of higher performing and more efficient designs overall. In addition to this, it saves die area through leaving out specific physical features. Not a lot, but still enough to also matter over the tens of millions of GPUs produced each year.
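As a rough software analogue (my own illustrative sketch in NumPy - this is the classic "double-float" trick, not how any GPU actually gangs its ALUs together), building higher precision out of pairs of FP32 values already requires several extra operations and a fixed hi/lo layout per value; the hardware equivalent of pairing FP32 units carries analogous structural costs in wiring, scheduling and register layout:

```python
import numpy as np

def two_sum(a, b):
    # Knuth's error-free transformation: returns (s, e) with s + e == a + b exactly,
    # where s is the rounded FP32 sum and e is the rounding error it lost.
    s = np.float32(a + b)
    bb = np.float32(s - a)
    e = np.float32((a - np.float32(s - bb)) + (b - bb))
    return s, e

def df_add(x_hi, x_lo, y_hi, y_lo):
    # Add two "double-float" values, each stored as a (hi, lo) pair of FP32 numbers.
    s, e = two_sum(x_hi, y_hi)
    e = np.float32(e + np.float32(x_lo + y_lo))
    return two_sum(s, e)

# 1e8 + 1.0 is not representable in a single FP32 value (the +1.0 gets rounded away),
# but the paired representation keeps it in the low word.
hi, lo = df_add(np.float32(1e8), np.float32(0.0), np.float32(1.0), np.float32(0.0))
print(hi, lo)   # hi == 1e8, lo == 1.0 - the +1.0 survives in the low word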
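```

Even this toy version dictates how the two halves relate to each other and how the operations must be sequenced; scale that up to silicon and you get exactly the kind of constraints on core layout, interconnects and scheduling described above.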
How long after the electric starter motor was first designed and implemented did car makers keep putting hand cranks on the front of their cars? They disappeared within about a decade. Why? Because they just weren't needed.
The exception was 4WD cars and trucks, which might see use in locations where the failure of an electric starter could leave you stranded - and hand cranks were found in these segments up until at least the 1970s. Again: features of a design are typically only brought forward when they have a clear usefulness; if not, they are discarded and left behind. That is sensible and efficient design, which allows for better fit-for-purpose designs, which are in turn more efficient designs, avoiding feature bloat. You keep a feature where it has a use; you leave it behind where it falls out of use or is demonstrated to not be used.
And do you honestly think that it's enough to give just a few years for new technology to take off?
Pretty much, yes. FP64 and its uses were relatively well known when it started being implemented in GPU hardware. Half a decade or so to figure out just how those uses would pan out in reality was plenty.
You could basically argue that CUDA was sort of useless too using exactly the same argument.
What? CUDA saw relatively rapid adoption, despite being entirely novel (unlike the concept of FP64). This was of course significantly helped along by Nvidia pouring money into CUDA development efforts, but it was still proven to have concrete benefits across a wide range of applications in a relatively short span of time. It is also still used across a wide range of consumer-facing applications, even if they are somewhat specialized. CUDA has
far wider applicability than FP64.
My point was that those cards could be made stupidly cheap and that FP64 capabilities don't have any significant effect on the price of the end product. Meanwhile, their competitor made far worse mistakes that weren't even FP64-related, and those cost them way more.
Again: that argument is entirely irrelevant to the point I was making. What was cheap for a specific architecture at a specific point in time, bound up as it is in the specific and concrete realities of that spatiotemporal setting (economics, available lithographic nodes, competitive situations, technology adoption, etc.), doesn't say anything whatsoever that is generalizable about future uses or implementations of specific traits of those architectures. Your entire argument here boils down to "it was easy once, so it will always be easy", which is a complete logical fallacy. It just isn't true. That it was easy (which is also up for debate - making it work definitely had costs and took time) doesn't say anything about its future usefulness, the complexity of maintaining that feature in future designs, or the extra effort involved in doing so (and, on the reverse side, the lower complexity of cutting out this unused feature and simplifying designs towards actually used and useful features). Things change. Features lose relevance, or are proven never to have been relevant. Design goals change. Design contingencies and interdependencies favor reducing complexity where possible.
If by "crippling" you mean not spending time and resource to include a feature that no one ever needs, then I guess that's what @Valantar meant.
Exactly what I was saying. "Crippling" means removing or disabling something
useful. FP64 isn't useful to consumers or regular end users in any appreciable way.
Time spent on it would be negligible. A few traces to connect the existing FP32 cores and voila.
That is a gross misrepresentation of the complexity of designing and implementing a feature like this, and frankly quite disrespectful in its dismissiveness of the skills and knowledge of the people doing these designs.
Except they basically design the whole architecture for both gaming cards and datacenter or enterprise cards. And no, the parts are there. At first the functionality was disabled in the vBIOS, but later it was fused off, meaning they put more effort into removing it than into just giving it to you. Thus, crippling. That's why I say it's literally the same as the LHR cards. The capabilities are just there, but now nV goes the extra mile to sell you CMPs.
That used to be true, but no longer is. Low end enterprise cards still use consumer architectures, as they are (relatively) low cost and
still don't need those specific features (like FP64). High performance enterprise cards now use dedicated architectures (CDNA) or very different subsets of architectures (Nvidia's XX100 dice, which are increasingly architecturally distinct from every other die in the same generation). And this just demonstrates how maintaining these features across designs is anything but the trivial matter you're presenting it as. If it were trivial, and there were
any profit to be made from it, they would keep it around across all designs. Instead, it's being phased out, due to design costs, implementation costs, and lack of use (=profitability).
Except, if you want to get more serious about it and actually have some impact, then it starts to matter. And if more FP64 performance were left on the table, then with the same effort you could do more with your idle GPU.
Except this fundamentally undermines the idea of these charities. They were invented at a time when CPUs and GPUs didn't turbo or clock down meaningfully at idle, i.e. when they consumed just about the same amount of power no matter what. Making use of those otherwise wasted processing cycles thus made sense - otherwise you were just burning power for nothing. Now, they instead cause CPUs and GPUs to boost higher and burn more power to do this work. This by itself already significantly undermines the core argument for these undertakings.
All the while, their relative usefulness is dropping as enterprise and scientific calculations grow ever more demanding and specialized, and the hardware changes to match. You are effectively arguing that all hardware should be made equal, because a small subset of users would then donate the unused performance to a good cause. This argument is absurd on its face, as it is arguing
for massive waste because of
some of it not being wasted. That's like arguing that all cars should have 200+ bhp, 4WD, differential locking and low-gear modes because a tiny subset of people use them for offroading. That a feature has been invented and proven to work in a niche use case is not an argument for implementing it broadly, nor is it an argument for keeping it implemented across generations of a product if it is demonstrably not being made use of. The lack of use is in fact an explicit argument for
not including it.
There is nothing inherently wrong with FP64 being phased out of consumer products. It is not "crippling"; it is the selective removal of an un(der)utilized feature for the sake of design efficiency. This has taken a while, mainly because until very recently neither GPU maker had the means to produce separate datacenter and consumer architectures. That has changed, and now both do (though in different ways). And this is perfectly fine. Nobody is losing anything significant from it. That a few thousand people are losing the ability to combine their love of running home servers and workstations with contributing to charity is not a major loss in this context. Heck, if they were serious about the charity, they could just donate the same amount of money to relevant causes, where it would most likely be put to far better and more efficient use than through donating compute power.