Those people bought PS3s for supercomputing too. They had to do some hacking to make it work. Buying a ready-to-use GPU is nothing for them. The thing is, they just didn't do that.
... so you're admitting that delimiting this feature to enterprise products only was effective? Thanks! That's what I've been saying all along.
After all, the only difference between the two is the relative difficulty of running custom software on a PS3 vs. flashing a custom BIOS or driver to unlock the disabled FP64 capabilities on the GPUs in question. So ... this demonstrates the effectiveness of the GPU makers' strategy.
That certainly wouldn't be as complex as you say, nor as expensive. Intel literally added AVX-512, which even fewer people used and which came with a genuinely large die-space cost; it didn't take off, but it didn't impact the cost of the chips much either.
... and we're seeing exactly the same movement of it having a test run of "open" availability, where use cases are explored and identified, before the hardware is then disabled (and likely removed in the future) from implementations where it doesn't make much sense. And, of course, AVX-512 has seen far more widespread usage in consumer facing applications than FP64 compute ever did, yet it's still being disabled. Of course that is largely attributable to that much higher die area requirement you mention, which significantly raises the threshold for relative usefulness vs. keeping it around. So while AVX-512 has far more consumer utility than FP64, it is still moving towards only being included in enterprise hardware where someone is willing to pay for the cost of keeping it around.
Potential further software development that can utilize FP64. I read more about FP64 and it may be useful in physics simulations, which are becoming more of a thing in games and already sort of were (PhysX). There is also deep learning, which could utilize FP64 capabilities; considering the push toward DL cores on Nvidia's side, it might become relevant (DL ASICs already support FP64, but I mean integration into the GPU die itself, without an ASIC).
"Potential" - but that didn't come to pass in a decade, with half that time having widespread availability of high performance implementations? Yeah, sorry, that is plenty of time to explore whether this has merit. And the games industry has concluded that the
vast,
overwhelming majority of games do not at all benefit from that degree of precision in its simulations, and will do just fine with FP32 in the same scenarios. The precision just isn't necessary, which makes programming for it wasteful and inefficient.
Also, deep learning is moving the exact opposite way: utilizing
lower precision calculations - FP16, INT8, and various packed versions of those operations. FP64 does have some deep learning-related use, mainly in training models. But there's a crucial difference: training models isn't something any consumer is likely to do with any frequency at a scale where such acceleration is really necessary.
Running those models is what end users are likely to do, in which case FP64 is completely useless.
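To put rough numbers on that precision gap, here's a minimal sketch of my own using NumPy (purely illustrative, not taken from any game engine or DL framework): FP64 keeps roughly 15-16 significant decimal digits, FP32 about 7, and FP16 - the direction inference workloads are heading - about 3-4. The kind of tiny update a long-running scientific simulation must preserve simply vanishes at lower precision:

```python
import numpy as np

# FP32 keeps ~7 significant decimal digits. At a magnitude of 1e8 the spacing
# between representable values is already 8, so adding 1.0 is simply lost.
big32 = np.float32(1e8)
print((big32 + np.float32(1.0)) - big32)      # 0.0

# FP64 keeps ~15-16 digits, so the same update survives with room to spare.
big64 = np.float64(1e8)
print((big64 + np.float64(1.0)) - big64)      # 1.0

# FP16 (typical for inference): at 4096 the spacing between representable
# values is 4, so even small integer increments disappear.
big16 = np.float16(4096)
print((big16 + np.float16(1.0)) - big16)      # 0.0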
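```

The point being: a game physics step that only needs a few digits of per-frame accuracy sits comfortably inside FP32 (or lower), while workloads that accumulate tiny increments over millions of steps are the ones that actually pay for FP64 - and those live almost entirely on the enterprise side.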
It's literally the same as crippling the hash rate of current cards.
That's actually a good example, yes. LHR cards purposely delimit a specific subset of functionality that is of very little use to most gamers in order to avoid gamer-facing products being gobbled up by far wealthier entities seeking to use them for other purposes. Is this supposed to be an argument for this somehow being bad? It's not been even close to as effective (in part because of crypto being
only about profit, unlike scientific endeavors, which generally don't care about profitability and instead focus on quality, reliability, and reproducible results), but that is entirely beside the point. All this does is exemplify why this type of market segmentation is generally beneficial to end users.
(And no, cryptomining is not beneficial to the average gamer - not whatsoever. It's a means for the already wealthy to entrench their wealth, with a tiny roster of exceptions that give it a veneer of "democratization" that is easily disproven.)
It's not even a die space issue. It's all about interconnecting already existing FP cores. Perhaps the benefits are small, but if you are already making a GPU, it's stupidly easy to add.
Ah, so you're an ASIC design engineer? Fascinating! I'd love to hear you explain how you create those zero-area interconnect structures between features. Do they not use ... wires? Logic? 'Cause last I checked, those things take up die area.
I'm arguing that it's way lower than 1%, and that it's just wasteful not to do it, as it takes nearly no effort for GPU makers to implement. Perhaps it's not often used or whatever, but it's pointless to cut it off too.
It is the exact opposite of pointless - it has a clear point: design efficiency and simplicity. Bringing forward a feature that nobody uses into a new die design makes that design needlessly more complex, which drives up costs, likely harms energy efficiency, increases die sizes - and to no benefit whatsoever, as the feature isn't used. Why would anyone do that?
Putting it another way: allowing for paired FP32 cores to work as a single FP64 core requires specific structures and interconnects between these cores, and thus reduces design freedom - it puts constraints on how you design your FP32 cores, it puts constraints on how you place them relative to each other, it puts constraints on how these connect to intra-core fabrics, caches, and more. Removing this requirement - keeping FP64 at a high level of acceleration - thus increases design freedom and flexibility, allowing for a broader range of design creativity and thus a higher likelihood of higher performing and more efficient designs overall. In addition to this, it saves die area through leaving out specific physical features. Not a lot, but still enough to also matter over the tens of millions of GPUs produced each year.
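As a rough software analogue (my own illustrative sketch in NumPy - this is the classic "double-float" trick, not how any GPU actually gangs its ALUs together), building higher precision out of pairs of FP32 values already requires several extra operations and a fixed hi/lo layout per value; the hardware equivalent of pairing FP32 units carries analogous structural costs in wiring, scheduling and register layout:

```python
import numpy as np

def two_sum(a, b):
    # Knuth's error-free transformation: returns (s, e) with s + e == a + b exactly,
    # where s is the rounded FP32 sum and e is the rounding error it lost.
    s = np.float32(a + b)
    bb = np.float32(s - a)
    e = np.float32((a - np.float32(s - bb)) + (b - bb))
    return s, e

def df_add(x_hi, x_lo, y_hi, y_lo):
    # Add two "double-float" values, each stored as a (hi, lo) pair of FP32 numbers.
    s, e = two_sum(x_hi, y_hi)
    e = np.float32(e + np.float32(x_lo + y_lo))
    return two_sum(s, e)

# 1e8 + 1.0 is not representable in a single FP32 value (the +1.0 gets rounded away),
# but the paired representation keeps it in the low word.
hi, lo = df_add(np.float32(1e8), np.float32(0.0), np.float32(1.0), np.float32(0.0))
print(hi, lo)   # hi == 1e8, lo == 1.0 - the +1.0 survives in the low word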
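```

Even this toy version dictates how the two halves relate to each other and how the operations must be sequenced; scale that up to silicon and you get exactly the kind of constraints on core layout, interconnects and scheduling described above.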
How long after the electric starter motor was first designed and implemented did car makers keep putting hand cranks on the front of their cars? They disappeared within about a decade. Why? Because they just weren't needed.
The exception was 4WD cars and trucks, which might see use in locations where the failure of an electric starter could leave you stranded - and hand cranks were found in these segments up until at least the 1970s. Again: features of a design are typically only brought forward when they have a clear usefulness; if not, they are discarded and left behind. That is sensible and efficient design, which allows for better fit-for-purpose designs, which are in turn more efficient designs, avoiding feature bloat. You keep a feature where it has a use; you leave it behind where it falls out of use or is demonstrated to not be used.
And do you honestly think that it's enough to give just a few years for new technology to take off?
Pretty much, yes. FP64 and its uses were relatively well known when it started being implemented in GPU hardware. Half a decade or so to figure out just how those uses would pan out in reality was plenty.
You could basically argue that CUDA was sort of useless too using exactly the same argument.
What? CUDA saw relatively rapid adoption, despite being entirely novel (unlike the concept of FP64). This was of course significantly helped along by Nvidia pouring money into CUDA development efforts, but it was still proven to have concrete benefits across a wide range of applications in a relatively short span of time. It is also still used across a wide range of consumer-facing applications, even if they are somewhat specialized. CUDA has
far wider applicability than FP64.
My point was that those cards could be made stupidly cheap and that FP64 capabilities don't have any significant effect on the price of the end product. Meanwhile, their competitor made far worse mistakes that weren't even FP64-related, and those cost them way more.
Again: that argument is entirely irrelevant to the point I was making. What was cheap for a specific architecture at a specific point in time, bound up as it is in the specific and concrete realities of that spatiotemporal setting (economics, available lithographic nodes, competitive situations, technology adoption, etc.), doesn't say anything whatsoever that is generalizable about future uses or implementations of specific traits of those architectures. Your entire argument here boils down to "it was easy once, so it will always be easy", which is a complete logical fallacy. It just isn't true. That it was easy (which is also up for debate - making it work definitely had costs and took time) doesn't say anything about its future usefulness, the complexity of maintaining that feature in future designs, or the extra effort involved in doing so (and, on the reverse side, the lower complexity of cutting out this unused feature and simplifying designs towards actually used and useful features). Things change. Features lose relevance, or are proven never to have been relevant. Design goals change. Design contingencies and interdependencies favor reducing complexity where possible.
If by "crippling" you mean not spending time and resource to include a feature that no one ever needs, then I guess that's what @Valantar meant.
Exactly what I was saying. "Crippling" means removing or disabling something
useful. FP64 isn't useful to consumers or regular end users in any appreciable way.
Time spent on it would be negligible. A few traces to connect the existing FP32 cores and voila.
That is a gross misrepresentation of the complexity of designing and implementing a feature like this, and frankly quite disrespectful in its dismissiveness of the skills and knowledge of the people doing these designs.
Except they basically design the whole architecture for both gaming cards and datacenter or enterprise cards. And no, the parts are there. At first the functionality was disabled in the vBIOS, but later it was fused off, meaning they put more effort into removing it than into just giving it to you. Thus, crippling. That's why I say it's literally the same as the LHR cards. The capabilities are just there, but now nV goes the extra mile to sell you CMPs.
That used to be true, but no longer is. Low end enterprise cards still use consumer architectures, as they are (relatively) low cost and
still don't need those specific features (like FP64). High performance enterprise cards now use dedicated architectures (CDNA) or very different subsets of architectures (Nvidia's XX100 dice, which are increasingly architecturally distinct from every other die in the same generation). And this just demonstrates how maintaining these features across designs is anything but the trivial matter you're presenting it as. If it were trivial, and there were
any profit to be made from it, they would keep it around across all designs. Instead, it's being phased out, due to design costs, implementation costs, and lack of use (=profitability).
Except, if you want to get more serious about it and actually have some impact, then it starts to matter. And if more FP64 performance were left on the table, then with the same effort you could do more with your idle GPU.
Except this fundamentally undermines the idea of these charities. They were invented at a time when CPUs and GPUs didn't turbo or clock down meaningfully at idle, i.e. when they consumed just about the same amount of power no matter what. Making use of those otherwise wasted processing cycles thus made sense - otherwise you were just burning power for nothing. Now, they instead cause CPUs and GPUs to boost higher and burn more power to do this work. This by itself already significantly undermines the core argument for these undertakings.
All the while, their relative usefulness is dropping as enterprise and scientific calculations grow ever more demanding and specialized, and the hardware changes to match. You are effectively arguing that all hardware should be made equal, because a small subset of users would then donate the unused performance to a good cause. This argument is absurd on its face, as it is arguing
for massive waste because of
some of it not being wasted. That's like arguing that all cars should have 200+ bhp, 4WD, differential locking and low-gear modes because a tiny subset of people use them for offroading. That a feature has been invented and proven to work in a niche use case is not an argument for implementing it broadly, nor is it an argument for keeping it implemented across generations of a product if it is demonstrably not being made use of. The lack of use is in fact an explicit argument for
not including it.
There is nothing inherently wrong with FP64 being phased out of consumer products. It is not "crippling"; it is the selective removal of an un(der)utilized feature for the sake of design efficiency. This has taken a while, mainly because until very recently neither GPU maker had the means to produce separate datacenter and consumer architectures. That has changed, and now both do (though in different ways). And this is perfectly fine. Nobody is losing anything significant from it. That a few thousand people are losing the ability to combine their love of running home servers and workstations with contributing to charity is not a major loss in this context. Heck, if they were serious about the charity, they could just donate the same amount of money to relevant causes, where it would most likely be put to far better and more efficient use than through donating compute power.