My dad works in power plant/heating engineering, so I know full well what their machines have. You won't ever see any Quadros, FirePros or Xeons in those. His machine at his current job has specs like this:
i7 2600K
Intel motherboard
stock cooler (thermal paste never changed)
some cheapo case with suffocating ventilation and only USB 2.0 ports available
Radeon HD 7770
Codegen power supply
4x4GB DDR3 RAM
Storage was recently upgraded from a hard drive to a 512GB SSD; the SSD wasn't even screwed down
Windows 7
In his last job he had some random machine with an AMD Phenom quad-core chip. At home he used an ancient PC to do some overtime; specs were:
AMD Athlon 64 3200+
DFI K8T800 Pro ALF
nVidia FX 5200 128MB
stock cooler
80GB IDE HDD
Windows XP 32 bit
I remember he once got a work laptop. It was a higher-end laptop... from a decade ago. It only had some Core 2 Duo, 2GB RAM, a 5400 rpm HDD and Intel GMA graphics.
I have absolutely zero idea what tasks those PCs were running, so ... okay? If anything, that just indicates that those tasks can't have been particularly performance or precision sensitive. I would be rather shocked if the people designing or building a power plant, or running CFD simulations for a reactor, a high-load heat exchange system, etc. were using that class of hardware.
I have been in hospitals too, where I was often examined, and most of the time there were Windows XP machines with Core 2 era hardware as late as 2018. There were some Windows 2000 machines too. My university has regular computers - various models with roughly Phenom X4 to Sandy Bridge era i3s, onboard graphics only. Only the IT department has one beastly machine with a Xeon, Titans in SLI, 32GB RAM and (wait for it) Windows 7. My school's engineering/drafting (CAD) class only had Pentium D machines with 4GB RAM, Intel integrated graphics and Windows XP. My university's computer lab only had Sandy Bridge i3s with 4GB RAM, HDD-only storage and "blazing fast" Intel HD 2000 graphics, which took forever to draw ArcGIS maps.
You seem to be mistaking statements of "these products are used in these industries" for "these products are used in every instance, everywhere in these industries". Nobody here has made the latter claim. The average PC running in a hospital or university isn't likely running heavy computational workloads; it might just be used for reading from and writing to various databases and other organizational tasks, and only needs hardware to match. Heck, a lot of stuff in hospitals and all kinds of laboratories runs on low-power, low-performance embedded terminals of various kinds as well.
Go talk to someone managing hardware for an MRI or CT scanner, and ask them what GPUs are running those tasks. Anyone in any kind of medical imaging, really. Or someone doing research on any type of molecular biology or biochemistry that involves any kind of modelling or simulation. That's the stuff we're talking about - the stuff that needs high performance compute, ECC, and to some degree also FP64, though with the introduction of more machine learning based approaches, less of that going forward (as high precision is only needed for training models, not running inference on them - that's FP32, FP16, INT8, or some AI-specific format like Bfloat16).
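To put rough numbers on the precision gap between these formats, here's a minimal NumPy sketch. The bfloat16 line is filled in by hand from its known layout (8 exponent bits, 7 explicit mantissa bits), since stock NumPy doesn't ship that type - so treat that part as a hand calculation rather than library output.

```python
# Rough comparison of the numeric formats mentioned above.
import numpy as np

for dtype in (np.float64, np.float32, np.float16):
    info = np.finfo(dtype)
    print(f"{info.dtype}: ~{info.precision} decimal digits, eps = {info.eps}")

# bfloat16: same exponent range as float32, but eps = 2**-7 = 0.0078125,
# i.e. only ~2-3 decimal digits - fine for inference, nowhere near enough
# for the kind of simulation work FP64 gets used for.
print("bfloat16: ~2-3 decimal digits, eps =", 2.0**-7)
```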
Of course a lot of the heavier compute tasks, particularly within research, are increasingly offloaded to off-site cloud compute services - Azure, AWS, or any of dozens of smaller providers. Unless the hospital or university is sufficiently wealthy to build and run its own HPC clusters, of course.
I'm sorry, but I think you are heavily overestimating what businesses actually use. Xeons, Quadros and Radeon Pros are really luxury products, and nobody buys them unless strictly necessary - and since those parts are basically the same as consumer hardware, they are barely ever used. It's not even about cost, but that IT people don't even know they could benefit from those parts. You also heavily overestimate the staff in the healthcare sector and their IT knowledge. Some of them are Unga Bungas with computers. They only know (hopefully) how to do healthcare, not healthcare + IT.
See above. You're misinterpreting "these products are used in these industries" as "these products are always, exclusively used in these industries". We are telling you where the relevant use cases are; we are not making all-encompassing claims about the hardware used in all aspects of these industries.
BTW, respect to that old-ass HD 7770 for soldiering on for over a decade without ever being cleaned. Old-school GCN aged quite well.
Absolutely - GCN has aged very well, especially in terms of compute, even if its efficiency is obviously crap by today's standards. And even consumer electronics can run for a long, long time - I ran my old Core2Quad for nearly a decade as my main CPU, with a 31% overclock for the last few years of my ownership of it, before selling it on to someone else (who at least never reported to me that it had failed, so I'm assuming it still works). GPUs can also work for a long time as long as they aren't hammered or some onboard component fails - MOSFETs and capacitors are the main killers of those things, assuming the overall design is good and avoids internal resonance, hot spots, etc.
It just depends on how they decided to segment their hardware that generation.
Yes, that's what I've been saying the whole time. That is not equal to them being "crippled" or "harvested". Different tools for different uses made to different standards.
Guess what - nVidia/AMD don't do anything extra. Dude, I have used FirePro V5800 and V3750 cards. There wasn't any extra robustness in them at all. You get literally identical PCBs to consumer-tier hardware, often the same voltages, and crappy reference coolers that just keep the GPU from melting, and that's all. They are really nothing more than consumer-tier cards with lower clock speeds, which in turn dramatically lowers TDP. There is no magic in them, no secret features. You get a few more knobs in the control panel, but that's it. The real difference is their drivers. I have tried playing games on those cards and immediately noticed vastly superior depth rendering; desktop rendering also appeared sharper. That's compared to a GTX 650 Ti, and yes, I set HDMI to the proper black level as well as full RGB. There were some actual visual quality differences.
I never said they had "secret features" or magic. Also, have you considered that for a resource-strapped company like AMD was back in those days, they might take the cheaper approach of engineering one good board, rather than one good and one slightly less good but cheaper board? AFAIK the workstation market was less dominated by Nvidia then than now. A lot of this work is also transferable, at least if the workstation board doesn't require something like a ton of PCB layers or more premium PCB materials that would necessitate a separate design for the consumer card.
In terms of QC, those cards just didn't get anything extra. The V3750 is literally the same as the HD 4670, down to the same capacitors used. Literally identical. Maybe super-high-end workstation cards actually get higher QC, but there's no evidence for that. BTW, that V3750 BSODed every time YouTube was launched or a YouTube video was embedded in a website, so the same crappy ATi drivers also make it into FirePros. AMD today gives "Pro" drivers to Polaris cards, but those Pro drivers are literally old mainstream drivers that are somewhat more stable. There was no other advantage to them. You also get the same video quality as mainstream. And yes, new features take forever to get incorporated, and I was finally fed up with them when I saw driver problems with modern games that just didn't exist in the mainstream drivers anymore. Pro drivers weren't updated for months.
Pro drivers are literally never updated as frequently as consumer ones - that's a feature, not a bug. Bugfixes are ideally pushed out rapidly, but not necessarily, and regular driver updates are intentionally slow, as the last thing you want when doing work is for a driver update to break something. And yes, the same goes for adding features - unless those features add significant value to pro customers, it's safer to leave them out in case they have some unexpected side effect. I kind of doubt AMD - especially considering how little money they had to spend on driver development back then - had the resources to care about YouTube crashing a Pro GPU.
Maybe nVidia does things differently, but AMD's pro stuff is truly nothing special.
I'd like to see some more conclusive evidence as to that than decade-old pro GPUs and half-decade old consumer GPUs running stripped down "pro" drivers.
More like nV and AMD completely don't give a shit about them, rather than them being useless. They don't have any competition in this market, so they can do as they please.
Making an FP64 accelerator isn't all that difficult - at least compared to the other hardware accelerators various startups make all the time for AI and the like. If there was a market for this, they would exist. It's reasonably clear that to the degree the market for this exists, it is sufficiently saturated by AMD and Nvidia's top-end compute accelerators, which still have 2:1 FP64 capabilities. And given the massive surge in cloud compute in recent years, the need for on-site FP64 compute is further diminishing, given its mostly specialized applicability.
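To put rough numbers on how wide that gap is, here's a trivial sketch. The 1:2 ratio is typical of top-end compute accelerators and 1:32 or 1:64 of consumer GPUs, but the 40 TFLOPS FP32 baseline is just an assumed round number, not any real product's spec.

```python
# Sketch: why FP64-heavy work lands on compute accelerators rather than
# consumer cards. FP64 rate is a fixed fraction of FP32 rate, and that
# fraction varies wildly by product segment.
fp32_tflops = 40.0  # assumed, illustrative FP32 throughput

fp64_ratios = {
    "compute accelerator (1:2)": 1 / 2,
    "consumer GPU (1:32)": 1 / 32,
    "consumer GPU (1:64)": 1 / 64,
}

for name, ratio in fp64_ratios.items():
    print(f"{name}: ~{fp32_tflops * ratio:.2f} TFLOPS FP64")
```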
I guess I forgot that; however, Zen ran obscenely hot at launch. Just as hot as FX chips.
That's a misconception - though one AMD caused for themselves with some really friggin' weird sensors. Those readings included a significant (20°C) offset, if you remember. So when my first-gen 1600X told the system it was running at 80+°C, the reality was that it was sitting in the low to mid 60s. They barely ran warm at all, they just made themselves look that way.
AFAIK this (stupid) way of doing things was done to maintain a (very) low tJmax rating without forcing motherboard makers and OEMs to fundamentally reconfigure their fan curves and BIOS configurations. Which seemed to stem from an insecurity regarding the actual thermal tolerances of this new architecture on an untested (and not all that great) node. The subsequent removal of these thermal offsets on newer series of CPUs tells us that this caution is no longer there.
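Just to make that offset concrete, a trivial sketch - the 82°C reading is a made-up example value; the 20°C Tctl offset is the one AMD published for the first-gen X-series parts:

```python
# Minimal sketch: first-gen Ryzen X-series parts reported Tctl, which carried
# a fixed +20°C offset over the actual die temperature (Tdie).
TCTL_OFFSET_C = 20.0  # published offset for the 1600X/1700X/1800X class

def tdie_from_tctl(tctl_c: float) -> float:
    """Convert the reported control temperature back to the real die temp."""
    return tctl_c - TCTL_OFFSET_C

# An "alarming" 82°C reading was really a die sitting in the low 60s.
print(tdie_from_tctl(82.0))  # -> 62.0
```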
AMD still hasn't completely solved the problem of connecting chiplets to the IHS, so heat gets trapped in transmission.
That ... again, just isn't a very accurate description. Zen 3 (and to some degree Zen 2) is difficult to cool due to its very high thermal density, which is in turn due to its relatively small core design on a dense node. Higher thermal density makes spreading the heat sufficiently and with sufficient speed more of a challenge, simply because the heat is more concentrated - which both raises absolute temperatures in the hotspot and means heat has to travel further across the IHS. Of course the newer cores drawing as much or more power per core than older ones in order to reach higher clocks doesn't help. In the end, this comes down to physics: concentrating the same heat load in a smaller area makes it more difficult for any material contacting that area to dissipate that heat efficiently and rapidly, and to keep the heat source at the desired temperature. This is unavoidable unless you also change the material or construction of the IHS and the TIM between die and IHS. I've speculated previously on whether we'll see vapor-chamber-based IHSes at some point specifically for this reason, as at some point copper alone just won't be sufficient.
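As a back-of-the-envelope illustration of the density point - both the power and area figures below are made-up round numbers, not measurements of any actual die:

```python
# Rough sketch: the same package power concentrated in a smaller die area
# means a higher heat flux (W/mm^2) that the IHS and cooler have to spread.
def heat_flux_w_per_mm2(power_w: float, area_mm2: float) -> float:
    return power_w / area_mm2

# Illustrative numbers only: ~90 W into one small chiplet vs. the same 90 W
# spread across a large monolithic die.
chiplet = heat_flux_w_per_mm2(90.0, 80.0)      # ~1.13 W/mm^2
monolithic = heat_flux_w_per_mm2(90.0, 300.0)  # ~0.30 W/mm^2
print(f"chiplet: {chiplet:.2f} W/mm^2, monolithic: {monolithic:.2f} W/mm^2")
```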
There are obviously optimizations to be done: a thinner diffusion barrier on top of the die improves thermal transfer (see the Intel ... was it 10900K, where they effectively sanded down every die for those?), but runs some risk of premature hardware failure due to TIM materials diffusing into the silicon and changing its characteristics. This is generally avoidable though, with some forethought. Thinner solder between the die and IHS will also help, though that's difficult in practice for mass production. Liquid metal outperforms solder, so that's another possible solution - and one that's been used in mass produced electronics with great success for quite a few years now. So there are still ways to improve things. But none of that boils down to "AMD not having solved that problem with connecting chiplets to IHS".
FX chips weren't as hot as many remember - only compared to the Sandy Bridge-era Core i series were they. Initially the maximum temperature spec was just 62°C due to poor sensor calibration, but it was later lifted to 72°C. FX chips benefited from being a spread-out design, which was very efficient at transferring heat to the IHS. I also found that an FX 6300 can run passively with a Mugen 4 heatsink in an enclosed case and only reaches 58°C reported temperature in a long Prime95 small FFT run. Do that with a Ryzen 1600X and it will throttle, not to mention reach a way higher temperature.
FX didn't run all that hot unless you wanted them to compete with Intel at the time, which forced you into 200+W overclocks (or buying one of their 200+W SKUs). Outside of that it was ... fine?
KSP
Yep, that one game that is essentially a gamified high-precision physics simulation at its very core, with very few other aspects to the game. One might almost wonder if something makes it uniquely suited to adopting FP64?
But for real, it's very hard to find info about how PhysX worked. I know that it used floating point, but which precision? KSP is the only game I found an example of utilizing FP64. Others might use it too, but like I said, nobody says their game uses FP64 or even FP32 - companies just don't share their super technical details so freely.
PhysX was, AFAIK, FP32 through CUDA.
And given how game developers love to use the technical specificities of their games to drum up interest, it would be really weird if there were any notable titles out there using FP64 in any significant way with nobody aware of it. As has been said time and time again now: FP64 is more complex to program for, and its only benefit is higher precision, which generally isn't necessary in games (as FP32 is already quite precise). The reason nobody is using it or talking about using it is that it wouldn't gain them anything - just more work and worse performance. It would have zero noticeable benefits in the vast majority of games.
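To put some numbers on "FP32 is already quite precise" - and on why a game like KSP, with solar-system-scale coordinates, is the exception - here's a small NumPy sketch. The positions are arbitrary illustrative values, not anything from an actual game.

```python
# Sketch: the representable spacing ("ulp") around a coordinate grows with its
# magnitude. At typical game-world scales FP32 is sub-millimetre; at
# planetary/solar-system scales it isn't, which is why KSP-style games reach
# for FP64 (or coordinate tricks).
import numpy as np

for position_m in (100.0, 10_000.0, 10_000_000.0):  # 100 m, 10 km, 10,000 km
    f32_step = np.spacing(np.float32(position_m))
    f64_step = np.spacing(np.float64(position_m))
    print(f"at {position_m:>12,.0f} m: FP32 step = {f32_step:.3e} m, "
          f"FP64 step = {f64_step:.3e} m")
```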
I read more, and it seems that Arma 2 and Universe Sandbox may use it. Star Citizen's game engine does use FP64. Hellion certainly uses FP64.
Yet they run fine on modern, low-FP64 architectures, right? So, presumably, for whatever it's used for in these games, either it's relatively low intensity, or it's run server-side rather than locally. And crucially, one of the actual useful applications of "AI" (neural networks) is running these kinds of simulations much faster, at much lower precision, yet with similar accuracy in the outcomes, as the algorithms are that much more precise.
From what I read, FP64 is utilized more off-screen, where compute capabilities can be put to use.
It would likely be useful in things like simulating a vast physics-based universe (Star Citizen) or large amounts of highly complex AI over a significant amount of time, in scenarios where it's crucially important that those AIs or physics simulations are repeatable and predictable to a very high degree. That's what FP64 is good for - high precision, i.e. predictable and repeatable outcomes when running the same simulation many times. FP32 is already pretty good at that, but generally insufficient for something really complex and high-stakes like MRI imaging. Simulating a persistent physics-based universe isn't all that different from that - but that's also mainly done server-side.
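One concrete way to see where FP32 stops being exact for long-running accumulations - a generic sketch, not tied to any particular engine:

```python
# Sketch: FP32 can only represent integers exactly up to 2**24; past that,
# adding a small increment gets rounded away entirely, so a long-running
# accumulator (a mission clock, a step counter, an energy total) silently
# stops advancing. FP64 keeps exact integer headroom all the way to 2**53.
import numpy as np

t32 = np.float32(2**24)   # 16,777,216
t64 = np.float64(2**24)

print(t32 + np.float32(1.0) == t32)  # True  - the +1 was rounded away
print(t64 + 1.0 == t64)              # False - FP64 still advances
```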
But on the other hand, the Unity engine has used double-precision floats too, though I can't find where specifically. UE5 is speculated to use FP64, but there's no confirmation of that. UE4 can support FP64 if needed, but that's avoided due to the performance penalty, lack of support for doubles on certain hardware altogether, and the difficulty of troubleshooting. So devs do a lot of work to make things work in FP32. If FP64 performance were great and capable cards were common, it could take gaming to quite a different place than it is today. FP32 is a bottleneck, not a natural sweet spot.
The first half of what you're saying here is correct, but the second half is pure conjecture and speculation - and to some degree contradicts the first half. After all, if difficulty troubleshooting is a problem of FP64, how can that mean that devs are doing more work to make FP32 work for them? That sentence literally tells us that FP64 would be more work, not the other way around. And it's not surprising that most game engines can or do support FP64 - it exists, and it has certain specialized uses. You're also speculating that there is significant pent-up demand for the ability to use high-performance FP64 in games, which isn't supported by what you said before. That's just speculation. That it's avoided due to having drawbacks doesn't mean it would be widely adopted if some or all of those drawbacks were reduced, as it would still need to be useful in some significant way.
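For what it's worth, the "lot of work to make things work in FP32" mentioned above usually looks something like origin rebasing. Here's a rough sketch - all names and thresholds are made up for illustration, not taken from any particular engine:

```python
# Rough sketch of "floating origin" rebasing - the usual trick for staying
# within FP32 precision in a huge world: whenever the player drifts too far
# from the origin, shift every object (and the player) back so coordinates
# near the camera stay small and precise.
import numpy as np

REBASE_THRESHOLD = 5_000.0  # metres from origin before we re-centre (made up)

def maybe_rebase(player_pos: np.ndarray, world_positions: np.ndarray):
    """Shift the whole scene so the player sits near the origin again."""
    if np.linalg.norm(player_pos) > REBASE_THRESHOLD:
        offset = player_pos.copy()
        world_positions -= offset  # every object moves by the same amount
        player_pos -= offset       # player ends up back at (0, 0, 0)
    return player_pos, world_positions

# Usage: positions stored as FP32, rebased whenever the player wanders off.
player = np.array([6_200.0, 0.0, 150.0], dtype=np.float32)
scene = np.random.default_rng(1).uniform(-1e4, 1e4, (100, 3)).astype(np.float32)
player, scene = maybe_rebase(player, scene)
```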
Yes, them. Dunno about you, but in Lithuania even big businesses get their hardware from retailers; retailers get their stuff from a single big logistics center, and that center gets its stuff from, perhaps, nVidia. No business would want to deal with the expense or hassle of basically doing retail themselves.
Large enterprises either buy directly from the producing companies, or do so through the intermediary of a distributor, with the deal then typically being a three-party deal in which the distributor gets a cut because its infrastructure and personnel are being used. The question is how large of an enterprise you have, as this takes some size to do - but all businesses of sufficient size do so, as it is faster, cheaper, and far more flexible than buying retail or buying from the distributor without involving the producing company. If that doesn't happen in Lithuania, either the companies in question aren't of a sufficient size, or someone really needs to tell them that they ought to be doing this, as they would otherwise be throwing away a lot of time and money. They're in the EU after all, so they can use Nvidia/Intel/AMD/whoever's EU distribution networks and professional sales networks.
I'm not, but don't you think that FX revisions also changed core internals? They did; I just don't see any point in mentioning that, because it's obvious. The chip layout was indeed very similar, but if you compare the K10 core layout to FX, there's no similarity. Zen didn't do that.
You are confusing those two. Yes, the various heavy machinery revisions updated various architectural traits of the cores - like that 20% IPC gain we spoke about before with Carrizo. Just as Zen 1, 1+, 2 and 3 also have significant changes between their cores. The difference is, these are revisions of the same design (though Zen 3 is described as a ground-up redesign, tweaking literally every part of the core, with previous revisions being much more conservative). Within the same instruction set, and especially within the same company and design teams (and associated patents, techniques and IP), there is obviously a fuzzy line between a new architecture and a tweaked one, but drawing lines is still possible - and frequently done. Zen 1 was a far greater departure from anything AMD had made previously than any subsequent design, despite Zen 3 being a ground-up redesign. Sharing a vaguely similar layout tells us nothing of any significant value about the low-level architectural details of a chip.