Unified memory is only necessary because LLM weights are too large to fit in a GPU's dedicated VRAM. That lets marketing departments make deceitful, misleading slides that fanboys of the respective companies (and this isn't an AMD or NVIDIA problem, it applies to pretty much both of them) will often parrot without question, such as this little gem right here:
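To put numbers on the "too large for VRAM" point, here's a rough back-of-envelope sketch (the 70B parameter count and quantization widths are illustrative assumptions, not figures from the slide):

```python
# Rough LLM weight footprint: bytes = parameter count * bytes per weight.
def model_size_gb(params_billions: float, bytes_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (ignores KV cache/activations)."""
    return params_billions * 1e9 * bytes_per_weight / 1e9

# Hypothetical 70B-parameter model:
fp16 = model_size_gb(70, 2.0)   # 16-bit weights
q4 = model_size_gb(70, 0.5)     # 4-bit quantized weights
print(f"FP16: {fp16:.0f} GB, Q4: {q4:.0f} GB")  # prints "FP16: 140 GB, Q4: 35 GB"
```

Even 4-bit quantized, 70B weights blow past a 24 GB consumer card, which is exactly why a big unified memory pool is the selling point.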
View attachment 385052
Of course it's "up to 2.2x faster" when you can actually load the model into memory (provided you have at least 96 or 128 GB of RAM in this case) and you're not compute bottlenecked, which is exactly the problem facing quite literally any GPU short of NVIDIA's many-thousand-dollar, 80 GB+ HBM AI accelerators right now.

Needless to say, the person who posted this slide to me as a rebuttal on X (where I told them that if they believed this product was faster than a 4090 at anything, I had a bridge to sell 'em) summarily blocked me right after posting it and calling me a "smug f**k". Go figure. For context: the Ryzen AI Max+ 395 is rated at 126 AI TOPS; an RTX 4090 is, even on a worst-case basis, roughly 10x faster.
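The "roughly 10x" claim is simple arithmetic. The 126 figure is from the post; the ~1321 AI TOPS number is NVIDIA's commonly cited sparse Tensor TOPS rating for the RTX 4090, which I'm assuming here:

```python
# Back-of-envelope TOPS comparison.
ryzen_ai_tops = 126     # Ryzen AI Max+ 395 rating (from the post)
rtx_4090_tops = 1321    # NVIDIA's marketed sparse Tensor TOPS (assumed)
print(f"{rtx_4090_tops / ryzen_ai_tops:.1f}x")  # prints "10.5x"
```

TOPS is itself a marketing metric and real inference throughput depends on memory bandwidth too, but even the raw-compute gap makes the slide's comparison look silly.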