Dear AMD, NVIDIA, INTEL and others, we need cheap (192-bit to 384-bit), high VRAM, consumer, GPUs to locally self-host/inference AI/LLMs

igormp · 2025-02-04T21:30:39+0000

Chrispy_ said:
Needed a BIOS flash on the board too, it was in limp-mode on the 9950X at 0.56GHz and didn't work with the 64GB DIMM until I updated to the December BIOS with AGESA 1.2.0.2b

Also, make sure you go 9000-series, I couldn't get the CUDIMMs working on a 7900X which makes me sad, because that's the bulk of our workstations. I have a nasty feeling AMD don't support them on 7000-series and either don't plan to, or physically can't.

FYI Raptor lake has solid CUDIMM support. Most of the rabbit holes I dove into when hunting for 64GB DIMMs were LGA1700.

I'm more interested on an AMD platform because it's both cheaper than arrow lake, and also because of AVX-512.
Raptor lake is not that interesting for me. It doesn't support CUDIMM either, so didn't you mean Arrow lake instead?

But yeah, I'm planning on a 9950x nonetheless, likely with a b650 or x670e proart.

Chrispy_ · 2025-02-04T21:35:58+0000

igormp said:
I'm more interested on an AMD platform because it's both cheaper than arrow lake, and also because of AVX-512.
Raptor lake is not that interesting for me. It doesn't support CUDIMM either, so didn't you mean Arrow lake instead?

But yeah, I'm planning on a 9950x nonetheless, likely with a b650 or x670e proart.

It seems like 9000-series only are supported, and operating in bypass mode, whatever that means.

There's a verified AMD engineer responding in this thread, so you can take it as pretty accurate (for a Reddit thread).
https://www.reddit.com/r/Amd/comments/1gbm20a
Potentially there's faster 256GB AM5 configurations possible than the 4800 I've achieved. I'm just not particularly clued up on manual DDR5 timings and I'm aiming for stability over speed. All of our 128GB AM4/AM5 systems are running at JEDEC speeds. I've been punished by user callouts for instability where I've found a kit that passes a memtest loop and subsequent OCCT certificate, only to start crashing a few weeks later - either degredation or marginal stability from the outset.

igormp · 2025-02-04T21:42:43+0000

Chrispy_ said:
It seems like 9000-series only are supported, and operating in bypass mode, whatever that means.

Means that they will work as regular UDIMMs, and the clock buffering circuitry on the DIMMs won't be used.

Chrispy_ said:
Potentially there's faster 256GB AM5 configurations possible than the 4800 I've achieved. I'm just not particularly clued up on manual DDR5 timings and I'm aiming for stability over speed. All of our 128GB AM4/AM5 systems are running at JEDEC speeds. I've been punished by user callouts for instability where I've found a kit that passes a memtest loop and subsequent OCCT certificate, only to start crashing a few weeks later - either degredation or marginal stability from the outset.

I'd be happy with 4800MHz already, just want to double up on quantity.

lexluthermiester · 2025-02-04T23:42:46+0000

Zazigalka said:
Call me stupid, but I think I do, cause I haven't the faintest idea what someone would need LLMs for on a home PC.

That was my point..

igormp said:
Deepseek hasn't proven anything. Their actual impressive model is 671B params in size, which requires at least 350GB of VRAM/RAM to run, that's not modest.
The models you are talking about that ran on a 6GB GPU and a raspberry pi are the distilled models, which are the ones based on existing models (llama and qwen).
Larger models of the same generation always give have better quality than smaller ones.
Of course that as time improves, the smaller models improve as well, but so do their larger counterparts.

Watch, learn.

igormp said:
It's just too much entitlement for something that's a hobby.
If not a hobby, then one should have enough money to pony up on professional stuff.

Agreed on both points.

Visible Noise · 2025-02-04T23:48:54+0000

I want a pony.

igormp · 2025-02-04T23:50:57+0000

lexluthermiester said:
That was my point..

Watch, learn.

If you give the video a proper watch, you'd know what I'm talking about.
But let me make it even easier for you:

How is Deepseek R1 on a Raspberry Pi? | Jeff Geerling

www.jeffgeerling.com

But sensationalist headlines aren't telling you the full story.

The Raspberry Pi can technically run Deepseek R1... but it's not the same thing as Deepseek R1 671b, which is a four hundred gigabyte model.

That model (the one that actually beats ChatGPT), still requires a massive amount of GPU compute.

lexluthermiester · 2025-02-05T00:04:05+0000

igormp said:
If you give the video a proper watch, you'd know what I'm talking about.
But let me make it even easier for you:

How is Deepseek R1 on a Raspberry Pi? | Jeff Geerling

www.jeffgeerling.com

How did you miss the point twice in a row? The point is, it's doable on low end machines. You don't need high end specs to do AI stuff now. Sure it might take longer, but it's doable. And in reference to the OP, we don't need specialized hardware or GPU's to do it. General everyday hardware is all a person needs.

igormp · 2025-02-05T00:11:20+0000

lexluthermiester said:
How did you miss the point twice in a row? The point is, it's doable on low end machines

You are the one missing it. For those smaller models, they used already existing models, and improved their quality a little bit. There's nothing new regarding this.

lexluthermiester said:
You don't need high end specs to do AI stuff now.

That's the point I'm making, this level of performance on consumer devices has been available for over an year now.
The new stuff deepseek brought is all related to their bigger models, that's where the innovation lies in.
Don't fall into the sensationalism some outlets are spouting (as Jeff himself said), and specially try not to reinforce those since this is just giving way to misinformation.

lexluthermiester · 2025-02-05T00:14:47+0000

igormp said:
You are the one missing it.

Yup, that's gotta be it. See ya.

igormp · 2025-02-05T00:21:43+0000

lexluthermiester said:
Yup, that's gotta be it. See ya.

Well, in case you're open to learning something and having a proper discussion, I can heavily recommend giving the actual deepseek paper a read:

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL...

arxiv.org

Processor	5950x
Motherboard	B550 ProArt
Cooling	Fuma 2
Memory	4x32GB 3200MHz Corsair LPX
Video Card(s)	2x RTX 3090
Display(s)	LG 42" C2 4k OLED
Power Supply	XPG Core Reactor 850W
Software	I use Arch btw

System Name	Bragging Rights
Processor	Atom Z3735F 1.33GHz
Motherboard	It has no markings but it's green
Cooling	No, it's a 2.2W processor
Memory	2GB DDR3L-1333
Video Card(s)	Gen7 Intel HD (4EU @ 311MHz)
Storage	32GB eMMC and 128GB Sandisk Extreme U3
Display(s)	10" IPS 1280x800 60Hz
Case	Veddha T2
Audio Device(s)	Apparently, yes
Power Supply	Samsung 18W 5V fast-charger
Mouse	MX Anywhere 2
Keyboard	Logitech MX Keys (not Cherry MX at all)
VR HMD	Samsung Oddyssey, not that I'd plug it into this though....
Software	W10 21H1, barely
Benchmark Scores	I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.

Processor	5950x
Motherboard	B550 ProArt
Cooling	Fuma 2
Memory	4x32GB 3200MHz Corsair LPX
Video Card(s)	2x RTX 3090
Display(s)	LG 42" C2 4k OLED
Power Supply	XPG Core Reactor 850W
Software	I use Arch btw

System Name	XPS, Lenovo and HP Laptops, HP Xeon Mobile Workstation, HP Servers, Dell Desktops
Processor	Everything from Turion to 13900kf
Motherboard	MSI - they own the OEM market
Cooling	Air on laptops, lots of air on servers, AIO on desktops
Memory	I think one of the laptops is 2GB, to 64GB on gamer, to 128GB on ZFS Filer
Video Card(s)	A pile up to my knee, with a RTX 4090 teetering on top
Storage	Rust in the closet, solid state everywhere else
Display(s)	Laptop crap, LG UltraGear of various vintages
Case	OEM and a 42U rack
Audio Device(s)	Headphones
Power Supply	Whole home UPS w/Generac Standby Generator
Software	ZFS, UniFi Network Application, Entra, AWS IoT Core, Splunk
Benchmark Scores	1.21 GigaBungholioMarks

Processor	5950x
Motherboard	B550 ProArt
Cooling	Fuma 2
Memory	4x32GB 3200MHz Corsair LPX
Video Card(s)	2x RTX 3090
Display(s)	LG 42" C2 4k OLED
Power Supply	XPG Core Reactor 850W
Software	I use Arch btw