
Dear AMD, NVIDIA, INTEL and others, we need cheap (192-bit to 384-bit), high-VRAM consumer GPUs to locally self-host/inference AI/LLMs

Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Needed a BIOS flash on the board too; it was in limp mode on the 9950X at 0.56GHz and didn't work with the 64GB DIMM until I updated to the December BIOS with AGESA 1.2.0.2b

Also, make sure you go 9000-series. I couldn't get the CUDIMMs working on a 7900X, which makes me sad, because that's the bulk of our workstations. I have a nasty feeling AMD doesn't support them on the 7000-series and either doesn't plan to, or physically can't.

FYI Raptor lake has solid CUDIMM support. Most of the rabbit holes I dove into when hunting for 64GB DIMMs were LGA1700.
I'm more interested in an AMD platform, both because it's cheaper than Arrow Lake and because of AVX-512.
Raptor Lake is not that interesting for me. It doesn't support CUDIMMs either, so didn't you mean Arrow Lake instead?

But yeah, I'm planning on a 9950X nonetheless, likely with a B650 or X670E ProArt.
 
Joined
Feb 20, 2019
Messages
8,775 (4.01/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Odyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
I'm more interested in an AMD platform, both because it's cheaper than Arrow Lake and because of AVX-512.
Raptor Lake is not that interesting for me. It doesn't support CUDIMMs either, so didn't you mean Arrow Lake instead?

But yeah, I'm planning on a 9950X nonetheless, likely with a B650 or X670E ProArt.
It seems like only the 9000-series is supported, and operating in bypass mode, whatever that means.

There's a verified AMD engineer responding in this thread, so you can take it as pretty accurate (for a Reddit thread).
https://www.reddit.com/r/Amd/comments/1gbm20a
Potentially there are faster 256GB AM5 configurations possible than the 4800 I've achieved. I'm just not particularly clued up on manual DDR5 timings, and I'm aiming for stability over speed. All of our 128GB AM4/AM5 systems are running at JEDEC speeds. I've been punished by user callouts for instability where I've found a kit that passes a memtest loop and a subsequent OCCT certificate, only to start crashing a few weeks later - either degradation or marginal stability from the outset.
 
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
It seems like only the 9000-series is supported, and operating in bypass mode, whatever that means.
It means they will work as regular UDIMMs, and the clock-buffering circuitry on the DIMMs won't be used.
Potentially there are faster 256GB AM5 configurations possible than the 4800 I've achieved. I'm just not particularly clued up on manual DDR5 timings, and I'm aiming for stability over speed. All of our 128GB AM4/AM5 systems are running at JEDEC speeds. I've been punished by user callouts for instability where I've found a kit that passes a memtest loop and a subsequent OCCT certificate, only to start crashing a few weeks later - either degradation or marginal stability from the outset.
I'd be happy with 4800MHz already, just want to double up on quantity.
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
Call me stupid, but I think I do, cause I haven't the faintest idea what someone would need LLMs for on a home PC.
That was my point..

Deepseek hasn't proven anything. Their actual impressive model is 671B params in size, which requires at least 350GB of VRAM/RAM to run; that's not modest.
The models you are talking about that ran on a 6GB GPU and a Raspberry Pi are the distilled models, which are the ones based on existing models (Llama and Qwen).
Larger models of the same generation always have better quality than smaller ones.
Of course, as time goes on, the smaller models improve, but so do their larger counterparts.
Watch, learn.
It's just too much entitlement for something that's a hobby.
If not a hobby, then one should have enough money to pony up for the professional stuff.
Agreed on both points.
 
Joined
Jun 19, 2024
Messages
450 (1.86/day)
System Name XPS, Lenovo and HP Laptops, HP Xeon Mobile Workstation, HP Servers, Dell Desktops
Processor Everything from Turion to 13900kf
Motherboard MSI - they own the OEM market
Cooling Air on laptops, lots of air on servers, AIO on desktops
Memory I think one of the laptops is 2GB, to 64GB on gamer, to 128GB on ZFS Filer
Video Card(s) A pile up to my knee, with an RTX 4090 teetering on top
Storage Rust in the closet, solid state everywhere else
Display(s) Laptop crap, LG UltraGear of various vintages
Case OEM and a 42U rack
Audio Device(s) Headphones
Power Supply Whole home UPS w/Generac Standby Generator
Software ZFS, UniFi Network Application, Entra, AWS IoT Core, Splunk
Benchmark Scores 1.21 GigaBungholioMarks
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
That was my point..


Watch, learn.
If you gave the video a proper watch, you'd know what I'm talking about.
But let me make it even easier for you:
But sensationalist headlines aren't telling you the full story.

The Raspberry Pi can technically run Deepseek R1... but it's not the same thing as Deepseek R1 671b, which is a four hundred gigabyte model.

That model (the one that actually beats ChatGPT) still requires a massive amount of GPU compute.
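The napkin math makes it obvious: weights alone take params x bytes per weight, before you even count KV cache and runtime overhead. A rough sketch (ballpark figures of mine, not deepseek's):

```python
# Rough lower bound for LLM weight storage: params * bits_per_weight / 8 bytes.
# Real deployments need extra headroom for KV cache and activations.
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    # 1e9 params * (bits / 8) bytes each, expressed in GB
    return params_billion * bits_per_weight / 8

for bits in (16, 8, 4):
    print(f"671B @ {bits}-bit: ~{weights_gb(671, bits):.0f} GB")
# 16-bit: ~1342 GB, 8-bit: ~671 GB, 4-bit: ~336 GB
# Even aggressively quantized, R1 671B needs hundreds of GB;
# the "runs on a Pi" demos are the small 1.5B-14B distills.
```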
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
If you gave the video a proper watch, you'd know what I'm talking about.
But let me make it even easier for you:
How did you miss the point twice in a row? The point is, it's doable on low-end machines. You don't need high-end specs to do AI stuff now. Sure, it might take longer, but it's doable. And in reference to the OP, we don't need specialized hardware or GPUs to do it. General everyday hardware is all a person needs.
 
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
How did you miss the point twice in a row? The point is, it's doable on low-end machines
You are the one missing it. For those smaller models, they took already existing models and improved their quality a little bit. There's nothing new about this.
You don't need high-end specs to do AI stuff now.
That's the point I'm making: this level of performance on consumer devices has been available for over a year now.
The new stuff deepseek brought is all related to their bigger models; that's where the innovation lies.
Don't fall for the sensationalism some outlets are spouting (as Jeff himself said), and especially try not to reinforce it, since that just gives way to misinformation.
 
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Yup, that's gotta be it. See ya.
Well, in case you're open to learning something and having a proper discussion, I can highly recommend giving the actual deepseek paper a read:
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
Well, in case you're open to learning something and having a proper discussion, I can highly recommend giving the actual deepseek paper a read:
Oh, you mean this paper?

Have you actually read that PDF? Pages 13 & 14 are the most interesting..
 
Joined
Oct 21, 2009
Messages
129 (0.02/day)
Location
Netherlands
System Name Whitewonder
Processor 7800X3D
Motherboard Asus Proart X670-E Creator
Cooling Corsair custom Watercooled
Memory 64 GB
Video Card(s) RX 6800 XT
Storage Too much to mention, 1190 TB in all
Display(s) 2 x Dell 4K @ 60 Hz
Case White XL case
Audio Device(s) Realtek + Beyerdynamic 990 Pro headset
Power Supply 1300 watt
Mouse Corsair cord mouse
Keyboard Corsair red lighter cabled keyboard ages old ;)

Konomi

New Member
Joined
Aug 3, 2024
Messages
15 (0.08/day)
Or here's an idea: we actually optimise for the hardware instead of trying to brute force everything. You can already use AI at home - it won't be particularly fast, but it isn't something you need at home at this point in time. Developers can't even optimise for games properly, and you're asking for hardware that you probably won't even truly benefit from.
 
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Oh, you mean this paper?

Have you actually read that PDF? Pages 13 & 14 are the most interesting..
Yeah, that's the exact paper I linked.
Page 13 is not really relevant since it only pertains to the bigger model. Page 14 is indeed where the fun is at, along with page 15, which has the comparison between a distilled model and one trained from scratch using just their techniques (table 6), with this conclusion:
Therefore, we can draw two conclusions: First, distilling more powerful models into smaller ones yields excellent results, whereas smaller models relying on the large-scale RL mentioned in this paper require enormous computational power and may not even achieve the performance of distillation. Second, while distillation strategies are both economical and effective, advancing beyond the boundaries of intelligence may still require more powerful base models and larger-scale reinforcement learning.
They do reasonably well on reasoning benchmarks; however, if you compare them to their regular base models, the distilled ones aren't that impressive. On HF's leaderboard, the distilled deepseek models rank quite low:

I'll assume you don't have much experience with running LLMs locally. You could either go with ollama on the CLI, or you could try something like LM Studio:
I personally haven't used it (I just run ollama myself), but I've heard it makes it pretty easy for people that are not that tech-savvy to run LLMs, even though it's not the most performant stack.

This way you could give those different models a go, and even compare them to the big deepseek model somewhere and see how they fare.
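If you'd rather script it than use the CLI, ollama also exposes a local REST API. A minimal sketch (assumes the default port; the deepseek-r1:7b tag and the prompt are just examples):

```python
# Query a locally running ollama server (default: localhost:11434).
# Prereq: `ollama pull deepseek-r1:7b` (or whichever tag you want to test).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",  # one of the distilled R1 models
        "prompt": "In one sentence: why does VRAM matter for LLM inference?",
        "stream": False,  # single JSON response instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```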
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
Yeah, that's the exact paper I linked.
Page 13 is not really relevant since it only pertains to the bigger model. Page 14 is indeed where the fun is at, along with page 15, which has the comparison between a distilled model and one trained from scratch using just their techniques (table 6), with this conclusion:

They do reasonably well on reasoning benchmarks; however, if you compare them to their regular base models, the distilled ones aren't that impressive. On HF's leaderboard, the distilled deepseek models rank quite low:

I'll assume you don't have much experience with running LLMs locally. You could either go with ollama on the CLI, or you could try something like LM Studio:
I personally haven't used it (I just run ollama myself), but I've heard it makes it pretty easy for people that are not that tech-savvy to run LLMs, even though it's not the most performant stack.

This way you could give those different models a go, and even compare them to the big deepseek model somewhere and see how they fare.
Ok, sure, moving on..
 
Joined
Feb 12, 2025
Messages
8 (2.00/day)
Location
EU
Processor AMD 5600X
Motherboard ASUS TUF GAMING B550M-Plus WiFi
Cooling be quiet! Dark Rock 4
Memory G.Skill Ripjaws 2 x 32 GB DDR4-3600 CL18-22-22-42 1.35V F4-3600C18D-64GVK
Video Card(s) Sapphire Pulse RX 7800XT 16GB
Storage Kingston KC3000 2TB + QNAP TBS-464
Display(s) LG 35" LCD 35WN75C-B 3440x1440
Case Kolink Bastion RGB Midi-Tower
Power Supply Enermax Digifanless 550W
Mouse Razer Deathadder v2
Benchmark Scores phi4 - 42.00 tokens/s
With the AI age being here, we need fast memory and lots of it, so we can host our favorite LLMs locally.
+1. With a 32GB AMD 9070XT possibly coming, at least AMD is listening. nVidia has Project DIGITS. I think AMD has a good chance to make a unified-RAM platform like the PS5/Xbox.

We also need DDR6 quad-channel (or something entirely new and faster) consumer desktop motherboards with up to 256 or 384GB RAM (my current B650 mobo supports only up to 128GB RAM)
The sTR5-socket Threadripper has 8x DDR5 channels, but MB prices are four figures, ofc.
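For scale, peak bandwidth is just channels x transfer rate x 8 bytes per 64-bit channel; a rough comparison (the exact speeds here are my assumptions):

```python
# Peak theoretical DRAM bandwidth: channels * MT/s * 8 bytes per 64-bit channel.
def bw_gb_s(channels: int, mt_per_s: int) -> float:
    return channels * mt_per_s * 8 / 1000  # MB/s -> GB/s

print(f"Dual-channel DDR5-6000 (AM5):       ~{bw_gb_s(2, 6000):.0f} GB/s")  # ~96
print(f"Quad-channel DDR5-6000:             ~{bw_gb_s(4, 6000):.0f} GB/s")  # ~192
print(f"8-channel DDR5-5200 (Threadripper): ~{bw_gb_s(8, 5200):.0f} GB/s")  # ~333
```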

With LLMs expected to exceed human capabilities in all respects within 1-2 years, according to Anthropic, this topic is going to be huge and will change humanity in ways none of us can imagine. Local LLMs will be part of the upcoming change. It's an excellent time for various hardware and software companies to ride the LLM wave.
 
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
I think AMD has a good chance to make a unified-RAM platform like the PS5/Xbox.
That'd be Strix Halo: up to 128GB of unified memory on a 256-bit bus.
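A 256-bit bus of LPDDR5X-8000 works out to ~256 GB/s, and since single-user token generation is roughly memory-bandwidth-bound, you can ballpark tokens/s as bandwidth over model size. A sketch (the memory speed and efficiency factor are my assumptions):

```python
# Bandwidth-bound decode estimate: each generated token streams
# (roughly) all active weights through the memory bus once.
bus_bits = 256
mt_per_s = 8000                            # LPDDR5X-8000 (assumed spec)
bw_gb_s = bus_bits / 8 * mt_per_s / 1000   # ~256 GB/s peak

model_gb = 40                              # e.g. a ~70B dense model at Q4 (assumption)
efficiency = 0.6                           # realistic fraction of peak (assumption)
print(f"~{bw_gb_s:.0f} GB/s peak, ~{bw_gb_s * efficiency / model_gb:.1f} tokens/s")
# ~256 GB/s peak, ~3.8 tokens/s: usable, though far from dGPU territory
```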
 
Joined
Dec 17, 2024
Messages
62 (1.02/day)
However, it seems to me that the current trend of running large LLMs locally will initially make powerful APUs like Strix Halo scarce. In a second phase, however, it will stimulate the development of bigger and better APUs. Just my theory, but I believe this will bring significant changes to the market.

The big three players will likely try to sell CPU+GPU as a single product, effectively eliminating the low-end and mid-range dGPU market in the medium term.
Agreed. With nVidia rumoured to launch an ARM APU in 2026, and AMD its Medusa Halo around the same timeframe (IIRC), the trend seems here to stay. Intel had better have something ready as well, otherwise they'll face even more difficulties than they do today.
 
Joined
Nov 23, 2023
Messages
40 (0.09/day)
Don't get the hate for OP. These companies are artificially holding us back, and not just in the AI space.

Either give us more VRAM or developers will optimize for CPU and crash your stock prices in the process. Hell, tons of people are getting M2s just for this stuff, and that's just sad.

Yup, that's gotta be it. See ya.
He's literally right though. Flux.1-dev =/= Flux.1-schnell, either.
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
Don't get the hate for OP.
It's not hate for the OP. It's that cards for this kind of use already exist. They're called professional cards. Right now they come in 32GB, 48GB and 64GB flavors. We don't need consumer cards with those memory banks. That failure of understanding is common. Don't sweat it.

He's literally right though. Flux.1-dev =/= Flux.1-schnell, either.
And another.. :rolleyes:
 
Joined
Nov 23, 2023
Messages
40 (0.09/day)
It's not hate for the OP. It's that cards for this kind of use already exist. They're called professional cards. Right now they come in 32GB, 48GB and 64GB flavors. We don't need consumer cards with those memory banks. That failure of understanding is common. Don't sweat it.
Ah, yes, the big fat "We" dictating consumer "needs". I guess "we", the consumers, should just get the exact same card and move on. It's not like different consumers have different needs or different tastes or anything which drives the market in the first place, after all.

And no, those cards the OP is talking about don't exist, because they aren't "cheap" at all - which is the entire point of his initial post.
And another.. :rolleyes:
I don't understand this response. Sorry.
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
which is the entire point of his initial post.
Yes, and some of us are trying to help them, and everyone else parroting this idea, see that it is not going to happen. And it does NOT take a genius to figure that out.

Either spend the money for the professional compute cards or live with the reduced speed of the consumer cards. Those are the choices. There are no others.

I don't understand this response. Sorry.
That's ok, no worries.
 
Joined
Sep 17, 2014
Messages
23,289 (6.12/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
What we need is cheaper cards without the AI crap; bring GTX back. If you want the AI crap, there should be dedicated GPUs with the RTX and AI stuff that you pay for, instead of using the normal GPUs.
All in due time.

First, the conclusion must be drawn that RT is too costly for all those involved.
 
Joined
May 17, 2021
Messages
3,497 (2.55/day)
Processor Ryzen 5 5700x
Motherboard B550 Elite
Cooling Thermalright Peerless Assassin 120 SE
Memory 32GB Fury Beast DDR4 3200MHz
Video Card(s) Gigabyte 3060 ti gaming oc pro
Storage Samsung 970 Evo 1TB, WD SN850x 1TB, plus some random HDDs
Display(s) LG 27gp850 1440p 165Hz 27''
Case Lian Li Lancool II performance
Power Supply MSI 750w
Mouse G502
All in due time.

First, the conclusion must be drawn that RT is too costly for all those involved.

I think we're past that. Unless you have a very high-end card, there is no point in turning RT on.
 
Joined
Sep 17, 2014
Messages
23,289 (6.12/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
I think we're past that. Unless you have a very high-end card, there is no point in turning RT on.
The response you'll get is 'lies, because we can use DLSS'. Also, engines have indeed added software-based RT.
 