
Dear AMD, NVIDIA, INTEL and others, we need cheap (192-bit to 384-bit), high-VRAM consumer GPUs to locally self-host/inference AI/LLMs

Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Needed a BIOS flash on the board too; it was in limp mode on the 9950X at 0.56GHz and didn't work with the 64GB DIMM until I updated to the December BIOS with AGESA 1.2.0.2b

Also, make sure you go 9000-series. I couldn't get the CUDIMMs working on a 7900X, which makes me sad, because that's the bulk of our workstations. I have a nasty feeling AMD doesn't support them on the 7000-series and either doesn't plan to, or physically can't.

FYI Raptor lake has solid CUDIMM support. Most of the rabbit holes I dove into when hunting for 64GB DIMMs were LGA1700.
I'm more interested in an AMD platform, both because it's cheaper than Arrow Lake and because of AVX-512.
Raptor Lake is not that interesting for me. It doesn't support CUDIMMs either, so didn't you mean Arrow Lake instead?

But yeah, I'm planning on a 9950X nonetheless, likely with a B650 or X670E ProArt.
 
Joined
Feb 20, 2019
Messages
8,775 (4.01/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Odyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
I'm more interested in an AMD platform, both because it's cheaper than Arrow Lake and because of AVX-512.
Raptor Lake is not that interesting for me. It doesn't support CUDIMMs either, so didn't you mean Arrow Lake instead?

But yeah, I'm planning on a 9950X nonetheless, likely with a B650 or X670E ProArt.
It seems like only the 9000-series is supported, and operating in bypass mode, whatever that means.

There's a verified AMD engineer responding in this thread, so you can take it as pretty accurate (for a Reddit thread).
https://www.reddit.com/r/Amd/comments/1gbm20a
Potentially there are faster 256GB AM5 configurations possible than the 4800 I've achieved. I'm just not particularly clued up on manual DDR5 timings, and I'm aiming for stability over speed. All of our 128GB AM4/AM5 systems are running at JEDEC speeds. I've been punished by user callouts for instability where I've found a kit that passes a memtest loop and a subsequent OCCT certificate, only to start crashing a few weeks later - either degradation or marginal stability from the outset.
 
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
It seems like only the 9000-series is supported, and operating in bypass mode, whatever that means.
It means they will work as regular UDIMMs, and the clock-buffering circuitry on the DIMMs won't be used.
Potentially there are faster 256GB AM5 configurations possible than the 4800 I've achieved. I'm just not particularly clued up on manual DDR5 timings, and I'm aiming for stability over speed. All of our 128GB AM4/AM5 systems are running at JEDEC speeds. I've been punished by user callouts for instability where I've found a kit that passes a memtest loop and a subsequent OCCT certificate, only to start crashing a few weeks later - either degradation or marginal stability from the outset.
I'd be happy with 4800MHz already, just want to double up on quantity.
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
Call me stupid, but I think I do, cause I haven't the faintest idea what someone would need LLMs for on a home PC.
That was my point..

Deepseek hasn't proven anything. Their actual impressive model is 671B params in size, which requires at least 350GB of VRAM/RAM to run; that's not modest.
The models you are talking about that ran on a 6GB GPU and a Raspberry Pi are the distilled models, which are the ones based on existing models (Llama and Qwen).
Larger models of the same generation always have better quality than smaller ones.
Of course, as time goes on, the smaller models improve, but so do their larger counterparts.
Watch, learn.
It's just too much entitlement for something that's a hobby.
If not a hobby, then one should have enough money to pony up for the professional stuff.
Agreed on both points.
 
Joined
Jun 19, 2024
Messages
450 (1.86/day)
System Name XPS, Lenovo and HP Laptops, HP Xeon Mobile Workstation, HP Servers, Dell Desktops
Processor Everything from Turion to 13900kf
Motherboard MSI - they own the OEM market
Cooling Air on laptops, lots of air on servers, AIO on desktops
Memory I think one of the laptops is 2GB, to 64GB on gamer, to 128GB on ZFS Filer
Video Card(s) A pile up to my knee, with an RTX 4090 teetering on top
Storage Rust in the closet, solid state everywhere else
Display(s) Laptop crap, LG UltraGear of various vintages
Case OEM and a 42U rack
Audio Device(s) Headphones
Power Supply Whole home UPS w/Generac Standby Generator
Software ZFS, UniFi Network Application, Entra, AWS IoT Core, Splunk
Benchmark Scores 1.21 GigaBungholioMarks
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
That was my point..


Watch, learn.
If you gave the video a proper watch, you'd know what I'm talking about.
But let me make it even easier for you:
But sensationalist headlines aren't telling you the full story.

The Raspberry Pi can technically run Deepseek R1... but it's not the same thing as Deepseek R1 671b, which is a four hundred gigabyte model.

That model (the one that actually beats ChatGPT) still requires a massive amount of GPU compute.
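The napkin math makes it obvious: weights alone take params x bytes per weight, before you even count KV cache and runtime overhead. A rough sketch (ballpark figures of mine, not deepseek's):

```python
# Rough lower bound for LLM weight storage: params * bits_per_weight / 8 bytes.
# Real deployments need extra headroom for KV cache and activations.
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    # 1e9 params * (bits / 8) bytes each, expressed in GB
    return params_billion * bits_per_weight / 8

for bits in (16, 8, 4):
    print(f"671B @ {bits}-bit: ~{weights_gb(671, bits):.0f} GB")
# 16-bit: ~1342 GB, 8-bit: ~671 GB, 4-bit: ~336 GB
# Even aggressively quantized, R1 671B needs hundreds of GB;
# the "runs on a Pi" demos are the small 1.5B-14B distills.
```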
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
If you gave the video a proper watch, you'd know what I'm talking about.
But let me make it even easier for you:
How did you miss the point twice in a row? The point is, it's doable on low-end machines. You don't need high-end specs to do AI stuff now. Sure, it might take longer, but it's doable. And in reference to the OP, we don't need specialized hardware or GPUs to do it. General everyday hardware is all a person needs.
 
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
How did you miss the point twice in a row? The point is, it's doable on low-end machines
You are the one missing it. For those smaller models, they took already existing models and improved their quality a little bit. There's nothing new about this.
You don't need high-end specs to do AI stuff now.
That's the point I'm making: this level of performance on consumer devices has been available for over a year now.
The new stuff deepseek brought is all related to their bigger models; that's where the innovation lies.
Don't fall for the sensationalism some outlets are spouting (as Jeff himself said), and especially try not to reinforce it, since that just gives way to misinformation.
 
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Yup, that's gotta be it. See ya.
Well, in case you're open to learning something and having a proper discussion, I can highly recommend giving the actual deepseek paper a read:
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
Well, in case you're open to learning something and having a proper discussion, I can highly recommend giving the actual deepseek paper a read:
Oh, you mean this paper?

Have you actually read that PDF? Pages 13 & 14 are the most interesting..
 
Joined
Oct 21, 2009
Messages
129 (0.02/day)
Location
Netherlands
System Name Whitewonder
Processor 7800X3D
Motherboard Asus Proart X670-E Creator
Cooling Corsair custom Watercooled
Memory 64 GB
Video Card(s) RX 6800 XT
Storage Too much to mention, 1190 TB in all
Display(s) 2 x Dell 4K @ 60 Hz
Case White XL case
Audio Device(s) Realtek + Beyerdynamic 990 Pro headset
Power Supply 1300 watt
Mouse Corsair cord mouse
Keyboard Corsair red lighter cabled keyboard ages old ;)

Konomi

New Member
Joined
Aug 3, 2024
Messages
15 (0.08/day)
Or here's an idea: we actually optimise for the hardware instead of trying to brute force everything. You can already use AI at home - it won't be particularly fast, but it isn't something you need at home at this point in time. Developers can't even optimise for games properly, and you're asking for hardware that you probably won't even truly benefit from.
 
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Oh, you mean this paper?

Have you actually read that PDF? Pages 13 & 14 are the most interesting..
Yeah, that's the exact paper I linked.
Page 13 is not really relevant since it only pertains to the bigger model. Page 14 is indeed where the fun is at, along with page 15, which has the comparison between a distilled model and one trained from scratch using just their techniques (table 6), with this conclusion:
Therefore, we can draw two conclusions: First, distilling more powerful models into smaller ones yields excellent results, whereas smaller models relying on the large-scale RL mentioned in this paper require enormous computational power and may not even achieve the performance of distillation. Second, while distillation strategies are both economical and effective, advancing beyond the boundaries of intelligence may still require more powerful base models and larger-scale reinforcement learning.
They do reasonably well on reasoning benchmarks; however, if you compare them to their regular base models, the distilled ones aren't that impressive. On HF's leaderboard, the distilled deepseek models rank quite low:

I'll assume you don't have much experience with running LLMs locally. You could either go with ollama on the CLI, or you could try something like LM Studio:
I personally haven't used it (I just run ollama myself), but I've heard it makes it pretty easy for people that are not that tech-savvy to run LLMs, even though it's not the most performant stack.

This way you could give those different models a go, and even compare them to the big deepseek model somewhere and see how they fare.
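If you'd rather script it than use the CLI, ollama also exposes a local REST API. A minimal sketch (assumes the default port; the deepseek-r1:7b tag and the prompt are just examples):

```python
# Query a locally running ollama server (default: localhost:11434).
# Prereq: `ollama pull deepseek-r1:7b` (or whichever tag you want to test).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:7b",  # one of the distilled R1 models
        "prompt": "In one sentence: why does VRAM matter for LLM inference?",
        "stream": False,  # single JSON response instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```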
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
Yeah, that's the exact paper I linked.
Page 13 is not really relevant since it only pertains to the bigger model. Page 14 is indeed where the fun is at, along with page 15, which has the comparison between a distilled model and one trained from scratch using just their techniques (table 6), with this conclusion:

They do reasonably well on reasoning benchmarks; however, if you compare them to their regular base models, the distilled ones aren't that impressive. On HF's leaderboard, the distilled deepseek models rank quite low:

I'll assume you don't have much experience with running LLMs locally. You could either go with ollama on the CLI, or you could try something like LM Studio:
I personally haven't used it (I just run ollama myself), but I've heard it makes it pretty easy for people that are not that tech-savvy to run LLMs, even though it's not the most performant stack.

This way you could give those different models a go, and even compare them to the big deepseek model somewhere and see how they fare.
Ok, sure, moving on..
 
Joined
Feb 12, 2025
Messages
8 (2.00/day)
Location
EU
Processor AMD 5600X
Motherboard ASUS TUF GAMING B550M-Plus WiFi
Cooling be quiet! Dark Rock 4
Memory G.Skill Ripjaws 2 x 32 GB DDR4-3600 CL18-22-22-42 1.35V F4-3600C18D-64GVK
Video Card(s) Sapphire Pulse RX 7800XT 16GB
Storage Kingston KC3000 2TB + QNAP TBS-464
Display(s) LG 35" LCD 35WN75C-B 3440x1440
Case Kolink Bastion RGB Midi-Tower
Power Supply Enermax Digifanless 550W
Mouse Razer Deathadder v2
Benchmark Scores phi4 - 42.00 tokens/s
With the AI age being here, we need fast memory and lots of it, so we can host our favorite LLMs locally.
+1. With a 32GB AMD 9070XT possibly coming, at least AMD is listening. nVidia has Project DIGITS. I think AMD has a good chance to make a unified-RAM platform like the PS5/Xbox.

We also need DDR6 quad-channel (or something entirely new and faster) consumer desktop motherboards with up to 256 or 384GB RAM (my current B650 mobo supports only up to 128GB RAM)
The sTR5-socket Threadripper has 8x DDR5 channels, but MB prices are four figures, ofc.
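For scale, peak bandwidth is just channels x transfer rate x 8 bytes per 64-bit channel; a rough comparison (the exact speeds here are my assumptions):

```python
# Peak theoretical DRAM bandwidth: channels * MT/s * 8 bytes per 64-bit channel.
def bw_gb_s(channels: int, mt_per_s: int) -> float:
    return channels * mt_per_s * 8 / 1000  # MB/s -> GB/s

print(f"Dual-channel DDR5-6000 (AM5):       ~{bw_gb_s(2, 6000):.0f} GB/s")  # ~96
print(f"Quad-channel DDR5-6000:             ~{bw_gb_s(4, 6000):.0f} GB/s")  # ~192
print(f"8-channel DDR5-5200 (Threadripper): ~{bw_gb_s(8, 5200):.0f} GB/s")  # ~333
```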

With LLMs expected to exceed human capabilities in all respects within 1-2 years, according to Anthropic, this topic is going to be huge and will change humanity in ways none of us can imagine. Local LLMs will be part of the upcoming change. It's an excellent time for various hardware and software companies to ride the LLM wave.
 
Joined
May 10, 2023
Messages
598 (0.92/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
I think AMD has a good chance to make a unified-RAM platform like the PS5/Xbox.
That'd be Strix Halo: up to 128GB of unified memory on a 256-bit bus.
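A 256-bit bus of LPDDR5X-8000 works out to ~256 GB/s, and since single-user token generation is roughly memory-bandwidth-bound, you can ballpark tokens/s as bandwidth over model size. A sketch (the memory speed and efficiency factor are my assumptions):

```python
# Bandwidth-bound decode estimate: each generated token streams
# (roughly) all active weights through the memory bus once.
bus_bits = 256
mt_per_s = 8000                            # LPDDR5X-8000 (assumed spec)
bw_gb_s = bus_bits / 8 * mt_per_s / 1000   # ~256 GB/s peak

model_gb = 40                              # e.g. a ~70B dense model at Q4 (assumption)
efficiency = 0.6                           # realistic fraction of peak (assumption)
print(f"~{bw_gb_s:.0f} GB/s peak, ~{bw_gb_s * efficiency / model_gb:.1f} tokens/s")
# ~256 GB/s peak, ~3.8 tokens/s: usable, though far from dGPU territory
```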
 
Joined
Dec 17, 2024
Messages
62 (1.02/day)
However, it seems to me that the current trend of running large LLMs locally will initially make powerful APUs like Strix Halo scarce. In a second phase, however, it will stimulate the development of bigger and better APUs. Just my theory, but I believe this will bring significant changes to the market.

The big three players will likely try to sell CPU+GPU as a single product, effectively eliminating the low-end and mid-range dGPU market in the medium term.
Agreed. With nVidia rumoured to launch an ARM APU in 2026, and AMD its Medusa Halo around the same timeframe (IIRC), the trend seems here to stay. Intel had better have something ready as well, otherwise they'll face even more difficulties than they do today.
 
Joined
Nov 23, 2023
Messages
40 (0.09/day)
Don't get the hate for OP. These companies are artificially holding us back, and not just in the AI space.

Either give us more VRAM or developers will optimize for CPU and crash your stock prices in the process. Hell, tons of people are getting M2s just for this stuff, and that's just sad.

Yup, that's gotta be it. See ya.
He's literally right though. Flux.1-dev =/= Flux.1-schnell, either.
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
Don't get the hate for OP.
It's not hate for the OP. It's that cards for this kind of use already exist. They're called professional cards. Right now they come in 32GB, 48GB and 64GB flavors. We don't need consumer cards with those memory banks. That failure of understanding is common. Don't sweat it.

He's literally right though. Flux.1-dev =/= Flux.1-schnell, either.
And another.. :rolleyes:
 
Joined
Nov 23, 2023
Messages
40 (0.09/day)
It's not hate for the OP. It's that cards for this kind of use already exist. They're called professional cards. Right now they come in 32GB, 48GB and 64GB flavors. We don't need consumer cards with those memory banks. That failure of understanding is common. Don't sweat it.
Ah, yes, the big fat "We" dictating consumer "needs". I guess "we", the consumers, should just get the exact same card and move on. It's not like different consumers have different needs or different tastes or anything which drives the market in the first place, after all.

And no, those cards the OP is talking about don't exist, because they aren't "cheap" at all - which is the entire point of his initial post.
And another.. :rolleyes:
I don't understand this response. Sorry.
 
Joined
Jul 5, 2013
Messages
29,218 (6.88/day)
which is the entire point of his initial post.
Yes, and some of us are trying to help them, and everyone else parroting this idea, see that it is not going to happen. And it does NOT take a genius to figure that out.

Either spend the money for the professional compute cards or live with the reduced speed of the consumer cards. Those are the choices. There are no others.

I don't understand this response. Sorry.
That's ok, no worries.
 
Joined
Sep 17, 2014
Messages
23,289 (6.12/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
What we need is cheaper cards without the AI crap; bring GTX back. If you want the AI crap, there should be dedicated GPUs with the RTX and AI stuff that you pay for, instead of using the normal GPUs.
All in due time.

First, the conclusion must be drawn that RT is too costly for all those involved.
 
Joined
May 17, 2021
Messages
3,497 (2.55/day)
Processor Ryzen 5 5700x
Motherboard B550 Elite
Cooling Thermalright Peerless Assassin 120 SE
Memory 32GB Fury Beast DDR4 3200MHz
Video Card(s) Gigabyte 3060 ti gaming oc pro
Storage Samsung 970 Evo 1TB, WD SN850x 1TB, plus some random HDDs
Display(s) LG 27gp850 1440p 165Hz 27''
Case Lian Li Lancool II performance
Power Supply MSI 750w
Mouse G502
All in due time.

First, the conclusion must be drawn that RT is too costly for all those involved.

I think we're past that. Unless you have a very high-end card, there is no point in turning RT on.
 
Joined
Sep 17, 2014
Messages
23,289 (6.12/day)
Location
The Washing Machine
System Name Tiny the White Yeti
Processor 7800X3D
Motherboard MSI MAG Mortar b650m wifi
Cooling CPU: Thermalright Peerless Assassin / Case: Phanteks T30-120 x3
Memory 32GB Corsair Vengeance 30CL6000
Video Card(s) ASRock RX7900XT Phantom Gaming
Storage Lexar NM790 4TB + Samsung 850 EVO 1TB + Samsung 980 1TB + Crucial BX100 250GB
Display(s) Gigabyte G34QWC (3440x1440)
Case Lian Li A3 mATX White
Audio Device(s) Harman Kardon AVR137 + 2.1
Power Supply EVGA Supernova G2 750W
Mouse Steelseries Aerox 5
Keyboard Lenovo Thinkpad Trackpoint II
VR HMD HD 420 - Green Edition ;)
Software W11 IoT Enterprise LTSC
Benchmark Scores Over 9000
I think we're past that. Unless you have a very high-end card, there is no point in turning RT on.
The response you'll get is 'lies, because we can use DLSS'. Also, engines have indeed added software-based RT.
 