
What local LLMs do you use?

Joined
Mar 11, 2008
Messages
1,086 (0.18/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB + Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
@AusWolf
Local LLMs are very important; I really reject the trend of everything becoming "cloud"-based, micro$oft even wants your Windows account to be online...
But after posting 3 times, you could tell us what LLMs you use, and maybe share some performance data too!
 
Joined
Feb 12, 2025
Messages
8 (2.00/day)
Location
EU
Processor AMD 5600X
Motherboard ASUS TUF GAMING B550M-Plus WiFi
Cooling be quiet! Dark Rock 4
Memory G.Skill Ripjaws 2 x 32 GB DDR4-3600 CL18-22-22-42 1.35V F4-3600C18D-64GVK
Video Card(s) Sapphire Pulse RX 7800XT 16GB
Storage Kingston KC3000 2TB + QNAP TBS-464
Display(s) LG 35" LCD 35WN75C-B 3440x1440
Case Kolink Bastion RGB Midi-Tower
Power Supply Enermax Digifanless 550W
Mouse Razer Deathadder v2
Benchmark Scores phi4 - 42.00 tokens/s
Maybe I'm a little bit behind on stuff but... Got to ask... What's the point of this for any regular home user?
It's like asking "what's the point of using your brain?". For the first time in recorded history, humanity has a thinking tool. It can supercharge almost any skill you have. Just as the human brain can be used in nearly limitless ways, the same applies to a local LLM. Use it to check your kids' homework, use it to do your homework, help analyze scientific papers, write code for you, explain why vitamin K is good for you, count stars in the sky, analyze insurance offers, etc. I'm not even going to pretend I know even a fraction of the use cases local LLMs will have over the next 10 years, but I know it's going to be wild. On the level of how the internet changed our lives (yeah, some of us grew up without the internet).
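To make that concrete, here is a minimal sketch of asking a locally hosted model one of those questions; it assumes the Ollama server is running on your machine with a model already pulled, and the model name and question are just placeholders:

```python
# Minimal sketch: ask a local model a question via the Ollama Python client.
# Assumes `pip install ollama`, the Ollama server running on this machine,
# and a model already pulled (e.g. `ollama pull llama3.1`).
import ollama

response = ollama.chat(
    model="llama3.1",  # placeholder: any model you have pulled
    messages=[
        {"role": "user",
         "content": "Explain in three sentences why vitamin K matters in a diet."},
    ],
)
print(response["message"]["content"])
```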
 
Joined
May 22, 2024
Messages
429 (1.59/day)
System Name Kuro
Processor AMD Ryzen 7 7800X3D@65W
Motherboard MSI MAG B650 Tomahawk WiFi
Cooling Thermalright Phantom Spirit 120 EVO
Memory Corsair DDR5 6000C30 2x48GB (Hynix M)@6000 30-36-36-76 1.36V
Video Card(s) PNY XLR8 RTX 4070 Ti SUPER 16G@200W
Storage Crucial T500 2TB + WD Blue 8TB
Case Lian Li LANCOOL 216
Power Supply MSI MPG A850G
Software Ubuntu 24.04 LTS + Windows 10 Home Build 19045
Benchmark Scores 17761 C23 Multi@65W
Maybe I'm a little bit behind on stuff but... Got to ask... What's the point of this for any regular home user?
That sounds interesting. Can you explain? :)
At least the larger, 70B+ models are typically knowledgeable enough that you can ask them some complicated questions and expect a reasonable, not necessarily banal and predictable, answer. There are things you do not want to send to commercial services, most typically personal information. Some of the latest advancements have made even 70B-scale models competent with illusion-shattering problems that previous generations of models had difficulty with, like how many r's are in "strawberry", how many boys Mary has when one of them is gross, et cetera.

Nowadays they are usually useful for common math and programming problems when used with care, can explore human philosophy and the human condition quite competently, and can tell stories of some interest with the right prompt. They are also useful for getting familiar with what LLM output looks like; half of the internet looks LLM-generated these days.

Some open-weight models are capable of API use (tool calling), such as calling tools provided by the framework they run on, including requesting web services. The usefulness of such capabilities is apparently unremarkable given other limitations, and for that matter, the state of the internet and search-engine results these days. Tool use requires support from the framework the model runs on, and it is usually the only time the model - note, not the framework - would access the Internet.
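For illustration, a rough sketch of what tool calling can look like through Ollama's Python client, assuming a tool-capable model is pulled; the weather function and its schema are made-up placeholders, not part of any framework:

```python
# Rough sketch of tool calling via the Ollama Python client (assumption:
# the pulled model supports tools). The tool itself is a placeholder.
import ollama

def get_weather(city: str) -> str:
    """Placeholder tool; a real one would request an actual web service."""
    return f"Sunny in {city}, 21 degrees C"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = ollama.chat(
    model="llama3.1",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Budapest?"}],
    tools=tools,
)

# The model only emits a structured request; executing the call and
# returning its result is up to the surrounding application.
for call in resp["message"]["tool_calls"] or []:
    if call["function"]["name"] == "get_weather":
        print(get_weather(**call["function"]["arguments"]))
```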

They can also provide some silly fun, especially the roleplay-finetuned ones, for uses where hallucination actually provides some emulation of creativity. Think of it as a text-based holodeck; throw in an image generator and it is text and image. You typically don't want a lot of that elsewhere either: as with all things requiring an account, everything you put into a networked service would be recorded by the provider and linked to you. Not everyone feels comfortable with the nothing-to-hide mentality even when they really have nothing to hide, and more than a few have objections to their interactions and personal info being used to train future commercial AI models.

Personally, I sized my setup in early 2024 to be able to run a "future larger model", which turned out to be mistral-large-2407 (123B, quantized). The best correctness and general task performance is probably currently achieved by the Llama 3 70B distilled version of DeepSeek R1. Anything larger would be costly and impractical for me at the moment. Might as well make them useful while they are there.
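As a sketch of what running a quantized model like that looks like locally, here is llama-cpp-python loading a GGUF file; the file name and the n_gpu_layers value are placeholders for whatever fits your hardware:

```python
# Minimal sketch: run a quantized GGUF model with llama-cpp-python.
# Assumes `pip install llama-cpp-python` and a GGUF file on disk;
# the path and offload count below are hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-large-2407-Q4_K_M.gguf",  # hypothetical file
    n_ctx=4096,       # context window size
    n_gpu_layers=40,  # layers offloaded to the GPU; 0 = CPU only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the Monty Hall problem."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```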
 
Joined
Mar 11, 2008
Messages
1,086 (0.18/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB + Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
70B+ models are typically knowledgeable enough that you can ask them some complicated questions and expect a reasonable, not necessarily banal and predictable, answer.
Did you skip DeepSeek?
 
Joined
Jan 14, 2019
Messages
14,410 (6.48/day)
Location
Midlands, UK
Processor Various Intel and AMD CPUs
Motherboard Micro-ATX and mini-ITX
Cooling Yes
Memory Overclocking is overrated
Video Card(s) Various Nvidia and AMD GPUs
Storage A lot
Display(s) Monitors and TVs
Case It's not about size, but how you use it
Audio Device(s) Speakers and headphones
Power Supply 300 to 750 W, bronze to gold
Mouse Wireless
Keyboard Mechanical
VR HMD Not yet
Software Linux gaming master race
@AusWolf
Local LLMs are very important; I really reject the trend of everything becoming "cloud"-based, micro$oft even wants your Windows account to be online...
But after posting 3 times, you could tell us what LLMs you use, and maybe share some performance data too!
I'm not using anything. I didn't even know you could run them locally until recently. I'm only trying to learn what it's useful for, to see whether it's something I'd want to do or not.

At least the larger, 70B+ models are typically knowledgeable enough that you can ask them some complicated questions and expect a reasonable, not necessarily banal and predictable, answer. There are things you do not want to send to commercial services, most typically personal information. Some of the latest advancements have made even 70B-scale models competent with illusion-shattering problems that previous generations of models had difficulty with, like how many r's are in "strawberry", how many boys Mary has when one of them is gross, et cetera.

Nowadays they are usually useful for common math and programming problems when used with care, can explore human philosophy and the human condition quite competently, and can tell stories of some interest with the right prompt. They are also useful for getting familiar with what LLM output looks like; half of the internet looks LLM-generated these days.

Some open-weight models are capable of API use (tool calling), such as calling tools provided by the framework they run on, including requesting web services. The usefulness of such capabilities is apparently unremarkable given other limitations, and for that matter, the state of the internet and search-engine results these days. Tool use requires support from the framework the model runs on, and it is usually the only time the model - note, not the framework - would access the Internet.

They can also provide some silly fun, especially the roleplay-finetuned ones, for uses where hallucination actually provides some emulation of creativity. Think of it as a text-based holodeck; throw in an image generator and it is text and image. You typically don't want a lot of that elsewhere either: as with all things requiring an account, everything you put into a networked service would be recorded by the provider and linked to you. Not everyone feels comfortable with the nothing-to-hide mentality even when they really have nothing to hide, and more than a few have objections to their interactions and personal info being used to train future commercial AI models.

Personally, I sized my setup in early 2024 to be able to run a "future larger model", which turned out to be mistral-large-2407 (123B, quantized). The best correctness and general task performance is probably currently achieved by the Llama 3 70B distilled version of DeepSeek R1. Anything larger would be costly and impractical for me at the moment. Might as well make them useful while they are there.
Text-based holodeck running locally on your PC... Now that caught my attention! :)

I'm just having a hard time imagining it. LLMs still live in my head as glorified search engines. :ohwell:
 
Joined
Mar 11, 2008
Messages
1,086 (0.18/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB + Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
Joined
May 22, 2024
Messages
429 (1.59/day)
System Name Kuro
Processor AMD Ryzen 7 7800X3D@65W
Motherboard MSI MAG B650 Tomahawk WiFi
Cooling Thermalright Phantom Spirit 120 EVO
Memory Corsair DDR5 6000C30 2x48GB (Hynix M)@6000 30-36-36-76 1.36V
Video Card(s) PNY XLR8 RTX 4070 Ti SUPER 16G@200W
Storage Crucial T500 2TB + WD Blue 8TB
Case Lian Li LANCOOL 216
Power Supply MSI MPG A850G
Software Ubuntu 24.04 LTS + Windows 10 Home Build 19045
Benchmark Scores 17761 C23 Multi@65W
Did you skip DeepSeek?
The way they do chain-of-thought makes for interesting reading. I think they are the first to do it well enough in an open-weight model, too. Wherever they might be from, I don't have quite enough trust in any hosted service to send it anything confidential or profiling.

Even "free" services come with implicit permission to use your interactions for any number of further purposes, buried in the user agreement, and that is assuming the agreement is even followed; God forbid there is a data breach.

FWIW, 70B Q6_K quantized models run at a bit more than ~0.9 tokens/s to almost 1.2 tokens/s on my setup with the official distribution of Ollama 0.5.7. The latest llama.cpp compiled from source gives ~1.2 tokens/s.
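If you want to reproduce numbers like these, Ollama's REST API reports the token count and generation time with every response, so the tokens/s figure can be computed directly; the model name below is a placeholder:

```python
# Measure generation speed through Ollama's REST API.
# A non-streamed /api/generate response includes eval_count (tokens
# generated) and eval_duration (nanoseconds spent generating them).
import requests

r = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.1",  # placeholder: whichever model you benchmark
    "prompt": "Write a haiku about GPUs.",
    "stream": False,
})
data = r.json()
tokens_per_s = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"{tokens_per_s:.2f} tokens/s")
```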

Text-based holodeck running locally on your PC... Now that caught my attention! :)

I'm just having a hard time imagining it. LLMs still live in my head as glorified search engines. :ohwell:
To be fair, they are actually even worse than that when used for factual stuff without verification. And whatever it is that the various search engines are integrating, they certainly aren't doing it quite right yet.

They do have uses where even current models can play to their strengths, though, and even the smaller models have a superhuman passing familiarity with almost everything anyone would - or could - ever have seen as text on a computer display. As long as you don't try anything too unusual, they'll often do fine.
 
Joined
Oct 16, 2018
Messages
976 (0.42/day)
Location
Uttar Pradesh, India
Processor AMD R7 1700X @ 4100Mhz
Motherboard MSI B450M MORTAR MAX (MS-7B89)
Cooling Phanteks PH-TC14PE
Memory Crucial Technology 16GB DR (DDR4-3600) - C9BLM:045M:E BL16G36C16U4W.M16FE1 X2 @ CL14
Video Card(s) XFX RX480 GTR 8GB @ 1408Mhz (AMD Auto OC)
Storage Samsung SSD 850 EVO 250GB
Display(s) Acer KG271 1080p @ 81Hz
Power Supply SuperFlower Leadex II 750W 80+ Gold
Keyboard Redragon Devarajas RGB
Software Microsoft Windows 10 (10.0) Professional 64-bit
Benchmark Scores https://valid.x86.fr/mvvj3a
Maybe I'm a little bit behind on stuff but... Got to ask... What's the point of this for any regular home user?
This video helped me get started with running LLMs locally. It should also answer many of the questions you raised.
 
Joined
Mar 11, 2008
Messages
1,086 (0.18/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB + Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
This video helped me get started with running llm locally. It should also answer many of the questions you raised.
OMG, that scare in the first seconds of the video... :ohwell: :ohwell: :ohwell:
Instant downvote from me for this kind of "content".
There's nothing to worry about when you run it locally in a local program.
I would never install DeepSeek's app on my phone tho...
 
Joined
Jul 21, 2008
Messages
5,266 (0.87/day)
System Name [Daily Driver]
Processor [Ryzen 7 5800X3D]
Motherboard [MSI MAG B550 TOMAHAWK]
Cooling [be quiet! Dark Rock Slim]
Memory [64GB Crucial Pro 3200MHz (32GBx2)]
Video Card(s) [PNY RTX 3070Ti XLR8]
Storage [1TB SN850 NVMe, 4TB 990 Pro NVMe, 2TB 870 EVO SSD, 2TB SA510 SSD]
Display(s) [2x 27" HP X27q at 1440p]
Case [Fractal Meshify-C]
Audio Device(s) [Fanmusic TRUTHEAR IEM, HyperX Duocast]
Power Supply [CORSAIR RMx 1000]
Mouse [Logitech G Pro Wireless]
Keyboard [Logitech G512 Carbon (GX-Brown)]
Software [Windows 11 64-Bit]
I'm still fairly new to the scene, but I have phi-4 and deepseek-r1 instances running locally. Right now I just use them when I run into coding issues or need some inspiration. I was using Grok a lot to fill that role, but the local stuff is neat.
 