What local LLM-s you use?

csendesmark · 2025-02-12T11:43:08+0000

Hello Forum,
I occasionally run LLM-s locally with LM Studio.
And I mostly use
DeepSeek-R1-Distill-Qwen-14B-GGUF (Q8_0) ~42 token/s
phi-4-GGUF (Q8_0) ~42 token/s
DeepSeek-R1-Distill-Qwen-32B-GGUF (Q6_K) ~3,2 token/s
Llama-3.3-70B-Instruct-GGUF (4Q_K_M) ~1,9 token/s
Using DeepSeek for logical and math problems while Phi and Llama for language question usually.
Really curious what do you use and how fast is it on your rig.

Ultron1337 · 2025-02-12T15:13:15+0000

I've used many models from 1B tinyllama to mixtral:8x22b.
I've found Phi4:14B and Gemma2:27B most useful.
My 7800XT runs these models Q4_K at these speeds:
phi4:27B - 42 tokens/s
gemma2:27B - 8.5 tokens/s

For reference
llama3.3:70B - 1.5 tokens/s painfully slow
llama3.2:3.2B - 115 tokens/s, super fast, but super dumb

I use ollama for running models, it has nice API that can be used with special Python module. I run Open WebUI in my QNAP container, that is connected to my models via same API. Another QNAP container runs nginx reverse proxy, to make the WebUI connection secure (HTTP->HTTPS).

I must admit much bigger online models like GPT-4o, Gemini 2 Pro etc are obviously superior to local ones, but the local ones get surprising amount stuff done and are fun to play with.

Easy Rhino · 2025-02-12T15:15:47+0000

I am running ollama here on my Fedora 41 desktop with an AMD RX 6750 XT. I was able to pass the Environment variable that enables that GPU despite it not directly being supported. I mostly am running deepseek-coder-v2 for AI chat inside VSCodium and Qwen2.5-coder:1.5 for code completion. I get 75-80 tokens per second which is pretty good.

edit: i get 83 tps with llama3.2:3b

csendesmark · 2025-02-12T15:59:45+0000

Made a second round with the same models but with a fresh start not opening any other programs
I am not sure why my system get this "slow" regarding AI
DS 14B Q8 from 42 -> 73 token/s
Phi4 Q8 from 42 still 42 token/s
DS 32B Q6 from 3.2 to 3.3 token/s
Llama3.3 70B Q3 from 1.9 to 2.6 token/s
I understand that the browser using some VRAM, but how is that LM Studio not using priority and offloading data not currently in use to system memory?

Ultron1337 said:
phi4:27B - 42 tokens/s

Same on mine, even with fresh start - strange
Llama 3.3 performance also looks strange

Easy Rhino · 2025-02-12T16:22:46+0000

csendesmark said:
Made a second round with the same models but with a fresh start not opening any other programs
I am not sure why my system get this "slow" regarding AI
DS 14B Q8 from 42 -> 73 token/s
Phi4 Q8 from 42 still 42 token/s
DS 32B Q6 from 3.2 to 3.3 token/s
Llama3.3 70B Q3 from 1.9 to 2.6 token/s
I understand that the browser using some VRAM, but how is that LM Studio not using priority and offloading data not currently in use to system memory?

Same on mine, even with fresh start - strange
Llama 3.3 performance also looks strange

Try llama3.2:3b and see what you get?

System Name	Kincsem
Processor	AMD Ryzen 9 9950X
Motherboard	ASUS ProArt X870E-CREATOR WIFI
Cooling	Be Quiet Dark Rock Pro 5
Memory	Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s)	Sapphire AMD RX 7900 XT Pulse
Storage	Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s)	Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case	Cooler Master CM 690 III
Power Supply	Seasonic 1300W 80+ Gold Prime
Mouse	Logitech G502 Hero
Keyboard	HyperX Alloy Elite RGB
Software	Windows 10-64
Benchmark Scores	https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc

Processor	AMD 5600X
Motherboard	ASUS TUF GAMING B550M-Plus WiFi
Cooling	be quiet! Dark Rock 4
Memory	G.Skill Ripjaws 2 x 32 GB DDR4-3600 CL18-22-22-42 1.35V F4-3600C18D-64GVK
Video Card(s)	Sapphire Pulse RX 7800XT 16GB
Storage	Kingston KC3000 2TB + QNAP TBS-464
Display(s)	LG 35" LCD 35WN75C-B 3440x1440
Case	Kolink Bastion RGB Midi-Tower
Power Supply	Enermax Digifanless 550W
Mouse	Razer Deathadder v2
Benchmark Scores	phi4 - 42.00 tokens/s

System Name	Desktop
Processor	i5 13600KF
Motherboard	AsRock B760M Steel Legend Wifi
Cooling	Noctua NH-U9S
Memory	4x 16 Gb Gskill S5 DDR5 @6000
Video Card(s)	Gigabyte Gaming OC 6750 XT 12GB
Storage	WD_BLACK 4TB SN850x
Display(s)	Gigabye M32U
Case	Corsair Carbide 400C
Audio Device(s)	On Board
Power Supply	EVGA Supernova 650 P2
Mouse	MX Master 3s
Keyboard	Logitech G915 Wireless Clicky
Software	Fedora KDE Spin

System Name	Kincsem
Processor	AMD Ryzen 9 9950X
Motherboard	ASUS ProArt X870E-CREATOR WIFI
Cooling	Be Quiet Dark Rock Pro 5
Memory	Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s)	Sapphire AMD RX 7900 XT Pulse
Storage	Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s)	Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case	Cooler Master CM 690 III
Power Supply	Seasonic 1300W 80+ Gold Prime
Mouse	Logitech G502 Hero
Keyboard	HyperX Alloy Elite RGB
Software	Windows 10-64
Benchmark Scores	https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc

System Name	Desktop
Processor	i5 13600KF
Motherboard	AsRock B760M Steel Legend Wifi
Cooling	Noctua NH-U9S
Memory	4x 16 Gb Gskill S5 DDR5 @6000
Video Card(s)	Gigabyte Gaming OC 6750 XT 12GB
Storage	WD_BLACK 4TB SN850x
Display(s)	Gigabye M32U
Case	Corsair Carbide 400C
Audio Device(s)	On Board
Power Supply	EVGA Supernova 650 P2
Mouse	MX Master 3s
Keyboard	Logitech G915 Wireless Clicky
Software	Fedora KDE Spin

What local LLM-s you use?

csendesmark

Ultron1337

Easy Rhino

Linux Advocate

csendesmark

Easy Rhino

Linux Advocate