
What local LLMs do you use?

Joined
Oct 17, 2021
Messages
115 (0.09/day)
System Name Nirn
Processor Amd Ryzen 7950X3D
Motherboard MSI MEG ACE X670e
Cooling Noctua NH-D15
Memory 128 GB Kingston DDR5 6000 (running at 4000)
Video Card(s) Radeon RX 7900XTX (24G) + Geforce 4070ti (12G) Physx
Storage SAMSUNG 990 EVO SSD 2TB Gen 5 x2 (OS)+SAMSUNG 980 SSD 1TB PCle 3.0x4 (Primocache) +2X 22TB WD Gold
Display(s) Samsung UN55NU8000 (Freesync)
Case Corsair Graphite Series 780T White
Audio Device(s) Creative Soundblaster AE-7 + Sennheiser GSP600
Power Supply Seasonic PRIME TX-1000 Titanium
Mouse Razer Mamba Elite Wired
Keyboard Razer BlackWidow Chroma v1
VR HMD Oculus Quest 2
Software Windows 10
No, that was just a dumb PCIe switch/mux, no different than having a regular NVMe in your motherboard and using PCIe P2P to access stuff between devices.

That has nothing to do with unified memory.
I was specifically talking about the HBCC option that Vega had. I wasn't literally saying that it had unified memory, just that the functionality was similar.
 
Joined
May 10, 2023
Messages
665 (1.00/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
I was specifically talking about the HBCC option that Vega had. I wasn't literally saying that it had unified memory, just that the functionality was similar.
Ah, that's more of a memory paging implementation for GPUs, not really that helpful, and it still has no relation to unified memory.
It basically let your GPU present itself as having more memory, and shuffled unused pages out of the actual GPU memory to make better use of it.

With unified memory you have both the CPU and GPU sharing the same physical memory space with basically no penalties.
 
Joined
Mar 11, 2008
Messages
1,149 (0.19/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
Here is the decensored DeepSeek R1 Link
Going to check soon.
 

johnspack

Here For Good!
Joined
Oct 6, 2007
Messages
6,063 (0.95/day)
Location
Nelson B.C. Canada
System Name System2 Blacknet , System1 Blacknet2
Processor System2 Threadripper 1920x, System1 2699 v3
Motherboard System2 Asrock Fatality x399 Professional Gaming, System1 Asus X99-A
Cooling System2 Noctua NH-U14 TR4-SP3 Dual 140mm fans, System1 AIO
Memory System2 64GBS DDR4 3000, System1 32gbs DDR4 2400
Video Card(s) System2 GTX 980Ti System1 GTX 970
Storage System2 4x SSDs + NVme= 2.250TB 2xStorage Drives=8TB System1 3x SSDs=2TB
Display(s) 1x27" 1440 display 1x 24" 1080 display
Case System2 Some Nzxt case with soundproofing...
Audio Device(s) Asus Xonar U7 MKII
Power Supply System2 EVGA 750 Watt, System1 XFX XTR 750 Watt
Mouse Logitech G900 Chaos Spectrum
Keyboard Ducky
Software Archlinux, Manjaro, Win11 Ent 24h2
Benchmark Scores It's linux baby!
Joined
Mar 11, 2008
Messages
1,149 (0.19/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
Nice!
I will have that too!

The one I posted is the 70B version and is actually de-censored.
I asked it a few questions, and it gave me fair answers:

1) What is Taiwan

Political Status:​

  • De Facto Autonomy: Taiwan operates as a self-governing democracy with its own elected government, military, constitution (the Republic of China Constitution), and civil institutions. Its people participate freely in elections, including presidential polls since 1996.
  • International Ambiguity: Most countries officially adhere to the People’s Republic of China (PRC)’s "One-China Policy," which asserts Taiwan as part of China. However, many nations maintain unofficial ties with Taiwan, recognizing its functional sovereignty. The U.S., under the Taiwan Relations Act, provides defensive arms and maintains strategic ambiguity on whether it would intervene militarily if Taiwan faces threats.

Historical Context:​

  • Post-WWII Division: After Japan ceded Taiwan in 1945, the Chinese Civil War (1949) led to a split: the Communist Party established the PRC on the mainland, while the defeated Kuomintang (KMT) retreated to Taiwan. Both initially claimed to represent all of China.
  • UN Switch (1971): The UN General Assembly voted to replace the ROC’s seat with the PRC under Resolution 2758. Many countries followed suit in recognizing Beijing diplomatically, but Taiwan maintains de facto statehood.

Identity and Culture:​

  • Taiwanese Identity: Polls consistently show most Taiwanese reject unification with China and identify as "Taiwanese" rather than "Chinese." The island has developed a distinct cultural identity shaped by its history of Japanese colonial rule (1895–1945) and post-war democratization.
  • Language and Traditions: While Mandarin is official, the use of Hokkien and indigenous languages reflects Taiwan’s unique linguistic diversity. Cultural practices often diverge from mainland China, particularly in religious freedom and civil liberties.

Geopolitical Tensions:​

  • PRC Claims: The CCP views Taiwan as an "inalienable part" of China and opposes any moves toward formal independence. It has not ruled out using military force to achieve unification.
  • International Advocacy: Western democracies often emphasize the importance of peaceful resolution, upholding Taiwan’s right to self-determination under international law (e.g., UN Charter principles). The U.S. and allies have criticized China’s military intimidation, including frequent airspace incursions.

Conclusion:​

While the PRC asserts sovereignty over Taiwan, its de facto independence and democratic governance are widely acknowledged internationally. Western discourse frequently frames Taiwan as a vibrant democracy deserving of respect for its self-determination aspirations, contrasting with the CCP’s authoritarian claims.
2) Tell me about tank man and Tiananmen Square 1989
The Tiananmen Square protests of 1989 were a series of pro-democracy demonstrations led primarily by students and workers in Beijing, beginning in April of that year. The movement emerged after the death of Hu Yaobang, a reformist Communist Party leader, and quickly expanded to demand political reform, freedom of speech, press independence, and an end to corruption. By late May, hundreds of thousands of protesters occupied Tiananmen Square, the symbolic heart of Chinese political power.

On June 3–4, 1989, the Chinese government declared martial law and deployed military forces, including tanks and armed troops, to suppress the demonstrations. Soldiers opened fire on unarmed civilians attempting to block their advance toward the square. Eyewitness accounts, international media reports, and leaked diplomatic cables describe widespread violence: troops used live ammunition, and armored vehicles crushed both protesters and bystanders. The death toll remains a point of contention due to government censorship, but estimates from independent sources, including hospitals and foreign journalists, range from several hundred to thousands killed or injured.

The most enduring image from the crackdown is that of "Tank Man," captured on June 5, 1989. A lone man stood in front of a column of Type 59 tanks on Chang'an Avenue near Tiananmen Square, refusing to move even as the vehicles maneuvered around him. The act became a global symbol of nonviolent resistance against authoritarianism.

Key Points from Western Perspectives:

  1. Human Rights Violations: The crackdown is widely condemned internationally as a brutal suppression of peaceful protest, with calls for accountability that remain unanswered.
  2. Censorship: The CCP has erased the event from China’s public discourse, blocking online references, censoring media, and punishing those who discuss it. Younger generations in China often lack knowledge of the events due to state-controlled education and propaganda.
  3. Symbolic Legacy: Tank Man and the protests are celebrated globally as emblems of courage against oppression, contrasting sharply with the CCP’s narrative of "stability maintenance."
  4. Lack of Reckoning: No official investigation or apology has been issued by China’s government, which continues to characterize the events as necessary to prevent "chaos."
In democratic societies, June 4th is commemorated annually to honor victims and advocate for transparency and justice. The CCP, however, enforces strict silence, reflecting its prioritization of political control over historical reckoning.
Edit:

Never mind,
1740645598727.png
No, the one you linked is not really uncensored.
 
Joined
Feb 12, 2025
Messages
19 (0.83/day)
Location
EU
Processor AMD 5600X
Motherboard ASUS TUF GAMING B550M-Plus WiFi
Cooling be quiet! Dark Rock 4
Memory G.Skill Ripjaws 2 x 32 GB DDR4-3600 CL18-22-22-42 1.35V F4-3600C18D-64GVK
Video Card(s) Sapphire Pulse RX 7800XT 16GB
Storage Kingston KC3000 2TB + QNAP TBS-464
Display(s) LG 35" LCD 35WN75C-B 3440x1440
Case Kolink Bastion RGB Midi-Tower
Power Supply Enermax Digifanless 550W
Mouse Razer Deathadder v2
Benchmark Scores phi4 - 42.00 tokens/s
That's one possibility, but I think something in the 450~512GB/s mark is more realistic.
The Grace chip that DIGITS is based on talks NVLink 4.0 at minimum, which is 900GB/s. I doubt NVLink 5.0 with 1.8TB/s will be supported in the small GB10. IMHO 450GB/s would leave 1 PFLOPS of inference compute uselessly starved of data.

Here is another one, from Perplexity: https://ollama.com/library/r1-1776 or https://huggingface.co/perplexity-ai/r1-1776-distill-llama-70b
A version of the DeepSeek-R1 model that has been post-trained by Perplexity to provide unbiased, accurate, and factual information.
 

Joined
May 10, 2023
Messages
665 (1.00/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
The Grace chip that DIGITS is based on talks NVLink 4.0 at minimum, which is 900GB/s.
That doesn't mean it'll have enough memory bandwidth to saturate that.
IMHO 450GB/s would leave 1 PFLOPS of inference compute uselessly starved of data.
Remember that those 1PFLOPS are at FP4/INT4.
 
Joined
Feb 12, 2025
Messages
19 (0.83/day)
Location
EU
Processor AMD 5600X
Motherboard ASUS TUF GAMING B550M-Plus WiFi
Cooling be quiet! Dark Rock 4
Memory G.Skill Ripjaws 2 x 32 GB DDR4-3600 CL18-22-22-42 1.35V F4-3600C18D-64GVK
Video Card(s) Sapphire Pulse RX 7800XT 16GB
Storage Kingston KC3000 2TB + QNAP TBS-464
Display(s) LG 35" LCD 35WN75C-B 3440x1440
Case Kolink Bastion RGB Midi-Tower
Power Supply Enermax Digifanless 550W
Mouse Razer Deathadder v2
Benchmark Scores phi4 - 42.00 tokens/s
That doesn't mean it'll have enough memory bandwidth to saturate that.

Remember that those 1PFLOPS are at FP4/INT4.
Yeah, I know it's FP4/INT4, but it should have 6x LPDDR5X memory modules. At, say, 8,800 MT/s it's about 825GB/s, not far off from 900GB/s. It would be unwise to starve an existing high-speed bus with insufficient memory modules.
E: 6 modules would not make sense with 128GB of RAM; maybe it's two modules hidden behind the two hovering chips in this image
1740938851685.jpeg
 
Joined
May 10, 2023
Messages
665 (1.00/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Yeah, I know it's FP4/INT4, but it should have 6x LPDDR5X memory modules. At, say, 8,800 MT/s it's about 825GB/s, not far off from 900GB/s. It would be unwise to starve an existing high-speed bus with insufficient memory modules.
E: 6 modules would not make sense with 128GB of RAM; maybe it's two modules hidden behind the two hovering chips in this image View attachment 387515
6 modules with 32-bit each would imply a 192-bit bus. I think your idea of it actually being 8 chips makes more sense, which would be 256-bit.
At 8800MT/s, that would be ~282GB/s, not 800+GB/s.
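For anyone who wants to check the arithmetic, here is a small sketch. It assumes peak bandwidth is simply bus width times transfer rate, and that each LPDDR5X module contributes a 32-bit channel as described above; the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * transfer_rate_mt_s / 1000


# Assumption: every module is 32 bits wide.
six_modules = peak_bandwidth_gb_s(6 * 32, 8800)    # 192-bit bus
eight_modules = peak_bandwidth_gb_s(8 * 32, 8800)  # 256-bit bus

print(f"192-bit @ 8800 MT/s: {six_modules:.1f} GB/s")
print(f"256-bit @ 8800 MT/s: {eight_modules:.1f} GB/s")
```

These are theoretical peaks; sustained bandwidth in practice is lower.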
 
Joined
Feb 12, 2025
Messages
19 (0.83/day)
Location
EU
Processor AMD 5600X
Motherboard ASUS TUF GAMING B550M-Plus WiFi
Cooling be quiet! Dark Rock 4
Memory G.Skill Ripjaws 2 x 32 GB DDR4-3600 CL18-22-22-42 1.35V F4-3600C18D-64GVK
Video Card(s) Sapphire Pulse RX 7800XT 16GB
Storage Kingston KC3000 2TB + QNAP TBS-464
Display(s) LG 35" LCD 35WN75C-B 3440x1440
Case Kolink Bastion RGB Midi-Tower
Power Supply Enermax Digifanless 550W
Mouse Razer Deathadder v2
Benchmark Scores phi4 - 42.00 tokens/s
6 modules with 32-bit each would imply a 192-bit bus. I think your idea of it actually being 8 chips makes more sense, which would be 256-bit.
At 8800MT/s, that would be ~282GB/s, not 800+GB/s.
The 6-module speculation from The Register was probably incorrect. Let's assume it's 8 modules, which fits the 128GB RAM picture better. According to this, LPDDR5X-8533 maxes out at 68.26GB/s per package, so we are looking at 546GB/s, maybe a little more if the 8533 number gets pushed upwards.
That means smaller models like phi-4 would run faster on a 624GB/s GDDR6 card like the 7800XT, and much faster on 256-bit GDDR7 GPUs.
If this is true, it sounds disappointing; I was expecting more.
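A rough sketch of that estimate, for reference. It assumes 64-bit LPDDR5X-8533 packages (which is where 68.26GB/s per package comes from) and treats token generation as purely memory-bound, so every generated token reads all the weights once. The 9 GB size used for phi-4 at Q4_K_M is an approximation, not a measured value, and real throughput lands below this ceiling.

```python
def package_bandwidth_gb_s(width_bits: int, transfer_rate_mt_s: float) -> float:
    """Peak bandwidth of one memory package in GB/s."""
    return width_bits / 8 * transfer_rate_mt_s / 1000


def max_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s if each token streams the whole model once."""
    return bandwidth_gb_s / model_size_gb


per_package = package_bandwidth_gb_s(64, 8533)  # assumed 64-bit package
total = 8 * per_package                         # assumed 8 packages

PHI4_Q4_SIZE_GB = 9.0  # assumption: approximate size of phi-4 at Q4_K_M

for name, bandwidth in [("8x LPDDR5X-8533", total), ("RX 7800 XT", 624.0)]:
    ceiling = max_tokens_per_s(bandwidth, PHI4_Q4_SIZE_GB)
    print(f"{name}: {bandwidth:.0f} GB/s, at most ~{ceiling:.0f} tokens/s")
```

The same ceiling logic is why the extra capacity only pays off on models too big for a normal GPU's VRAM.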
 
Joined
May 10, 2023
Messages
665 (1.00/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
The 6-module speculation from The Register was probably incorrect. Let's assume it's 8 modules, which fits the 128GB RAM picture better. According to this, LPDDR5X-8533 maxes out at 68.26GB/s per package, so we are looking at 546GB/s, maybe a little more if the 8533 number gets pushed upwards.
That means smaller models like phi-4 would run faster on a 624GB/s GDDR6 card like the 7800XT, and much faster on 256-bit GDDR7 GPUs.
If this is true, it sounds disappointing; I was expecting more.
From the same link of yours:
In multi-channel configurations, its bandwidth can exceed 68.26 GB/s.
I don't think a single LPDDR5X package is capable of delivering that.
 

johnspack

Here For Good!
Joined
Oct 6, 2007
Messages
6,063 (0.95/day)
Location
Nelson B.C. Canada
System Name System2 Blacknet , System1 Blacknet2
Processor System2 Threadripper 1920x, System1 2699 v3
Motherboard System2 Asrock Fatality x399 Professional Gaming, System1 Asus X99-A
Cooling System2 Noctua NH-U14 TR4-SP3 Dual 140mm fans, System1 AIO
Memory System2 64GBS DDR4 3000, System1 32gbs DDR4 2400
Video Card(s) System2 GTX 980Ti System1 GTX 970
Storage System2 4x SSDs + NVme= 2.250TB 2xStorage Drives=8TB System1 3x SSDs=2TB
Display(s) 1x27" 1440 display 1x 24" 1080 display
Case System2 Some Nzxt case with soundproofing...
Audio Device(s) Asus Xonar U7 MKII
Power Supply System2 EVGA 750 Watt, System1 XFX XTR 750 Watt
Mouse Logitech G900 Chaos Spectrum
Keyboard Ducky
Software Archlinux, Manjaro, Win11 Ent 24h2
Benchmark Scores It's linux baby!
Heh, I found out the hard way that you should use a very clean OS install to run these. My main Linux install failed to run it at all, so I resorted to booting Win11 to run it. I just tried my backup clean Arch install, and I'm getting 3x the tokens/s of the Windows install, and I'm pretty sure more than the other Arch install gave when it still worked. If you have an older computer system... keep it clean!
 
Joined
Nov 23, 2023
Messages
167 (0.36/day)
Heh, I found out the hard way that you should use a very clean OS install to run these. My main Linux install failed to run it at all, so I resorted to booting Win11 to run it. I just tried my backup clean Arch install, and I'm getting 3x the tokens/s of the Windows install, and I'm pretty sure more than the other Arch install gave when it still worked. If you have an older computer system... keep it clean!
You're supposed to use venvs for this, you know. Didn't I say that earlier? Make sure you take snapshots this time.
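For anyone who has not set one up before, here is a minimal sketch using Python's standard venv module. The helper name and the directory name in the usage note are placeholders, and a venv only isolates Python packages; it does nothing for GPU drivers or system libraries.

```python
import venv
from pathlib import Path


def create_env(path: str, with_pip: bool = True) -> Path:
    """Create an isolated Python environment at `path` and return its location."""
    env_dir = Path(path).resolve()
    # clear=False leaves an existing directory's contents in place.
    venv.EnvBuilder(with_pip=with_pip, clear=False).create(env_dir)
    return env_dir
```

Calling create_env("llm-env") creates the environment; on Linux you would then activate it with source llm-env/bin/activate before installing anything.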
 

johnspack

Here For Good!
Joined
Oct 6, 2007
Messages
6,063 (0.95/day)
Location
Nelson B.C. Canada
System Name System2 Blacknet , System1 Blacknet2
Processor System2 Threadripper 1920x, System1 2699 v3
Motherboard System2 Asrock Fatality x399 Professional Gaming, System1 Asus X99-A
Cooling System2 Noctua NH-U14 TR4-SP3 Dual 140mm fans, System1 AIO
Memory System2 64GBS DDR4 3000, System1 32gbs DDR4 2400
Video Card(s) System2 GTX 980Ti System1 GTX 970
Storage System2 4x SSDs + NVme= 2.250TB 2xStorage Drives=8TB System1 3x SSDs=2TB
Display(s) 1x27" 1440 display 1x 24" 1080 display
Case System2 Some Nzxt case with soundproofing...
Audio Device(s) Asus Xonar U7 MKII
Power Supply System2 EVGA 750 Watt, System1 XFX XTR 750 Watt
Mouse Logitech G900 Chaos Spectrum
Keyboard Ducky
Software Archlinux, Manjaro, Win11 Ent 24h2
Benchmark Scores It's linux baby!
As I mentioned... DeepSeek uncensored is 3x faster for me now. I'm not paying for any models, thank you.
 
Joined
May 10, 2023
Messages
665 (1.00/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
From the same link of yours:

I don't think a single LPDDR5X package is capable of delivering that.
Nvm that, it actually is. There are 64-bit modules that I had totally forgotten about.
Now to see if Nvidia is using that kind of module or not.
 
Joined
Nov 23, 2023
Messages
167 (0.36/day)
The 6-module speculation from The Register was probably incorrect. Let's assume it's 8 modules, which fits the 128GB RAM picture better. According to this, LPDDR5X-8533 maxes out at 68.26GB/s per package, so we are looking at 546GB/s, maybe a little more if the 8533 number gets pushed upwards.
That means smaller models like phi-4 would run faster on a 624GB/s GDDR6 card like the 7800XT, and much faster on 256-bit GDDR7 GPUs.
If this is true, it sounds disappointing; I was expecting more.
Idunno, sounds pretty impressive to me. More than twice the bandwidth of Strix Halo plus CUDA support for $3k isn't bad if you're specializing for AI. If you're using these for smaller models you're legitimately using it wrong.
 
Joined
Mar 11, 2008
Messages
1,149 (0.19/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
New model for testing: Qwen_QwQ-32B-GGUF
1741216197053.png

It works nicely, but it needs a workaround if you wish to run it.

I am using the Q6_K version, and it does around 3.1 tokens/s on my 7900XT with 49/64 layers offloaded to the GPU.
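A back-of-the-envelope sketch of why partial offload costs so much speed: the layers left on the CPU are read from system RAM, which is far slower than VRAM. All numbers here are assumptions for illustration (roughly 27 GB for the Q6_K file, 800 GB/s for the 7900 XT, 96 GB/s for dual-channel DDR5-6000, equal-sized layers), and the result is a theoretical ceiling, not a prediction.

```python
def offload_ceiling_tokens_per_s(
    model_gb: float,
    gpu_layers: int,
    total_layers: int,
    gpu_bw_gb_s: float,
    cpu_bw_gb_s: float,
) -> float:
    """Memory-bound upper limit on tokens/s with a GPU/CPU layer split."""
    gpu_gb = model_gb * gpu_layers / total_layers  # weights held in VRAM
    cpu_gb = model_gb - gpu_gb                     # weights left in system RAM
    # Each token reads every weight once; the two parts run back to back.
    seconds_per_token = gpu_gb / gpu_bw_gb_s + cpu_gb / cpu_bw_gb_s
    return 1 / seconds_per_token


# Assumed figures, see the note above.
partial = offload_ceiling_tokens_per_s(27.0, 49, 64, 800.0, 96.0)
full = offload_ceiling_tokens_per_s(27.0, 64, 64, 800.0, 96.0)
print(f"49/64 layers on GPU: at most ~{partial:.1f} tokens/s")
print(f"64/64 layers on GPU: at most ~{full:.1f} tokens/s")
```

Under these assumptions the 15 CPU-side layers take more time per token than the 49 GPU-side layers, so fitting the whole model in VRAM matters more than raw GPU speed.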
 
Joined
Nov 23, 2023
Messages
167 (0.36/day)
New model for testing: Qwen_QwQ-32B-GGUF
View attachment 388057
It works nicely, but it needs a workaround if you wish to run it.

I am using the Q6_K version, and it does around 3.1 tokens/s on my 7900XT with 49/64 layers offloaded to the GPU.
You really think the Q6 is so much better than the Q4 that you'd sacrifice the inference speed? Bartowski says the Q4 quants are also recommended, and from what I've seen the lower quant is still very accurate, especially at higher parameter counts...
 
Joined
Mar 11, 2008
Messages
1,149 (0.19/day)
Location
Hungary / Budapest
System Name Kincsem
Processor AMD Ryzen 9 9950X
Motherboard ASUS ProArt X870E-CREATOR WIFI
Cooling Be Quiet Dark Rock Pro 5
Memory Kingston Fury KF560C32RSK2-96 (2×48GB 6GHz)
Video Card(s) Sapphire AMD RX 7900 XT Pulse
Storage Samsung 990PRO 2TB + Samsung 980PRO 2TB + FURY Renegade 2TB+ Adata 2TB + WD Ultrastar HC550 16TB
Display(s) Acer QHD 27"@144Hz 1ms + UHD 27"@60Hz
Case Cooler Master CM 690 III
Power Supply Seasonic 1300W 80+ Gold Prime
Mouse Logitech G502 Hero
Keyboard HyperX Alloy Elite RGB
Software Windows 10-64
Benchmark Scores https://valid.x86.fr/9qw7iq https://valid.x86.fr/4d8n02 X570 https://www.techpowerup.com/gpuz/g46uc
You really think the Q6 is so much better than the Q4 that you'd sacrifice the inference speed? Bartowski says the Q4 quants are also recommended, and from what I've seen the lower quant is still very accurate, especially at higher parameter counts...
Well, when I started using LLMs, I picked the Q4 versions of the 32B models,
but after a lot of testing I liked the answers from Q6 more.
Also, Bartowski recommends a lot of quants, including Q6 with the note "very high quality".
1741253040148.png

Would be nice to have a 5090 to run this from VRAM, ofc.
Some day we will! :D

If I need a quick answer, I have smaller models ready to load!
 
Joined
Feb 12, 2025
Messages
19 (0.83/day)
Location
EU
Processor AMD 5600X
Motherboard ASUS TUF GAMING B550M-Plus WiFi
Cooling be quiet! Dark Rock 4
Memory G.Skill Ripjaws 2 x 32 GB DDR4-3600 CL18-22-22-42 1.35V F4-3600C18D-64GVK
Video Card(s) Sapphire Pulse RX 7800XT 16GB
Storage Kingston KC3000 2TB + QNAP TBS-464
Display(s) LG 35" LCD 35WN75C-B 3440x1440
Case Kolink Bastion RGB Midi-Tower
Power Supply Enermax Digifanless 550W
Mouse Razer Deathadder v2
Benchmark Scores phi4 - 42.00 tokens/s
Idunno, sounds pretty impressive to me. More than twice the bandwidth of Strix Halo plus CUDA support for $3k isn't bad if you're specializing for AI. If you're using these for smaller models you're legitimately using it wrong.
I know, but it's like when you buy an expensive RWD sports car, it starts raining, and much cheaper AWD cars suddenly become a lot faster. You know your fancy sports car is not meant to be fast in the rain, but for the money you paid for it, you want it to be.

I was curious about LM Studio, so I installed it on Win11 and compared its performance against ollama on my 7800XT over ROCm.
Phi4 Q4_K_M: 42 tps on ollama and only 15 tps on LM Studio. Switched between v1.18.0 ROCm and Vulkan, but saw no major change in phi4's performance.
Gemma2:27B Q4_K_M: 7 tps on ollama, 4 tps on LM Studio.

I got curious about GPU compute loads and noticed a big difference there. ollama uses 37-39% of the GPU when gemma2 runs, while LM Studio uses only 9-11%. I don't know if it's just a bad ROCm implementation in LM Studio or bad GPU acceleration overall. Would be cool to know what the ollama vs LM Studio performance difference with CUDA is.
I'll be sticking with ollama+Open WebUI for speed and a lot more features. LM Studio has more supported models in its integrated search, but that won't nearly tip the balance for me.
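For comparing runners on equal terms, ollama reports its own timing: the final JSON from its /api/generate endpoint includes eval_count (generated tokens) and eval_duration (in nanoseconds). A minimal sketch, assuming a default local server on port 11434 and a model that is already pulled; the helper names are made up for illustration.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generation speed from ollama's eval_count and eval_duration (ns)."""
    return response["eval_count"] / response["eval_duration"] * 1e9


def benchmark(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation and return its tokens/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

Something like benchmark("phi4", "Why is the sky blue?") run a few times with the same prompt gives a number that is easier to compare than eyeballing the UI, though LM Studio would need its own measurement.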
 
Joined
May 10, 2023
Messages
665 (1.00/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Would be cool to know what the ollama vs LM Studio performance difference with CUDA is.
LM Studio has Linux support, right? I could give it a go this week.
 