
Dear AMD, NVIDIA, INTEL and others, we need cheap (192-bit to 384-bit), high VRAM, consumer, GPUs to locally self-host/inference AI/LLMs

Joined
Nov 23, 2023
Messages
55 (0.12/day)
Time will tell how silly this market will get over time, I guess.

Here you go..

Would you look at that. Nice to see it might actually come out; this Frank guy really has a poor track record with rumors, honestly. Too early to say "I told you so", though. I'll wait for something to actually get released to do that :cool:

Agree, but I think that eventually, given the rising demands of LLMs, they'll develop a solution dedicated to training and inference. IMHO, Project DIGITS is probably a "prototype" of sorts. Remember the CMP crypto-mining processor series? It'll likely be something like that, but not rushed out and derived from gaming cards as those were: probably a dedicated GPGPU or FPGA processor that lacks a display engine and pretty much every other block useful for general computing, but is tailored specifically for inference. Kind of like this AMD/Xilinx card, which is a bit older now:
Eh, they'll have to do something at some point. Once CPUs get better at inference and training, people might just use those instead. I don't think devs are doing much for XDNA hardware yet, but unified memory alone is pretty huge. Strix and Medusa Halo should come in cheaper than Nvidia's $3k option...
 
Joined
Dec 25, 2020
Messages
7,603 (5.01/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) NVIDIA RTX A2000
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Audio Device(s) Sony MDR-V7 connected through Apple USB-C
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
Would you look at that. Nice to see it might actually come out; this Frank guy really has a poor track record with rumors, honestly. Too early to say "I told you so", though. I'll wait for something to actually get released to do that :cool:


Eh, they'll have to do something at some point. Once CPUs get better at inference and training, people might just use those instead. I don't think devs are doing much for XDNA hardware yet, but unified memory alone is pretty huge. Strix and Medusa Halo should come in cheaper than Nvidia's $3k option...

Unified memory is only necessary because LLM models are too large to fit in a GPU's dedicated VRAM. That enables marketing departments to make deceitful, misleading slides that fanboys of the respective companies will parrot without question (and this isn't unique to AMD or NVIDIA; it applies to pretty much both of them), such as this little gem right here:

[attached slide: AMD "Ryzen AI Max up to 2.2x faster" marketing comparison]


Of course it's "up to 2.2x faster" when you can actually load the model into memory (provided you have at least 96 or 128 GB of RAM, in this case) and you're not compute-bottlenecked at all, which is the issue with quite literally any GPU short of NVIDIA's many-thousand-dollar, 80 GB+ HBM AI accelerators right now. Needless to say, the person who posted this slide to me as a rebuttal on X (where I told them that if they believed this product was faster than a 4090 at anything, I had a bridge to sell 'em) summarily blocked me right after posting it and calling me a "smug f**k", go figure. For context, the Ryzen AI Max+ 395 is rated at 126 AI TOPS; an RTX 4090 is, on a worst-case basis, 10x faster.
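To put numbers on the "too large to fit in VRAM" point, here's a back-of-the-envelope sketch. The 1.2x overhead factor is an assumption on my part (to cover KV cache and runtime buffers), not an exact figure:

```python
# Rough memory footprint of an LLM: parameter count x bytes per weight,
# plus an assumed ~20% overhead for KV cache and runtime buffers.
def model_memory_gb(params_billion: float, bits_per_weight: int,
                    overhead: float = 1.2) -> float:
    """Approximate memory needed to load a model, in GB."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 70B model at 16-bit precision lands around 168 GB -- far beyond any
# consumer card's VRAM, which is why 96-128 GB unified-memory machines
# can load models that a 24 GB GPU simply cannot.
print(round(model_memory_gb(70, 16)))  # -> 168
print(round(model_memory_gb(70, 4)))   # -> 42 (4-bit quantized)
```

So the slide's comparison only even runs because of capacity, not because the chip out-computes a 4090.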
 
Joined
Nov 23, 2023
Messages
55 (0.12/day)
Unified memory is only necessary because LLM models are too large to fit in a GPU's dedicated VRAM. That enables marketing departments to make deceitful, misleading slides that fanboys of the respective companies will parrot without question (and this isn't unique to AMD or NVIDIA; it applies to pretty much both of them), such as this little gem right here:

[attached slide: AMD "Ryzen AI Max up to 2.2x faster" marketing comparison]

Of course it's "up to 2.2x faster" when you can actually load the model into memory (provided you have at least 96 or 128 GB of RAM, in this case) and you're not compute-bottlenecked at all, which is the issue with quite literally any GPU short of NVIDIA's many-thousand-dollar, 80 GB+ HBM AI accelerators right now. Needless to say, the person who posted this slide to me as a rebuttal on X (where I told them that if they believed this product was faster than a 4090 at anything, I had a bridge to sell 'em) summarily blocked me right after posting it and calling me a "smug f**k", go figure. For context, the Ryzen AI Max+ 395 is rated at 126 AI TOPS; an RTX 4090 is, on a worst-case basis, 10x faster.
Unified memory is pretty fast in its own right, and there's an APU sitting on it that can actually use it. It should be around GDDR5-level performance or so. I don't expect dual- or even quad-channel DDR5 to do anywhere near as well in inference as an M2 or Strix Halo, even if you offload part of the same model quant to the GPU and keep the rest in RAM.

I don't really consider TOPS that great of a metric, either. It just comes off as another FLOPS-style number that doesn't translate to real-world performance.
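The bandwidth point can be sketched with a simple ceiling estimate: during decode, each generated token streams roughly the full set of weights from memory once, so tokens/sec is capped near bandwidth divided by model size. The bandwidth figures below are approximate, assumed values for illustration, not official specs:

```python
# Memory-bound decode ceiling: each token reads ~all weights once,
# so tokens/sec <= memory bandwidth / model size in memory.
def max_tokens_per_sec(bandwidth_gbps: float, model_size_gb: float) -> float:
    return bandwidth_gbps / model_size_gb

# Assumed, approximate bandwidths: dual-channel DDR5-5600 ~90 GB/s,
# a 256-bit LPDDR5X unified-memory APU ~256 GB/s, RTX 4090 GDDR6X ~1008 GB/s.
# For a 40 GB quantized model:
for name, bw in [("DDR5 dual-channel", 90),
                 ("unified LPDDR5X", 256),
                 ("RTX 4090 GDDR6X", 1008)]:
    print(f"{name}: ~{max_tokens_per_sec(bw, 40):.1f} tok/s ceiling")
```

Which is why a unified-memory APU can beat a plain desktop CPU handily on models that fit, while still trailing far behind a GPU whose VRAM is large enough to hold the model.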
 
Joined
Dec 25, 2020
Messages
7,603 (5.01/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) NVIDIA RTX A2000
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Audio Device(s) Sony MDR-V7 connected through Apple USB-C
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
Unified memory is pretty fast in its own right, and there's an APU sitting on it that can actually use it. It should be around GDDR5-level performance or so. I don't expect dual- or even quad-channel DDR5 to do anywhere near as well in inference as an M2 or Strix Halo, even if you offload part of the same model quant to the GPU and keep the rest in RAM.

I don't really consider TOPS that great of a metric, either. It just comes off as another FLOPS-style number that doesn't translate to real-world performance.

Mostly because memory is so important here. It's pretty much the only thing that matters until that requirement is satisfied.
 