
AMD Introduces GAIA - an Open-Source Project That Runs Local LLMs on Ryzen AI NPUs

T0@st

News Editor
Joined
Mar 7, 2023
Messages
2,780 (3.69/day)
Location
South East, UK
System Name The TPU Typewriter
Processor AMD Ryzen 5 5600 (non-X)
Motherboard GIGABYTE B550M DS3H Micro ATX
Cooling DeepCool AS500
Memory Kingston Fury Renegade RGB 32 GB (2 x 16 GB) DDR4-3600 CL16
Video Card(s) PowerColor Radeon RX 7800 XT 16 GB Hellhound OC
Storage Samsung 980 Pro 1 TB M.2-2280 PCIe 4.0 X4 NVME SSD
Display(s) Lenovo Legion Y27q-20 27" QHD IPS monitor
Case GameMax Spark M-ATX (re-badged Jonsbo D30)
Audio Device(s) FiiO K7 Desktop DAC/Amp + Philips Fidelio X3 headphones, or ARTTI T10 Planar IEMs
Power Supply ADATA XPG CORE Reactor 650 W 80+ Gold ATX
Mouse Roccat Kone Pro Air
Keyboard Cooler Master MasterKeys Pro L
Software Windows 10 64-bit Home Edition
AMD has launched a new open-source project called GAIA (pronounced /ˈɡaɪ.ə/), an application that leverages the Ryzen AI Neural Processing Unit (NPU) to run private, local large language models (LLMs). In this blog, we'll dive into the features and benefits of GAIA, and show how you can adopt this open-source project into your own applications.

Introduction to GAIA
GAIA is a generative AI application designed to run local, private LLMs on Windows PCs, optimized for AMD Ryzen AI hardware (AMD Ryzen AI 300 Series processors). This integration allows for faster, more efficient processing (i.e., lower power consumption) while keeping your data local and secure. On Ryzen AI PCs, GAIA runs models seamlessly on the NPU and iGPU by using the open-source Lemonade (LLM-Aid) SDK from ONNX TurnkeyML for LLM inference. GAIA supports a variety of local LLMs optimized to run on Ryzen AI PCs; popular models like Llama and Phi derivatives can be tailored for different use cases, such as Q&A, summarization, and complex reasoning tasks.




Getting Started with GAIA
You can get started with GAIA in under 10 minutes: follow the instructions to download and install GAIA on your Ryzen AI PC. Once installed, you can launch GAIA and begin exploring its various agents and capabilities. There are two versions of GAIA:
  1) GAIA Installer - runs on any Windows PC; however, performance may be slower.
  2) GAIA Hybrid Installer - optimized for Ryzen AI PCs; uses the NPU and iGPU for better performance.

The Agent RAG Pipeline
One of the standout features of GAIA is its agent Retrieval-Augmented Generation (RAG) pipeline. This pipeline combines an LLM with a knowledge base, enabling the agent to retrieve relevant information, reason, plan, and use external tools within an interactive chat environment. This results in more accurate and contextually aware responses.

The current GAIA agents enable the following capabilities:
  • Simple Prompt Completion: direct model interaction with no agent, for testing and evaluation.
  • Chaty: an LLM chatbot with history that engages in conversation with the user.
  • Clip: an agentic RAG agent for YouTube search and Q&A.
  • Joker: a simple joke generator that uses RAG to bring humor to the user.

Additional agents are currently in development, and developers are encouraged to create and contribute their own agents to GAIA.

How does GAIA Work?
The left side of Figure 2: GAIA Overview Diagram illustrates the functionality of Lemonade SDK from TurnkeyML. Lemonade SDK provides tools for LLM-specific tasks such as prompting, accuracy measurement, and serving across multiple runtimes (e.g., Hugging Face, ONNX Runtime GenAI API) and hardware (CPU, iGPU, and NPU).



Lemonade exposes an LLM web service that communicates with the GAIA application (on the right) via an OpenAI-compatible REST API. GAIA consists of three key components:
  1) LLM Connector - bridges the NPU service's Web API with the LlamaIndex-based RAG pipeline.
  2) LlamaIndex RAG Pipeline - includes a query engine and vector memory, which processes and stores relevant external information.
  3) Agent Web Server - connects to the GAIA UI via WebSocket, enabling user interaction.
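Because the service speaks the OpenAI-compatible REST protocol, any generic OpenAI-style client can drive it. A minimal sketch in Python of what such a request looks like (the host, port, and model name below are illustrative assumptions, not documented GAIA defaults):

```python
import json

# Hypothetical local endpoint; the actual host/port the Lemonade
# service listens on may differ -- check the GAIA docs.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(user_message: str, model: str = "llama-3.2-1b") -> dict:
    """Build an OpenAI-style chat-completion request body.

    Any OpenAI-compatible server accepts this shape; the model name
    is a placeholder, not a confirmed GAIA model identifier.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": True,  # GAIA streams tokens back to the UI
    }

body = build_chat_request("Summarize this article in one sentence.")
print(json.dumps(body, indent=2))
# To send it, POST this JSON to f"{BASE_URL}/chat/completions".
```

Swapping `BASE_URL` for a cloud endpoint is all it would take for the data to leave the machine, which is exactly what running the service locally avoids.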

On the right side of the figure, GAIA acts as an AI-powered agent that retrieves and processes data. It vectorizes external content (e.g., GitHub, YouTube, text files) and stores it in a local vector index. When a user submits a query, the following process occurs:
  1) The query is sent to GAIA, where it is transformed into an embedding vector.
  2) The vectorized query is used to retrieve relevant context from the indexed data.
  3) The retrieved context is passed to the web service, where it is embedded into the LLM's prompt.
  4) The LLM generates a response, which is streamed back through the GAIA web service and displayed in the UI.

This process ensures that user queries are enhanced with relevant context before being processed by the LLM, improving response accuracy and relevance. The final answer is delivered to the user in real-time through the UI.
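The four steps above amount to a standard embed-retrieve-augment loop. A toy sketch of the mechanics (using a bag-of-words stand-in for the real embedding model, which is an assumption for illustration only):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; GAIA uses a real embedding model.
    # This stand-in only illustrates the retrieval mechanics.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-2: vectorize the query and retrieve the closest indexed chunk.
index = [
    "GAIA runs local LLMs on Ryzen AI NPUs",
    "The Lemonade SDK serves models over a REST API",
]
query = "what runs LLMs on the NPU"
qvec = embed(query)
best = max(index, key=lambda chunk: cosine(qvec, embed(chunk)))

# Step 3: embed the retrieved context into the LLM's prompt; step 4
# would hand this prompt to the model for generation.
prompt = f"Context:\n{best}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```

In GAIA the embedding, vector index, and prompt assembly are handled by the LlamaIndex pipeline; the sketch only shows why retrieved context makes the final prompt more grounded.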

Benefits of Running LLMs Locally
Running LLMs locally on the NPU offers several benefits:
  • Enhanced privacy, as no data needs to leave your machine. Sensitive information never has to be sent to the cloud, while high-performance AI capabilities are still delivered.
  • Reduced latency, since there's no need to communicate with the cloud.
  • Optimized performance with the NPU, leading to faster response times and lower power consumption.

Comparing NPU and iGPU
Running GAIA on the NPU results in improved performance for AI-specific tasks, as it is designed for inference workloads. Beginning with Ryzen AI Software Release 1.3, there is hybrid support for deploying quantized LLMs that utilize both the NPU and the iGPU. By using both components, each can be applied to the tasks and operations they are optimized for.

Applications and Industries
This setup could benefit industries that require high performance and privacy, such as healthcare, finance, and enterprise applications where data privacy is critical. It can also be applied in fields like content creation and customer service automation, where generative AI models are becoming essential. Lastly, it serves environments without the connectivity needed to send data to the cloud and receive responses, since all processing is done locally.

Conclusion
In conclusion, GAIA, an open-source AMD application, uses the power of the Ryzen AI NPU to deliver efficient, private, and high-performance LLMs. By running LLMs locally, GAIA ensures enhanced privacy, reduced latency, and optimized performance, making it ideal for industries that prioritize data security and rapid response times.


Ready to try GAIA yourself? Our video provides a brief overview and installation demo of GAIA.

Check out and contribute to the GAIA repo at github.com/amd/gaia. For feedback or questions, please reach out to us at GAIA@amd.com.

View at TechPowerUp Main Site | Source
 
Joined
Sep 6, 2013
Messages
3,655 (0.87/day)
Location
Athens, Greece
System Name 3 desktop systems: Gaming / Internet / HTPC
Processor Ryzen 5 7600 / Ryzen 5 4600G / Ryzen 5 5500
Motherboard X670E Gaming Plus WiFi / MSI X470 Gaming Plus Max (1) / MSI X470 Gaming Plus Max (2)
Cooling Aigo ICE 400SE / Segotep T4 / Νoctua U12S
Memory Kingston FURY Beast 32GB DDR5 6000 / 16GB JUHOR / 32GB G.Skill RIPJAWS 3600 + Aegis 3200
Video Card(s) ASRock RX 6600 / Vega 7 integrated / Radeon RX 580
Storage NVMes, ONLY NVMes / NVMes, SATA Storage / NVMe, SATA, external storage
Display(s) Philips 43PUS8857/12 UHD TV (120Hz, HDR, FreeSync Premium) / 19'' HP monitor + BlitzWolf BW-V5
Case Sharkoon Rebel 12 / CoolerMaster Elite 361 / Xigmatek Midguard
Audio Device(s) onboard
Power Supply Chieftec 850W / Silver Power 400W / Sharkoon 650W
Mouse CoolerMaster Devastator III Plus / CoolerMaster Devastator / Logitech
Keyboard CoolerMaster Devastator III Plus / CoolerMaster Devastator / Logitech
Software Windows 10 / Windows 10&Windows 11 / Windows 10
Nice. In an era where AI is everywhere, AMD needs to not just follow, but start offering more options to its customers. They need it to stay competitive.

1) GAIA Installer - this will run on any Windows PC; however, performance may be slower.
That's also a good choice from them. They don't just offer a Ryzen AI-optimized solution to use as a marketing tool.
 
Joined
Oct 2, 2004
Messages
13,801 (1.84/day)
System Name Dark Monolith
Processor AMD Ryzen 7 5800X3D
Motherboard ASUS Strix X570-E
Cooling Arctic Cooling Freezer II 240mm + 2x SilentWings 3 120mm
Memory 64 GB G.Skill Ripjaws V Black
Video Card(s) XFX Radeon RX 9070 XT Mercury OC Magnetic Air
Storage Seagate Firecuda 530 4 TB SSD + Samsung 850 Pro 2 TB SSD + Seagate Barracuda 8 TB HDD
Display(s) ASUS ROG Swift PG27AQDM 240Hz OLED
Case Silverstone Kublai KL-07
Audio Device(s) Sound Blaster AE-9 MUSES Edition + Altec Lansing MX5021 2.1 Nichicon Gold
Power Supply BeQuiet DarkPower 11 Pro 750W
Mouse Logitech G502 Core
Keyboard UVI Pride MechaOptical
Software Windows 11 Pro
Joined
Oct 31, 2024
Messages
161 (1.08/day)
Location
Earth, The Sol System, The Milky Way, The Universe
System Name Alienware Aurora R13
Processor i9-12900kf
Motherboard Alienware 0C92D0
Cooling Alienware CPU water block AIO thing
Memory 32gbs of DDR5-4400 (slow, I know)
Video Card(s) Dell/Alienware RTX 3080ti
Storage NVMe KIOXIA KXG70ZNV1T02 1024GB
Display(s) Asus ROG PG32UCDM 4k 240hz 32'
Case Alienware Aurora R13
Audio Device(s) Soundcore Life Q20 headphones
Power Supply Dell-Something-Or-Other 1000? watt
Mouse Logitech Pro Superlight 2
Keyboard Razer BlackWidow V4 with Razer Green switches
VR HMD None
Software Ubuntu Linux 24.10
Joined
Mar 13, 2025
Messages
91 (5.69/day)
System Name My Gamer
Processor 9900X3D
Motherboard As Rock X870E Taichi
Cooling Thermalright Elite 360
Memory Gskill DDR5 64GB 30 1.35 volts
Video Card(s) 7900XT
Storage Corsair MP700 boot
Display(s) FV43U
Case 7000D Airflow
Audio Device(s) Void Headset, Creatibe Speakers
Power Supply Super Flower Leadex 1000W
Mouse AsusTuf M300
I asked Copilot what AI I could use to mine Bitcoin and it actually gave me 3 choices. To me AI is nothing more than an Encyclopedia (Copilot). I guess this means it will do it faster. The next question I might ask is how to turn off Windows telemetry hehe.
 

Cheeseball

Not a Potato
Supporter
Joined
Jan 2, 2009
Messages
2,239 (0.38/day)
Location
Pittsburgh, PA
System Name Titan
Processor AMD Ryzen™ 7 7950X3D
Motherboard ASRock X870 Taichi Lite
Cooling Thermalright Phantom Spirit 120 EVO CPU
Memory TEAMGROUP T-Force Delta RGB 2x16GB DDR5-6000 CL30
Video Card(s) ASRock Radeon RX 7900 XTX 24 GB GDDR6 (MBA)
Storage Crucial T500 2TB x 3
Display(s) LG 32GS95UE-B, ASUS ROG Swift OLED (PG27AQDP), LG C4 42" (OLED42C4PUA)
Case Cooler Master QUBE 500 Flatpack Macaron
Audio Device(s) Kanto Audio YU2 and SUB8 Desktop Speakers and Subwoofer, Cloud Alpha Wireless
Power Supply Corsair SF1000
Mouse Logitech Pro Superlight 2 (White), G303 Shroud Edition
Keyboard Keychron K2 HE Wireless / 8BitDo Retro Mechanical Keyboard (N Edition) / NuPhy Air75 v2
VR HMD Meta Quest 3 512GB
Software Windows 11 Pro 64-bit 24H2 Build 26100.2605
Joined
Jun 18, 2015
Messages
436 (0.12/day)
Location
Perth , West Australia
System Name schweinestalle1 and schweinestalle 2
Processor AMD Ryzen 7 5700X3D / AMD Ryzen 3200G
Motherboard Asus Prime - Pro X570 + Asus PCI -E AC68 Adapter / Asus Prime B450 M-K
Cooling AMD Wraith Prism / AMD Wraith
Memory Kingston HyperX 2 x 16 gb DDR 4 3200mhz / Kingston HyperX 2x 8Gb DDR 3200mhz
Video Card(s) AMD Radeon RX 7800 XT 16GB Pulse / AMD Reference Vega 64 8GB
Storage Crucial 1TB M.2 SSD and WD Blue 500gb Nand SSD / WD Blue 240gb M.2 SSD
Display(s) Asus XG 32 V ROG and LG ultra gear 32gs75q / TCL TV
Case Corsair AIR ATX / Corsair Air Mini ATX
Audio Device(s) Realtech standard / Realtech standard
Power Supply Corsair 850 Modular / Corsair 750 Modular
Mouse CM Havoc / Microsoft Wireless
Keyboard Corsair Cherry Mechanical / Razor piece of shit
Software Win 10 / win 10
Benchmark Scores Soon ! whateva
Joined
Jun 30, 2019
Messages
46 (0.02/day)
Processor Ryzen 7 7700X
Motherboard ASRock B650E PG Riptide WiFi
Cooling Noctua NH-D15
Memory Kingston Fury Beast 32GB 5600 MHz CL36 @ 6200 MHz
Video Card(s) AMD Radeon RX 6600
Case Fractal Design Define R5
Power Supply Corsair RM550x
Are Radeon GPUs supported?
 
Joined
Sep 15, 2024
Messages
65 (0.33/day)
AI Playground 2.0 was released on the 22nd of July 2024. It seems to have supported only Intel, whereas this claims to run on all hardware.


Running GAIA on the NPU results in improved performance for AI-specific tasks, as it is designed for inference workloads.
Isn’t that wrong? As far as I’m aware, the GPU plainly has enough raw power to beat the NPU, even in inference. (I cannot find numbers for that.)
 

Cheeseball

Not a Potato
Supporter
Are Radeon GPUs supported?
No, this is aimed specifically at the NPU and iGPU in the Strix Point APUs. If you have a 7000 or 9000 series GPU then you might as well run LM Studio directly.

Isn’t that wrong? As far as I’m aware, the GPU plainly has enough raw power to beat the NPU, even in inference. (I cannot find numbers for that.)
That statement doesn't imply that the NPU has improved performance over a GPU at all. It means that instead of using either the CPU or GPU to run AI-related workloads, you can now utilize the NPU with GAIA and have the other two processors free for other tasks.
 
Joined
May 10, 2023
Messages
794 (1.15/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Isn’t that wrong? As far as I’m aware, the GPU plainly has enough raw power to beat the NPU, even in inference. (I cannot find numbers for that.)
This piece of software is meant to run smaller quantized models at INT4. Afaik, rdna 3.5 lacks support for that data type, and thus the NPU will indeed be faster.
 
Joined
Sep 15, 2024
Messages
65 (0.33/day)
This piece of software is meant to run smaller quantized models at INT4. Afaik, rdna 3.5 lacks support for that data type, and thus the NPU will indeed be faster.
Here it mentions RDNA 3.5 in Strix Point being able to go down to INT4, but I don’t understand enough to be sure about its impact: https://chipsandcheese.com/p/lunar-lakes-igpu-debut-of-intels
(I find it hard to find info on this, even though I'd say I've given it a good search. Maybe I'm too inexperienced, maybe Google is being dumb; also not on top of my game and heading to sleep soon.)
That statement doesn't imply that the NPU has improved performance over a GPU at all. It means that instead of using either the CPU or GPU to run AI-related workloads, you can now utilize the NPU with GAIA and have the other two processors free for other tasks.
Oh, that’s a very charitable reading of yours. Usually performance is about the item at hand (and more often than not, isolated benchmarks), had they meant whole-system performance, they should have said so.
 

Cheeseball

Not a Potato
Supporter
Oh, that’s a very charitable reading of yours. Usually performance is about the item at hand (and more often than not, isolated benchmarks), had they meant whole-system performance, they should have said so.
It is though. The item at hand is just the NPU. The GPU was not mentioned in that statement, so it is not part of the comparison.

If you were inferring that a modern GPU is better, then generally that's true by the memory bandwidth limitation alone.

The point is that the NPU part of the newer APUs can now be used instead of just being a CoPilot+ selling point. 50 TOPS is nothing compared to an RTX 4060 (242 TOPS by itself), but at least we have access to it now.
 
Joined
Oct 17, 2021
Messages
134 (0.11/day)
System Name Nirn
Processor Amd Ryzen 7950X3D
Motherboard MSI MEG ACE X670e
Cooling Noctua NH-D15
Memory 128 GB Kingston DDR5 6000 (running at 4000)
Video Card(s) Radeon RX 7900XTX (24G) + Geforce 4070ti (12G) Physx
Storage SAMSUNG 990 EVO SSD 2TB Gen 5 x2 (OS)+SAMSUNG 980 SSD 1TB PCle 3.0x4 (Primocache) +2X 22TB WD Gold
Display(s) Samsung UN55NU8000 (Freesync)
Case Corsair Graphite Series 780T White
Audio Device(s) Creative Soundblaster AE-7 + Sennheiser GSP600
Power Supply Seasonic PRIME TX-1000 Titanium
Mouse Razer Mamba Elite Wired
Keyboard Razer BlackWidow Chroma v1
VR HMD Oculus Quest 2
Software Windows 10
wake me up when they integrate the npu into the big desktop processors.
 
Joined
May 10, 2023
Messages
794 (1.15/day)
Location
Brazil
Here it mentions RDNA 3.5 in Strix Point being able to go down to INT4, but I don’t understand enough to be sure about its impact: https://chipsandcheese.com/p/lunar-lakes-igpu-debut-of-intels
(I find it hard to find info on this, even though I'd say I've given it a good search. Maybe I'm too inexperienced, maybe Google is being dumb; also not on top of my game and heading to sleep soon.)
Oh, I stand corrected, thanks. The ISA reference does mention INT4 support as well for WMMA:


Still, given how those WMMA instructions use the regular vector units on RDNA 3.5 and older, I believe the NPU has a higher rate when processing those, but I couldn't find the proper numbers after a quick Google search.
Made in Python...
Most of the AI stack out there is written in python, I don't see why this would be any different.
 
Joined
Mar 16, 2017
Messages
258 (0.09/day)
Location
behind you
Processor Threadripper 1950X
Motherboard ASRock X399 Professional Gaming
Cooling IceGiant ProSiphon Elite
Memory 48GB DDR4 2934MHz
Video Card(s) MSI GTX 1080
Storage 4TB Crucial P3 Plus NVMe, 1TB Samsung 980 NVMe, 1TB Inland NVMe, 2TB Western Digital HDD
Display(s) 2x 4K60
Power Supply Cooler Master Silent Pro M (1000W)
Mouse Corsair Ironclaw Wireless
Keyboard Corsair K70 MK.2
VR HMD HTC Vive Pro
Software Windows 10, QubesOS
Are Radeon GPUs supported?
There's already plenty of LLM AI software that can run on Radeon GPUs, much of which is open source. Ollama is a common backend; Msty is a particularly good free (albeit closed-source) interface that can use it. Really, the only thing notable about this is that it can use the Ryzen AI's NPU, as there are plenty of fully open-source alternatives at this point.
 
Joined
May 10, 2023
Messages
794 (1.15/day)
Location
Brazil
Scripting language will never be fast and memory efficient.
It doesn't matter; Python is just used as front-end/glue code for the actual inference engines.
 