Friday, March 21st 2025

AMD Introduces GAIA - an Open-Source Project That Runs Local LLMs on Ryzen AI NPUs

AMD has launched a new open-source project called GAIA (pronounced /ˈɡaɪ.ə/), an application that leverages the power of the Ryzen AI Neural Processing Unit (NPU) to run private, local large language models (LLMs). In this blog, we'll dive into the features and benefits of GAIA and show how you can adopt this open-source project into your own applications.

Introduction to GAIA
GAIA is a generative AI application designed to run local, private LLMs on Windows PCs and is optimized for AMD Ryzen AI hardware (AMD Ryzen AI 300 Series processors). This integration allows for faster, more efficient processing (i.e., lower power) while keeping your data local and secure. On Ryzen AI PCs, GAIA runs models seamlessly on the NPU and iGPU by using the open-source Lemonade (LLM-Aid) SDK from ONNX TurnkeyML for LLM inference. GAIA supports a variety of local LLMs optimized to run on Ryzen AI PCs; popular models like Llama and Phi derivatives can be tailored for different use cases, such as Q&A, summarization, and complex reasoning tasks.
Getting Started with GAIA
You can get started with GAIA in under 10 minutes: follow the instructions to download and install GAIA on your Ryzen AI PC. Once installed, you can launch GAIA and begin exploring its various agents and capabilities. There are two versions of GAIA:
  • 1) GAIA Installer - this will run on any Windows PC; however, performance may be slower.
  • 2) GAIA Hybrid Installer - this package is optimized to run on Ryzen AI PCs and uses the NPU and iGPU for better performance.
The Agent RAG Pipeline
One of the standout features of GAIA is its agent Retrieval-Augmented Generation (RAG) pipeline. This pipeline combines an LLM with a knowledge base, enabling the agent to retrieve relevant information, reason, plan, and use external tools within an interactive chat environment. This results in more accurate and contextually aware responses.

The current GAIA agents enable the following capabilities:
  • Simple Prompt Completion: direct model interaction with no agent, intended for testing and evaluation.
  • Chaty: an LLM chatbot with history that engages in conversation with the user.
  • Clip: an agentic RAG agent for YouTube search and Q&A.
  • Joker: a simple joke generator using RAG to bring humor to the user.
Additional agents are currently in development, and developers are encouraged to create and contribute their own agents to GAIA.
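
To give a feel for what such an agent involves, here is a purely hypothetical Python skeleton of the retrieve-then-prompt pattern the GAIA agents follow. The class and method names are illustrative inventions, not GAIA's actual agent interface; see the repository at github.com/amd/gaia for the real one.

```python
# Purely hypothetical agent skeleton; the names below are NOT GAIA's real API.
# It only illustrates the retrieve-context-then-prompt-the-LLM pattern.
class ExampleRagAgent:
    """Answers a query by retrieving local context and prompting a local LLM."""

    def __init__(self, retriever, llm_client):
        self.retriever = retriever    # e.g. a vector-index query engine (assumed)
        self.llm_client = llm_client  # e.g. an OpenAI-compatible chat client (assumed)

    def run(self, user_query: str) -> str:
        # Pull the most relevant passages from the local vector index.
        passages = self.retriever.retrieve(user_query)
        context = "\n".join(str(p) for p in passages)
        # Embed the retrieved context into the prompt and ask the LLM.
        prompt = f"Context:\n{context}\n\nQuestion: {user_query}\nAnswer:"
        return self.llm_client.complete(prompt)
```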

How Does GAIA Work?
The left side of Figure 2 (GAIA Overview Diagram) illustrates the functionality of the Lemonade SDK from TurnkeyML. Lemonade SDK provides tools for LLM-specific tasks such as prompting, accuracy measurement, and serving across multiple runtimes (e.g., Hugging Face, ONNX Runtime GenAI API) and hardware (CPU, iGPU, and NPU).

Lemonade exposes an LLM web service that communicates with the GAIA application (on the right) via an OpenAI-compatible REST API. GAIA consists of three key components:
  • 1) LLM Connector - Bridges the NPU service's Web API with the LlamaIndex-based RAG pipeline.
  • 2) LlamaIndex RAG Pipeline - Includes a query engine and vector memory, which processes and stores relevant external information.
  • 3) Agent Web Server - Connects to the GAIA UI via WebSocket, enabling user interaction.
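
To make this interface concrete, here is a minimal sketch of a chat request against a locally running, OpenAI-compatible endpoint such as the one Lemonade exposes. The base URL, port, and model name below are assumptions for illustration only; check the Lemonade/GAIA documentation for the values your installation actually uses.

```python
# Minimal sketch: querying a local OpenAI-compatible LLM web service.
# NOTE: the base URL, port, and model name are placeholders/assumptions,
# not values documented here; consult the Lemonade/GAIA docs for your setup.
import requests

BASE_URL = "http://localhost:8000/v1"  # assumed local endpoint

payload = {
    "model": "llama-3.2-1b-instruct",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Summarize what an NPU is in one sentence."}
    ],
    "stream": False,
}

resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```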
On the right side of the figure, GAIA acts as an AI-powered agent that retrieves and processes data. It vectorizes external content (e.g., GitHub, YouTube, text files) and stores it in a local vector index. When a user submits a query, the following process occurs:
  • 1) The query is sent to GAIA, where it is transformed into an embedding vector.
  • 2) The vectorized query is used to retrieve relevant context from the indexed data.
  • 3) The retrieved context is passed to the web service, where it is embedded into the LLM's prompt.
  • 4) The LLM generates a response, which is streamed back through the GAIA web service and displayed in the UI.
This process ensures that user queries are enhanced with relevant context before being processed by the LLM, improving response accuracy and relevance. The final answer is delivered to the user in real-time through the UI.
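
As a rough illustration of this retrieve-then-generate flow, the following LlamaIndex-style sketch builds a local vector index from a folder of documents and answers a query with retrieved context. It is a generic example, not GAIA's actual pipeline code; the ./docs path is hypothetical, and it assumes an LLM and embedding model have already been configured (for instance, pointed at a local OpenAI-compatible server).

```python
# Generic LlamaIndex-style RAG sketch (illustrative only, not GAIA's code).
# Assumes documents live in ./docs (hypothetical path) and that LLM/embedding
# settings are already configured, e.g. toward a local OpenAI-compatible server.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1) Vectorize external content into a local vector index.
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# 2-4) Embed the query, retrieve relevant context, and let the LLM answer
# with that context injected into its prompt.
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What does the installation guide say about the NPU?")
print(response)
```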

Benefits of Running LLMs Locally
Running LLMs locally on the NPU offers several benefits:
  • Enhanced privacy and security, as no data needs to leave your machine and sensitive information is never sent to the cloud, while still delivering high-performance AI capabilities.
  • Reduced latency, since there's no need to communicate with the cloud.
  • Optimized performance with the NPU, leading to faster response times and lower power consumption.
Comparing NPU and iGPU
Running GAIA on the NPU results in improved performance for AI-specific tasks, as it is designed for inference workloads. Beginning with Ryzen AI Software Release 1.3, there is hybrid support for deploying quantized LLMs that utilize both the NPU and the iGPU. By using both components, each can be applied to the tasks and operations it is optimized for.

Applications and Industries
This setup could benefit industries that require high performance and privacy, such as healthcare, finance, and enterprise applications where data privacy is critical. It can also be applied in fields like content creation and customer-service automation, where generative AI models are becoming essential. Lastly, it works in environments without internet connectivity, since all processing is done locally and no data needs to be sent to the cloud.

Conclusion
In conclusion, GAIA, an open-source AMD application, uses the power of the Ryzen AI NPU to deliver efficient, private, and high-performance LLMs. By running LLMs locally, GAIA ensures enhanced privacy, reduced latency, and optimized performance, making it ideal for industries that prioritize data security and rapid response times.


Ready to try GAIA yourself? Our video provides a brief overview and installation demo of GAIA.

Check out and contribute to the GAIA repo at github.com/amd/gaia. For feedback or questions, please reach out to us at GAIA@amd.com.
Source: AMD Developer Blog

22 Comments on AMD Introduces GAIA - an Open-Source Project That Runs Local LLMs on Ryzen AI NPUs

#1
john_
Nice. In an era where AI is everywhere, AMD needs to not just follow, but even start offering more options to its customers. They need it to stay competitive.
1) GAIA Installer - this will run on any Windows PC; however, performance may be slower.
That's also a good choice from them. They don't offer just a Ryzen AI optimized solution in an effort to use it as a marketing tool.
#3
RejZoR
lexluthermiester: Open Source is good IF it's actually open.
And not like OpenAi where "open" part is in air quotes...
#4
MCJAxolotl7
lexluthermiester: Open Source is good IF it's actually open.
Can confirm, is open. Went to AMD's GitHub, found it immediately.
github.com/amd/gaia
#7
Tomorrow
Finally utilizing "dark silicon" NPUs that most of the time sit idle...
#8
R0H1T
Next up ~ Nvidia probably launches Cap'n planet :laugh:

#9
blinnbanir
I asked Co-pilot what AI I could use to mine Bitcoin and it actually gave me 3 choices. To me AI is nothing more than an Encyclopedia (Co Pilot). I guess this means it will do it faster. The next question I might ask is how to turn off Windows telemetry hehe.
#10
Cheeseball
Not a Potato
R0H1T: Next up ~ Nvidia probably launches Cap'n planet :laugh:

NVIDIA's version of Captain Planet:

#11
r.h.p
Cheeseball: NVIDIA's version of Captain Planet:

THAT WAS FRAKIN GOOD :roll:
#12
Lesha
Are Radeon GPUs supported?
#13
NoLoihi
AI Playground 2.0, released the 22nd of July 2024. It had only supported Intel, it seems like, whereas this claims to run on all.
game.intel.com/us/stories/introducing-ai-playground/

www.igorslab.de/en/intel-presents-its-new-ai-playground-off-to-the-world-of-ki/
hothardware.com/news/intel-ai-playground-released
www.tomshardware.com/tech-industry/artificial-intelligence/intel-creates-ai-playground-app-for-local-ai-computing-lunar-lake-support-added-in-the-latest-version
github.com/intel/AI-Playground
T0@st: Running GAIA on the NPU results in improved performance for AI-specific tasks, as it is designed for inference workloads.
Isn’t that wrong? As far as I’m aware, the GPU plainly has enough raw power to beat the NPU, even in inference. (I cannot find numbers for that.)
#14
Cheeseball
Not a Potato
Lesha: Are Radeon GPUs supported?
No, this is aimed specifically at the NPU and iGPU in the Strix Point APUs. If you have a 7000 or 9000 series GPU then you might as well run LM Studio directly.
NoLoihi: Isn’t that wrong? As far as I’m aware, the GPU plainly has enough raw power to beat the NPU, even in inference. (I cannot find numbers for that.)
That statement doesn't imply that the NPU has improved performance over a GPU at all. It means that instead of using either the CPU or GPU to run AI-related workloads, you can now utilize the NPU with GAIA and have the other two processors free for other tasks.
#15
igormp
NoLoihi: Isn’t that wrong? As far as I’m aware, the GPU plainly has enough raw power to beat the NPU, even in inference. (I cannot find numbers for that.)
This piece of software is meant to run smaller quantized models at INT4. Afaik, rdna 3.5 lacks support for that data type, and thus the NPU will indeed be faster.
#16
NoLoihi
igormp: This piece of software is meant to run smaller quantized models at INT4. Afaik, rdna 3.5 lacks support for that data type, and thus the NPU will indeed be faster.
Here it mentions RDNA 3.5 in Strix Point being able to go down to INT4, but I don’t understand enough to be sure about its impact: chipsandcheese.com/p/lunar-lakes-igpu-debut-of-intels
(I find it hard to find info on this, even though I’d say I’ve given it a good search. Maybe I’m too unexperienced, maybe Google is being dumb; also not on top of my game and heading to sleep soon.)
Cheeseball: That statement doesn't imply that the NPU has improved performance over a GPU at all. It means that instead of using either the CPU or GPU to run AI-related workloads, you can now utilize the NPU with GAIA and have the other two processors free for other tasks.
Oh, that’s a very charitable reading of yours. Usually performance is about the item at hand (and more often than not, isolated benchmarks), had they meant whole-system performance, they should have said so.
#17
Cheeseball
Not a Potato
NoLoihi: Oh, that’s a very charitable reading of yours. Usually performance is about the item at hand (and more often than not, isolated benchmarks), had they meant whole-system performance, they should have said so.
It is though. The item at hand is just the NPU. The GPU was not mentioned in that statement and thus is not the one being compared to.

If you were inferring that a modern GPU is better, then generally that's true just by memory bandwidth limitation alone.

The point is the NPU part of the newer APUs can now be used instead of just being a CoPilot+ selling point. 50 TOPS is nothing compared to a RTX 4060 (242 TOPS by itself), but at least we have access to it now.
#18
regs
Made in Python...
#19
cal5582
wake me up when they integrate the npu into the big desktop processors.
#20
igormp
NoLoihi: Here it mentions RDNA 3.5 in Strix Point being able to go down to INT4, but I don’t understand enough to be sure about its impact: chipsandcheese.com/p/lunar-lakes-igpu-debut-of-intels
(I find it hard to find info on this, even though I’d say I’ve given it a good search. Maybe I’m too unexperienced, maybe Google is being dumb; also not on top of my game and heading to sleep soon.)
Oh, I stand corrected, thanks. The ISA reference does mention INT4 support as well for WMMA:

www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna35_instruction_set_architecture.pdf

Still, given how those WMMA instructions use the regular vector units on RDNA3.5 and older, I believe the NPU has a higher rate when processing those, but I couldn't find the proper numbers after a quick google.
regs: Made in Python...
Most of the AI stack out there is written in python, I don't see why this would be any different.
#21
trsttte
regs: Made in Python...
Maybe you missed it, but nowadays python is pretty fast if you know what to do ;)
#22
OSdevr
Lesha: Are Radeon GPUs supported?
There's already plenty of LLM AI software that can run on Radeon GPUs, much of which is open source. Ollama is a common backend, and Msty is a particularly good free (albeit closed-source) interface that can use it. Really the only thing notable about this is that it can use the Ryzen AI's NPU, as there are plenty of fully open-source alternatives at this point.