Building a local AI assistant with user context

We use Ollama and Chroma DB to build a personalized assistant from scraped web content

In my last post, I explored Retrieval-Augmented Generation (RAG) as a way to let a locally running generative AI model access and incorporate new information. To achieve this, I used hardcoded documents as context, embedded them as vectors, and persisted them in Chroma DB; at inference time, those vectors are retrieved and fed as context to a local LLM chatbot. But a few hardcoded sentences are hardly elegant or particularly exciting. They are fine for educational purposes, but that's as far as it goes; a minimally useful system needs to be more sophisticated. In this new post, I set out to create a local Gaia Sky assistant that uses the Gaia Sky documentation site and the Gaia Sky homepage as supplementary information, leveraging Ollama to generate context-aware responses. So, let's dive into the topic and explain how it all works.

The source code used in this post is available in this repository.
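To give a flavor of the ingestion step, here is a minimal sketch in Python, assuming the requests, beautifulsoup4, and chromadb packages. The URL list, collection name, chunk size, and storage path are illustrative placeholders, not the exact values from the repository:

```python
# Sketch of the ingestion step: scrape pages, chunk the text, and
# persist the chunks as vectors in Chroma DB, using Chroma's default
# embedding model.
import chromadb
import requests
from bs4 import BeautifulSoup

# Placeholder URLs standing in for the Gaia Sky homepage and docs site.
PAGES = [
    "https://gaiasky.space",
    "https://docs.gaiasky.space",
]

def page_text(url: str) -> str:
    """Fetch a page and reduce it to plain text."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator=" ", strip=True)

def chunks(text: str, size: int = 200) -> list[str]:
    """Naive fixed-size chunking by word count."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="gaia_sky")

for url in PAGES:
    for n, piece in enumerate(chunks(page_text(url))):
        # Each chunk is embedded and stored under a stable id.
        collection.add(documents=[piece], ids=[f"{url}#{n}"])
```

Once the scraped chunks are persisted in Chroma DB, the assistant can embed each user question, retrieve the closest chunks, and hand them to Ollama as context.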

Local LLM with Retrieval-Augmented Generation

Let’s build a simple RAG application using a local LLM through Ollama.

Edit (2025-03-26): Added a few words about next steps to the conclusion.

Edit (2025-03-25): I re-ran the example with a clean database and the results are better. I also cleaned up the code a bit.

Over the past few months I have been running local LLMs on my computer with varying results, ranging from 'unusable' to 'pretty good'. Local LLMs are becoming more powerful, but they don't inherently "know" everything. They are trained on massive datasets, but those datasets are typically static. To make an LLM truly useful for specific tasks, you often need to augment it with your own data: data that is constantly changing, specific to your domain, or simply absent from the model's original training set. RAG bridges this gap by embedding context information into a vector database that is later queried to provide context to the LLM, expanding its knowledge beyond the original training data. In this short article, we'll see how to build a very simple local AI chatbot powered by Ollama with RAG capabilities.

The source code used in this post is available here.
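As a minimal sketch of the inference side, assuming the ollama and chromadb Python packages and a locally pulled model (the model name llama3, the collection name, the example question, and the prompt template are illustrative choices, not the post's exact code):

```python
# Sketch of retrieval-augmented inference: embed the question, fetch
# the nearest chunks from Chroma DB, and pass them to a local model
# served by Ollama.
import chromadb
import ollama

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="gaia_sky")

question = "How do I load a custom dataset in Gaia Sky?"  # illustrative

# Retrieve the chunks whose embeddings are closest to the question.
hits = collection.query(query_texts=[question], n_results=3)
context = "\n\n".join(hits["documents"][0])

# Give the model the retrieved context alongside the question.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
reply = ollama.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
print(reply["message"]["content"])
```

Everything here runs locally: Chroma handles the vector search and Ollama serves the model, so no data ever leaves the machine.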
