Building a RAG System with Ollama and LanceDB: A Comprehensive Tutorial

This tutorial walks through building a Retrieval-Augmented Generation (RAG) system for BBC News data using Ollama for embeddings and language modeling, and LanceDB for vector storage.
System Architecture
The system consists of several key components:
- LLM Interface: An async interface for large language models
- Embedder: Handles document and query embedding
- Vector Store: Manages storage and retrieval of embedded documents
- Component Factory: Creates instances of the above components
- Main RAG System: Orchestrates the entire pipeline
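These components are defined as abstract base classes that concrete implementations subclass. The gist's exact signatures aren't reproduced in this tutorial, but a minimal sketch, inferred from the snippets below, might look like this (the method names and parameters are assumptions):

    from abc import ABC, abstractmethod
    from typing import Any, Dict, List, Optional

    class LLM(ABC):
        @abstractmethod
        async def generate(self, prompt: str) -> str:
            """Return the model's completion for a prompt."""

    class Embedder(ABC):
        @abstractmethod
        async def embed_documents(self, documents: List[str]) -> List[List[float]]:
            """Embed a batch of documents."""

        @abstractmethod
        async def embed_query(self, query: str) -> List[float]:
            """Embed a single query string."""

    class VectorStore(ABC):
        @abstractmethod
        async def store_embeddings(
            self,
            documents: List[str],
            embeddings: List[List[float]],
            metadata: Optional[List[Dict[str, Any]]] = None,
        ) -> None:
            """Persist documents alongside their vectors."""

        @abstractmethod
        async def search(self, query_vector: List[float], k: int = 5) -> List[Dict[str, Any]]:
            """Return the k most similar stored documents."""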
Prerequisites
- Python 3.7+
- Ollama running locally (default: http://localhost:11434)
- Required packages:
  - httpx
  - pandas
  - lancedb
  - pydantic
Component Breakdown
1. LLM Implementation (ollama.py)
The AsyncOllamaLLM class provides an async interface to Ollama’s API:
    class AsyncOllamaLLM(LLM):
        def __init__(self, model_name: str = "llama3.1", base_url: str = "http://localhost:11434"):
            self.model_name = model_name
            self.base_url = base_url
            self.client = httpx.AsyncClient()
Key features:
- Async HTTP client for API communication
- Support for both streaming and non-streaming generation
- Context manager for proper resource cleanup
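The method bodies aren't shown above, so here is a minimal sketch of how generation might be implemented against Ollama's /api/generate endpoint (the generate and generate_stream names are illustrative, not necessarily the source's exact API):

    import json
    import httpx

    class AsyncOllamaLLM(LLM):
        def __init__(self, model_name: str = "llama3.1", base_url: str = "http://localhost:11434"):
            self.model_name = model_name
            self.base_url = base_url
            self.client = httpx.AsyncClient(base_url=base_url, timeout=60.0)

        async def generate(self, prompt: str) -> str:
            # Non-streaming: Ollama returns one JSON object with a "response" field.
            resp = await self.client.post(
                "/api/generate",
                json={"model": self.model_name, "prompt": prompt, "stream": False},
            )
            resp.raise_for_status()
            return resp.json()["response"]

        async def generate_stream(self, prompt: str):
            # Streaming: Ollama emits newline-delimited JSON chunks until "done" is true.
            async with self.client.stream(
                "POST",
                "/api/generate",
                json={"model": self.model_name, "prompt": prompt, "stream": True},
            ) as resp:
                async for line in resp.aiter_lines():
                    if not line:
                        continue
                    chunk = json.loads(line)
                    if chunk.get("response"):
                        yield chunk["response"]

        async def __aenter__(self):
            return self

        async def __aexit__(self, *exc):
            # Context-manager exit closes the pooled HTTP connections.
            await self.client.aclose()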
2. Embedder Implementation (ollama_embedder.py)
The AsyncOllamaEmbedder class handles document embedding:
    class AsyncOllamaEmbedder(Embedder):
        def __init__(self, model_name: str = "mxbai-embed-large", base_url: str = "http://localhost:11434"):
            self.model_name = model_name
            self.base_url = base_url
            self.client = httpx.AsyncClient()
Features:
- Integration with LanceDB’s embedding registry
- Async document and query embedding
- Dimension information for vector storage
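A sketch of how these features might be implemented against Ollama's /api/embeddings endpoint (the embed_documents, embed_query, and dimension names follow the feature list above and are assumptions; mxbai-embed-large produces 1024-dimensional vectors):

    from typing import List
    import httpx

    class AsyncOllamaEmbedder(Embedder):
        def __init__(self, model_name: str = "mxbai-embed-large", base_url: str = "http://localhost:11434"):
            self.model_name = model_name
            self.base_url = base_url
            self.client = httpx.AsyncClient(base_url=base_url, timeout=60.0)

        async def _embed(self, text: str) -> List[float]:
            # /api/embeddings returns {"embedding": [...]} for a single input.
            resp = await self.client.post(
                "/api/embeddings",
                json={"model": self.model_name, "prompt": text},
            )
            resp.raise_for_status()
            return resp.json()["embedding"]

        async def embed_documents(self, documents: List[str]) -> List[List[float]]:
            return [await self._embed(doc) for doc in documents]

        async def embed_query(self, query: str) -> List[float]:
            return await self._embed(query)

        @property
        def dimension(self) -> int:
            # Used by the vector store to size its vector column.
            return 1024  # mxbai-embed-large's output dimension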
3. Vector Store Implementation (lancedb_store.py)
The AsyncLanceDBStore class manages embedded documents:
    class AsyncLanceDBStore(VectorStore):
        def __init__(self, embedder, db_path: str = "./lancedb", table_name: str = "documents"):
            self.embedder = embedder
            self.db = lancedb.connect(db_path)
            self.table_name = table_name
Key capabilities:
- Async document storage with metadata
- Vector similarity search
- Thread-safe operations with async locks
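A sketch of how storage and search might look on top of the LanceDB Python API (the flattened-metadata scheme and the search signature are assumptions):

    import asyncio
    import lancedb

    class AsyncLanceDBStore(VectorStore):
        def __init__(self, embedder, db_path: str = "./lancedb", table_name: str = "documents"):
            self.embedder = embedder
            self.db = lancedb.connect(db_path)
            self.table_name = table_name
            self._lock = asyncio.Lock()  # guards concurrent writers

        async def store_embeddings(self, documents, embeddings, metadata=None):
            rows = []
            for i, (doc, vec) in enumerate(zip(documents, embeddings)):
                row = {"text": doc, "vector": vec}
                if metadata:
                    row.update(metadata[i])  # flatten per-document metadata into columns
                rows.append(row)
            async with self._lock:
                if self.table_name in self.db.table_names():
                    self.db.open_table(self.table_name).add(rows)
                else:
                    self.db.create_table(self.table_name, data=rows)

        async def search(self, query_vector, k: int = 5):
            # LanceDB vector similarity search; returns the k nearest rows as dicts.
            table = self.db.open_table(self.table_name)
            return table.search(query_vector).limit(k).to_list()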
Setting Up the RAG System
1. First, create a configuration dictionary:

    config = {
        "llm": {
            "type": "ollama",
            "model_name": "llama3.2"
        },
        "embedder": {
            "type": "ollama",
            "model_name": "nomic-embed-text"
        },
        "vector_store": {
            "type": "lancedb",
            "db_path": "./data/lancedb",
            "table_name": "documents"
        }
    }
2. Initialize the RAG system:

    rag = BBCNewsRAG(config)
    await rag.initialize()
3. Ingest your data:

    df = pd.read_csv('data/data.txt')
    await rag.ingest_data(df)
4. Query the system (the sketch after these steps shows how ingest_data and query might be wired internally):

    response = await rag.query(
        "Who is Aarin Chiekrie?",
        system_prompt="You are a helpful assistant that provides accurate information based on the news articles."
    )
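Under the hood, ingest_data and query tie the components together. A plausible sketch, assuming the interfaces above (the "text" column name and the prompt template are assumptions):

    class BBCNewsRAG:
        def __init__(self, config: dict):
            self.config = config
            # initialize() builds self.llm, self.embedder and self.vector_store
            # from config via the component factory (construction omitted here).

        async def ingest_data(self, df) -> None:
            texts = df["text"].tolist()  # assumes the CSV exposes a "text" column
            vectors = await self.embedder.embed_documents(texts)
            await self.vector_store.store_embeddings(texts, vectors)

        async def query(self, question: str, system_prompt: str = "", k: int = 5) -> str:
            # 1. Embed the question with the same model used at ingest time.
            query_vector = await self.embedder.embed_query(question)
            # 2. Retrieve the k most similar articles from the vector store.
            hits = await self.vector_store.search(query_vector, k=k)
            context = "\n\n".join(hit["text"] for hit in hits)
            # 3. Ground the LLM's answer in the retrieved context.
            prompt = f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {question}"
            return await self.llm.generate(prompt)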
Best Practices
- Resource Management (see the sketch after this list)
  - Always use async context managers or explicit cleanup
  - Close connections properly with await rag.close()
- Error Handling
  - Implement proper error handling for API calls
  - Check for missing or malformed data
- Configuration
  - Keep configuration separate from code
  - Use environment variables for sensitive information
- Performance
  - Use connection pooling for HTTP clients
  - Implement caching where appropriate
  - Use async operations for I/O-bound tasks
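Several of these practices combine naturally into one pattern. A sketch reusing the BBCNewsRAG API from the setup steps (the three-attempt backoff policy is illustrative):

    import asyncio
    import httpx

    async def main():
        rag = BBCNewsRAG(config)
        try:
            await rag.initialize()
            # Retry transient HTTP failures with exponential backoff.
            for attempt in range(3):
                try:
                    response = await rag.query("Who is Aarin Chiekrie?")
                    break
                except httpx.HTTPError:
                    if attempt == 2:
                        raise
                    await asyncio.sleep(2 ** attempt)
            print(response)
        finally:
            # Always release HTTP connections and database handles.
            await rag.close()

    asyncio.run(main())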
Common Pitfalls
- Not properly closing async resources
- Forgetting to handle API rate limits
- Missing error handling for embedding operations
- Not considering thread safety in vector store operations
Advanced Usage
Custom Embedding Models
You can extend the system to use different embedding models:
    class CustomEmbedder(Embedder):
        async def embed_documents(self, documents: List[str]) -> List[List[float]]:
            # Your custom embedding logic here
            pass
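For instance, a local sentence-transformers model could be wrapped like this (a hypothetical SentenceTransformerEmbedder; it assumes the Embedder base class also requires embed_query):

    import asyncio
    from typing import List

    from sentence_transformers import SentenceTransformer

    class SentenceTransformerEmbedder(Embedder):
        def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
            self.model = SentenceTransformer(model_name)

        async def embed_documents(self, documents: List[str]) -> List[List[float]]:
            # encode() is synchronous and compute-bound, so run it off the
            # event loop (asyncio.to_thread requires Python 3.9+).
            vectors = await asyncio.to_thread(self.model.encode, documents)
            return vectors.tolist()

        async def embed_query(self, query: str) -> List[float]:
            vector = await asyncio.to_thread(self.model.encode, query)
            return vector.tolist()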
Custom Vector Stores
Implement different vector stores by extending the base class:
    class CustomVectorStore(VectorStore):
        async def store_embeddings(self, documents, embeddings, metadata=None):
            # Your custom storage logic here
            pass
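As a concrete illustration, here is a toy in-memory store with brute-force cosine search (useful for tests, not a LanceDB replacement):

    from typing import List

    import numpy as np

    class InMemoryVectorStore(VectorStore):
        def __init__(self):
            self.documents: List[str] = []
            self.vectors: List[List[float]] = []

        async def store_embeddings(self, documents, embeddings, metadata=None):
            # Metadata is ignored in this toy example.
            self.documents.extend(documents)
            self.vectors.extend(embeddings)

        async def search(self, query_vector, k: int = 5):
            matrix = np.array(self.vectors)
            q = np.array(query_vector)
            # Cosine similarity of the query against every stored vector.
            scores = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q) + 1e-10)
            top = np.argsort(scores)[::-1][:k]
            return [{"text": self.documents[i], "score": float(scores[i])} for i in top]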
Conclusion
This RAG system provides a robust foundation for building document-based question-answering systems. Its modular design allows for easy extension and customization, while the async implementation ensures efficient resource usage.
For production deployment, consider:
- Implementing proper logging
- Adding monitoring and metrics
- Setting up proper error handling and retries
- Implementing caching mechanisms
- Adding authentication and authorization
The system can be extended to handle different document types, embedding models, and vector stores by implementing the appropriate interfaces.
Source code
https://gist.github.com/dewmal/e8f0296bd9743d3fa9dd5841a65d3cdd