📚 Table of Contents
01 Core Concepts - Ollama, ChromaDB, LangChain, RAG explained simply
02 Docker Setup - Complete docker-compose with explanations
03 FastAPI CRUD - Models, schemas, and endpoints explained
04 ChromaDB - Vector storage and similarity search
05 LangChain RAG - Building the AI pipeline
06 Playground - Test JSON inputs/outputs interactively
07 Upload Training Snippets - Add and query your own knowledge base
01 Core Concepts
📖 Before We Start: The Big Picture
Imagine building a smart assistant that answers questions about YOUR documents, code, or data. That's what a RAG (Retrieval-Augmented Generation) system does!
🎭 Think of it like a librarian: When you ask a question, the librarian doesn't guess - they find relevant books, read the sections, then answer based on actual information!
Ollama
Local LLM Runner
🤔 Why? Run AI models on YOUR computer - free, private, no API costs!
🔑 Key Points:
- ▸ Local models: Llama 2, Mistral, CodeLlama run on your machine
- ▸ Free: No API keys, no usage limits, no data leaving your computer
- ▸ Simple: One command: ollama run llama2 (see the example below)
- ▸ Compatible: Offers an OpenAI-compatible API - easy to switch
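Once ollama run llama2 has pulled the model, you can also call Ollama's REST API directly. A minimal sketch, assuming Ollama is listening on its default port 11434:

import requests

# Ask the local llama2 model a question via Ollama's /api/generate endpoint
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(response.json()["response"])  # the generated answer text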
ChromaDB
Vector Database
🤔 Why? Find similar content by meaning, not just matching words!
🔑 Key Points:
- ▸ Embeddings: Converts text → numbers that capture meaning
- ▸ Similarity search: "API endpoint" finds "REST route" too!
- ▸ Metadata: Store tags, dates, sources with each document
- ▸ Fast: Quickly find relevant docs from thousands
LangChain
AI Framework
🤔 Why? Connect all the pieces (search → format → AI) without writing everything from scratch!
🔑 Key Points:
- ▸ Chains: Link steps together (search → prompt → generate) - tiny example after this list
- ▸ Agents: Let AI decide which tools to use
- ▸ Retrievers: Built-in ChromaDB support
- ▸ Integrations: Works with Ollama, OpenAI, 100+ tools
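To make "chains" concrete, here's a minimal sketch. It assumes a local Ollama server with llama2 already pulled; the | pipe is LangChain's expression language for linking steps:

from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate

llm = Ollama(model="llama2")  # assumes a local Ollama server with llama2 pulled
prompt = PromptTemplate.from_template("Explain {topic} in one sentence.")

chain = prompt | llm  # link steps: fill in the prompt, then send it to the model
print(chain.invoke({"topic": "vector embeddings"}))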
RAG
Design Pattern
🤔 Why? AI can "hallucinate" (make stuff up). RAG gives it REAL facts first!
🔑 The RAG Process:
- R = Retrieve: Search ChromaDB for related docs
- A = Augment: Add the docs to the AI prompt as context
- G = Generate: The AI answers using the real data!
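In code, the whole pattern is just three steps. This is a conceptual sketch only: search_docs and ask_llm are hypothetical stand-ins for the ChromaDB and Ollama calls built in later sections.

def search_docs(question: str, top_k: int = 3) -> list[str]:
    # Hypothetical stand-in for a ChromaDB similarity search
    return ["def hello(): ...", "app = FastAPI()", "SELECT * FROM users;"][:top_k]

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for an Ollama call
    return f"(model answer based on: {prompt[:40]}...)"

def rag_answer(question: str) -> str:
    docs = search_docs(question)  # R = Retrieve: find relevant documents
    context = "\n".join(docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"  # A = Augment
    return ask_llm(prompt)  # G = Generate: answer grounded in real data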
02 Docker Setup
🐳 Why Docker? What is Docker Compose?
Our app needs 4 services: PostgreSQL, Ollama, ChromaDB, and FastAPI. Installing each manually = nightmare! Docker packages everything into containers.
🐳 Docker
Packages apps into "containers" - self-contained boxes with everything needed to run.
📦 Docker Compose
Define multiple containers in one file. One command starts everything!
version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: raguser
      POSTGRES_PASSWORD: ragpass123
      POSTGRES_DB: ragdb
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE

  api:
    build: ./app
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - chromadb
      - ollama
    environment:
      - DATABASE_URL=postgresql://raguser:ragpass123@postgres:5432/ragdb
      - CHROMA_HOST=chromadb
      - OLLAMA_HOST=ollama

volumes:
  postgres_data:
  ollama_data:
  chroma_data:
📝 Line-by-Line Explanation
version: '3.8'
What: Docker Compose file format version. Newer Compose releases ignore this field entirely, but 3.8 is a safe, widely used value.
🐘 PostgreSQL Service
image: postgres:15-alpine
image: Pre-built container to use | postgres: Official PostgreSQL | 15: Version 15 | alpine: Lightweight Linux (smaller download)
environment: POSTGRES_USER, PASSWORD, DB
Environment variables that configure the database. PostgreSQL reads these on startup to create the user and database automatically!
ports: - "5432:5432"
Format: "YOUR_COMPUTER:CONTAINER" - Maps port 5432 on your machine to 5432 inside Docker. Now you can connect from outside!
volumes: - postgres_data:/var/lib/postgresql/data
Critical! Volumes persist data when containers stop. Without this, your database would be EMPTY on every restart!
🦙 Ollama Service
image: ollama/ollama:latest
Official Ollama image. :latest = always get newest version.
ports: - "11434:11434"
Ollama's API port. Our FastAPI app will call this to generate AI responses.
volumes: - ollama_data:/root/.ollama
Stores downloaded AI models (~4GB each). Without this, you'd re-download every restart!
🔮 ChromaDB Service
environment: - IS_PERSISTENT=TRUE
Super important! Without this, ChromaDB runs in memory only = all embeddings lost on restart!
⚡ FastAPI Service
build: ./app
Instead of using a pre-built image, build our own from a Dockerfile in the ./app folder. This is YOUR code!
depends_on: - postgres - chromadb - ollama
Start order: Docker starts these services BEFORE our API. Careful: depends_on only controls start order - it doesn't wait until the databases are actually ready to accept connections.
environment: DATABASE_URL=postgresql://...@postgres:5432/ragdb
Connection strings use service names as hostnames! Inside Docker, "postgres" resolves to the PostgreSQL container's IP.
⚡ Quick Start Commands
# Start everything (first run downloads images ~5-10 min)
docker-compose up -d
# Download AI model into Ollama
docker-compose exec ollama ollama pull llama2
# Check if everything is running
docker-compose ps
# View API logs
docker-compose logs -f api
# Stop everything
docker-compose down
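To verify the AI services respond, a quick sketch from Python (uses Ollama's /api/tags endpoint and the ChromaDB client's heartbeat, both on their default ports):

import requests
import chromadb

# Ollama: list the models that have been pulled so far
print(requests.get("http://localhost:11434/api/tags").json())

# ChromaDB: heartbeat returns a timestamp if the server is alive
client = chromadb.HttpClient(host="localhost", port=8000)
print(client.heartbeat())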
03 FastAPI CRUD Operations
🎯 What is CRUD?
Almost every app needs these 4 operations to manage data:
Create
(POST)
Read
(GET)
Update
(PUT)
Delete
(DELETE)
from sqlalchemy import Column, Integer, String, Text, DateTime
from sqlalchemy.orm import declarative_base
from pydantic import BaseModel
from typing import Optional
from datetime import datetime

Base = declarative_base()

# Database Table Definition
class CodeSnippet(Base):
    __tablename__ = "code_snippets"

    id = Column(Integer, primary_key=True, index=True)
    title = Column(String(255), nullable=False)
    description = Column(Text)
    code = Column(Text, nullable=False)
    language = Column(String(50), default="python")
    tags = Column(String(255))
    created_at = Column(DateTime, default=datetime.utcnow)

# API Input Schema (what users send)
class SnippetCreate(BaseModel):
    title: str
    description: Optional[str] = None
    code: str
    language: str = "python"
    tags: Optional[str] = None

# API Output Schema (what API returns)
class SnippetResponse(BaseModel):
    id: int
    title: str
    description: Optional[str]
    code: str
    language: str
    tags: Optional[str]
    created_at: datetime

    class Config:
        from_attributes = True
📝 models.py Explained
🗄️ SQLAlchemy Model (Database Table)
class CodeSnippet(Base):
This Python class = a database table. Each instance = one row.
__tablename__ = "code_snippets"
Actual table name in PostgreSQL. Convention: lowercase_with_underscores
id = Column(Integer, primary_key=True, index=True)
primary_key: Unique ID, auto-increments (1,2,3...) | index: Faster lookups
title = Column(String(255), nullable=False)
String(255): Up to 255 chars | nullable=False: REQUIRED field
created_at = Column(DateTime, default=datetime.utcnow)
default: Auto-fills with current time when row is created!
✅ Pydantic Schemas (API Validation)
class SnippetCreate(BaseModel):
Validates data when creating. User doesn't provide id (auto-generated).
title: str
Required string field. No default = must provide!
description: Optional[str] = None
Optional: Can be string OR None | = None: Default if not provided
class Config: from_attributes = True
Magic! Lets Pydantic read directly from SQLAlchemy objects.
from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session
from database import get_db, engine
from models import Base, SnippetCreate, SnippetResponse, CodeSnippet

Base.metadata.create_all(bind=engine)  # Create tables!

app = FastAPI(title="RAG Snippets API")

# CREATE - Add new snippet
@app.post("/snippets/", response_model=SnippetResponse)
def create_snippet(snippet: SnippetCreate, db: Session = Depends(get_db)):
    db_snippet = CodeSnippet(**snippet.model_dump())
    db.add(db_snippet)
    db.commit()
    db.refresh(db_snippet)
    return db_snippet

# READ - Get all snippets
@app.get("/snippets/", response_model=list[SnippetResponse])
def read_snippets(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    return db.query(CodeSnippet).offset(skip).limit(limit).all()

# READ - Get one snippet
@app.get("/snippets/{snippet_id}", response_model=SnippetResponse)
def read_snippet(snippet_id: int, db: Session = Depends(get_db)):
    snippet = db.query(CodeSnippet).filter(CodeSnippet.id == snippet_id).first()
    if not snippet:
        raise HTTPException(status_code=404, detail="Not found")
    return snippet

# DELETE - Remove snippet
@app.delete("/snippets/{snippet_id}")
def delete_snippet(snippet_id: int, db: Session = Depends(get_db)):
    snippet = db.query(CodeSnippet).filter(CodeSnippet.id == snippet_id).first()
    if not snippet:
        raise HTTPException(status_code=404, detail="Not found")
    db.delete(snippet)
    db.commit()
    return {"message": "Deleted!"}
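One gap: the CRUD table above lists Update (PUT), but the code shows only Create, Read, and Delete. A sketch of the missing endpoint in the same style (an assumption, not from the original):

# UPDATE - Edit an existing snippet
@app.put("/snippets/{snippet_id}", response_model=SnippetResponse)
def update_snippet(snippet_id: int, snippet: SnippetCreate, db: Session = Depends(get_db)):
    db_snippet = db.query(CodeSnippet).filter(CodeSnippet.id == snippet_id).first()
    if not db_snippet:
        raise HTTPException(status_code=404, detail="Not found")
    for field, value in snippet.model_dump().items():
        setattr(db_snippet, field, value)  # overwrite each column with the new value
    db.commit()
    db.refresh(db_snippet)
    return db_snippet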
📝 main.py Explained
Base.metadata.create_all(bind=engine)
Magic line! Looks at all your models and creates actual database tables. Run once on startup.
@app.post("/snippets/", response_model=SnippetResponse)
@app.post: Handles POST requests | response_model: Auto-formats output to match SnippetResponse
snippet: SnippetCreate
FastAPI auto: 1) Reads JSON body 2) Validates against schema 3) Returns 422 if invalid 4) Passes to your function
db: Session = Depends(get_db)
Dependency Injection! Automatically creates fresh DB connection per request, cleans up after.
db.add() → db.commit() → db.refresh()
add: Stage for saving | commit: Actually save | refresh: Get auto-generated values (id, created_at)
raise HTTPException(status_code=404, detail="Not found")
Return error response. Common codes: 400 (bad request), 404 (not found), 500 (server error)
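Note that main.py imports get_db and engine from a database module not shown here. A minimal sketch of what it likely contains - this is the standard FastAPI + SQLAlchemy pattern, and the details are assumptions:

# database.py - assumed contents, not shown in the original
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DATABASE_URL = os.getenv(
    "DATABASE_URL", "postgresql://raguser:ragpass123@localhost:5432/ragdb"
)

engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

def get_db():
    # Yield a fresh session per request; always close it afterwards
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()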
04 ChromaDB Integration
🔮 How Embeddings Work
ChromaDB stores text as embeddings - lists of numbers that represent meaning. Similar meanings = similar numbers!
🎭 Analogy: Organizing books not by title, but by what they're about. "Cooking pasta" is near "Italian recipes" even though words differ - same meaning!
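"Similar meanings = similar numbers" can be made concrete with cosine similarity, the usual way to compare embedding vectors. The 3-dimensional vectors here are toy values for illustration; real embeddings have hundreds of dimensions:

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: "cooking pasta" and "Italian recipes" point in similar directions
cooking_pasta   = [0.9, 0.8, 0.1]
italian_recipes = [0.8, 0.9, 0.2]
docker_compose  = [0.1, 0.2, 0.9]

print(cosine_similarity(cooking_pasta, italian_recipes))  # ~0.99 -> very similar
print(cosine_similarity(cooking_pasta, docker_compose))   # ~0.30 -> not similar

Here's the service class that wraps ChromaDB for our app: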
import chromadb
import os

class ChromaService:
    def __init__(self):
        # Connect to ChromaDB server
        self.client = chromadb.HttpClient(
            host=os.getenv("CHROMA_HOST", "localhost"),
            port=8000
        )
        # Get or create collection
        self.collection = self.client.get_or_create_collection(
            name="code_snippets"
        )

    def add_snippet(self, snippet_id: int, title: str, code: str,
                    description: str = "", tags: str = ""):
        # Combine all text for embedding
        document = f"{title}\n{description}\n{code}"
        self.collection.add(
            documents=[document],                        # Text to embed
            metadatas=[{"title": title, "tags": tags}],  # Extra info
            ids=[f"snippet_{snippet_id}"]                # Unique ID
        )

    def search_snippets(self, query: str, n_results: int = 5):
        # Find similar documents
        results = self.collection.query(
            query_texts=[query],    # Your question
            n_results=n_results     # Top N matches
        )
        return results

    def delete_snippet(self, snippet_id: int):
        self.collection.delete(ids=[f"snippet_{snippet_id}"])
📝 ChromaDB Explained
self.client = chromadb.HttpClient(host=..., port=8000)
HttpClient: Connects to ChromaDB running in Docker. Uses service name from docker-compose!
self.collection = self.client.get_or_create_collection(name="code_snippets")
Collection: Like a database table. get_or_create: Gets existing or creates new. Safe to call multiple times!
self.collection.add(documents=[...], metadatas=[...], ids=[...])
documents: Text to embed | metadatas: Extra info (not embedded) | ids: Unique identifiers for updates/deletes
self.collection.query(query_texts=[query], n_results=5)
Query → Embedding → Find nearest vectors → Return top 5 most similar documents!
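Putting it together, usage looks roughly like this (a sketch; assumes ChromaDB is up and the class above is importable):

svc = ChromaService()

# Index a snippet
svc.add_snippet(
    snippet_id=1,
    title="FastAPI hello world",
    code="from fastapi import FastAPI\napp = FastAPI()",
    tags="fastapi,web",
)

# Search by meaning - matches even when the exact words differ
results = svc.search_snippets("how do I create a web endpoint?", n_results=3)
print(results["documents"][0])  # matching document texts for the first query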
05 LangChain RAG Chain
🔗 The RAG Pipeline
LangChain ties everything together: takes question → retrieves docs → formats prompt → calls AI → returns answer.
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import chromadb, os

class RAGService:
    def __init__(self):
        # 1. Initialize Ollama LLM
        self.llm = Ollama(
            model="llama2",
            base_url=f"http://{os.getenv('OLLAMA_HOST', 'localhost')}:11434"
        )
        # 2. Initialize embeddings
        self.embeddings = OllamaEmbeddings(
            model="llama2",
            base_url=f"http://{os.getenv('OLLAMA_HOST', 'localhost')}:11434"
        )
        # 3. Connect to ChromaDB
        self.vectorstore = Chroma(
            collection_name="code_snippets",
            embedding_function=self.embeddings,
            client=chromadb.HttpClient(
                host=os.getenv("CHROMA_HOST", "localhost"), port=8000
            )
        )
        # 4. Create retriever
        self.retriever = self.vectorstore.as_retriever(search_kwargs={"k": 3})
        # 5. Custom prompt template
        self.prompt = PromptTemplate(
            input_variables=["context", "question"],
            template="""Use these code snippets to answer:

Context: {context}

Question: {question}

Answer: """
        )
        # 6. Build the RAG chain
        self.chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.retriever,
            chain_type_kwargs={"prompt": self.prompt},
            return_source_documents=True
        )

    def query(self, question: str):
        result = self.chain.invoke({"query": question})
        return {
            "answer": result["result"],
            "sources": [doc.metadata for doc in result["source_documents"]]
        }
📝 LangChain Explained
self.llm = Ollama(model="llama2", base_url="...")
LangChain wrapper for Ollama. Specify model and server URL.
self.embeddings = OllamaEmbeddings(model="llama2")
Creates embeddings using Ollama. Same model = consistent results.
self.retriever = self.vectorstore.as_retriever(search_kwargs={"k": 3})
Converts vector store to "retriever". k=3: Return top 3 similar docs.
PromptTemplate(input_variables=["context", "question"], template="...")
{context}: Replaced with retrieved docs | {question}: User's question
RetrievalQA.from_chain_type(chain_type="stuff", return_source_documents=True)
stuff: Put all docs in prompt | return_source_documents: Include which docs were used (for citations!)
self.chain.invoke({"query": question})
Runs entire chain: Question → Retrieve → Format → Generate → Return answer + sources!
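To expose the chain through the API, a small endpoint is enough. This is a sketch, not from the original; app is the FastAPI instance from section 03:

from pydantic import BaseModel

class QueryRequest(BaseModel):
    question: str

rag = RAGService()  # built once at startup; reused across requests

@app.post("/query")
def query_knowledge_base(request: QueryRequest):
    # Runs retrieve -> augment -> generate and returns the answer plus sources
    return rag.query(request.question)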
06 JSON Playground
🎮 Test API Requests
See what JSON the API expects and returns! The example below walks through a typical request/response pair.
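Creating a snippet from Python (a sketch; assumes the API from section 03 is running on localhost:8080):

import requests

# Request: POST /snippets/ with the SnippetCreate JSON body
payload = {
    "title": "Hello FastAPI",
    "description": "Smallest possible app",
    "code": "from fastapi import FastAPI\napp = FastAPI()",
    "language": "python",
    "tags": "fastapi,demo",
}
r = requests.post("http://localhost:8080/snippets/", json=payload)

# Response 200 OK: the SnippetResponse JSON, now with id and created_at filled in
print(r.status_code)
print(r.json())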
07 Upload Training Snippets
Interactive panels: 📤 Add Snippet to ChromaDB · 📚 Stored Snippets · 🔍 Query Knowledge Base
📖 Quick Glossary
API
How programs talk to each other. Our FastAPI server is an API.
Embedding
Numbers representing text meaning. Similar = similar numbers.
Vector Database
DB for storing/searching embeddings. ChromaDB is one.
LLM
Large Language Model - AI that understands/generates text.
Container
Packaged app with everything needed to run. Docker creates these.
ORM
Object-Relational Mapping - use Python objects instead of SQL.