📚 Table of Contents
01 Core Concepts - Ollama, ChromaDB, LangChain, RAG explained simply
02 Docker Setup - Complete docker-compose with explanations
03 FastAPI CRUD - Models, schemas, and endpoints explained
04 ChromaDB - Vector storage and similarity search
05 LangChain RAG - Building the AI pipeline
06 Playground - Test JSON inputs/outputs interactively
07 Upload Training Snippets - Add and query your own knowledge base
01 Core Concepts
📖 Before We Start: The Big Picture
Imagine building a smart assistant that answers questions about YOUR documents, code, or data. That's what a RAG (Retrieval-Augmented Generation) system does!
🎭 Think of it like a librarian: When you ask a question, the librarian doesn't guess - they find relevant books, read the sections, then answer based on actual information!
Ollama
Local LLM Runner
🤔 Why? Run AI models on YOUR computer - free, private, no API costs!
🔑 Key Points:
- ▸ Local models: Llama 2, Mistral, CodeLlama run on your machine
- ▸ Free: No API keys, no usage limits, no data leaving your computer
- ▸ Simple: One command: ollama run llama2 (see the example below)
- ▸ Compatible: Offers an OpenAI-compatible API - easy to switch
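Once ollama run llama2 has pulled the model, you can also call Ollama's REST API directly. A minimal sketch, assuming Ollama is listening on its default port 11434:

import requests

# Ask the local llama2 model a question via Ollama's /api/generate endpoint
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
)
print(response.json()["response"])  # the generated answer text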
ChromaDB
Vector Database
🤔 Why? Find similar content by meaning, not just matching words!
🔑 Key Points:
- ▸ Embeddings: Converts text → numbers that capture meaning
- ▸ Similarity search: "API endpoint" finds "REST route" too!
- ▸ Metadata: Store tags, dates, sources with each document
- ▸ Fast: Quickly find relevant docs from thousands
LangChain
AI Framework
🤔 Why? Connect all the pieces (search → format → AI) without writing everything from scratch!
🔑 Key Points:
- ▸ Chains: Link steps together (search → prompt → generate) - tiny example after this list
- ▸ Agents: Let AI decide which tools to use
- ▸ Retrievers: Built-in ChromaDB support
- ▸ Integrations: Works with Ollama, OpenAI, 100+ tools
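To make "chains" concrete, here's a minimal sketch. It assumes a local Ollama server with llama2 already pulled; the | pipe is LangChain's expression language for linking steps:

from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate

llm = Ollama(model="llama2")  # assumes a local Ollama server with llama2 pulled
prompt = PromptTemplate.from_template("Explain {topic} in one sentence.")

chain = prompt | llm  # link steps: fill in the prompt, then send it to the model
print(chain.invoke({"topic": "vector embeddings"}))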
RAG
Design Pattern
🤔 Why? AI can "hallucinate" (make stuff up). RAG gives it REAL facts first!
🔑 The RAG Process:
- R = Retrieve: Search ChromaDB for related docs
- A = Augment: Add the docs to the AI prompt as context
- G = Generate: The AI answers using the real data!
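In code, the whole pattern is just three steps. This is a conceptual sketch only: search_docs and ask_llm are hypothetical stand-ins for the ChromaDB and Ollama calls built in later sections.

def search_docs(question: str, top_k: int = 3) -> list[str]:
    # Hypothetical stand-in for a ChromaDB similarity search
    return ["def hello(): ...", "app = FastAPI()", "SELECT * FROM users;"][:top_k]

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for an Ollama call
    return f"(model answer based on: {prompt[:40]}...)"

def rag_answer(question: str) -> str:
    docs = search_docs(question)  # R = Retrieve: find relevant documents
    context = "\n".join(docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"  # A = Augment
    return ask_llm(prompt)  # G = Generate: answer grounded in real data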
02 Docker Setup
🐳 Why Docker? What is Docker Compose?
Our app needs 4 services: PostgreSQL, Ollama, ChromaDB, and FastAPI. Installing each manually = nightmare! Docker packages everything into containers.
🐳 Docker
Packages apps into "containers" - self-contained boxes with everything needed to run.
📦 Docker Compose
Define multiple containers in one file. One command starts everything!
version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: raguser
      POSTGRES_PASSWORD: ragpass123
      POSTGRES_DB: ragdb
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE

  api:
    build: ./app
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - chromadb
      - ollama
    environment:
      - DATABASE_URL=postgresql://raguser:ragpass123@postgres:5432/ragdb
      - CHROMA_HOST=chromadb
      - OLLAMA_HOST=ollama

volumes:
  postgres_data:
  ollama_data:
  chroma_data:
📝 Line-by-Line Explanation
version: '3.8'
What: Docker Compose file format version. Newer Compose releases ignore this field entirely, but 3.8 is a safe, widely used value.
🐘 PostgreSQL Service
image: postgres:15-alpine
image: Pre-built container to use | postgres: Official PostgreSQL | 15: Version 15 | alpine: Lightweight Linux (smaller download)
environment: POSTGRES_USER, PASSWORD, DB
Environment variables that configure the database. PostgreSQL reads these on startup to create the user and database automatically!
ports: - "5432:5432"
Format: "YOUR_COMPUTER:CONTAINER" - Maps port 5432 on your machine to 5432 inside Docker. Now you can connect from outside!
volumes: - postgres_data:/var/lib/postgresql/data
Critical! Volumes persist data when containers stop. Without this, your database would be EMPTY on every restart!
🦙 Ollama Service
image: ollama/ollama:latest
Official Ollama image. :latest = always get newest version.
ports: - "11434:11434"
Ollama's API port. Our FastAPI app will call this to generate AI responses.
volumes: - ollama_data:/root/.ollama
Stores downloaded AI models (~4GB each). Without this, you'd re-download every restart!
🔮 ChromaDB Service
environment: - IS_PERSISTENT=TRUE
Super important! Without this, ChromaDB runs in memory only = all embeddings lost on restart!
⚡ FastAPI Service
build: ./app
Instead of using a pre-built image, build our own from a Dockerfile in the ./app folder. This is YOUR code!
depends_on: - postgres - chromadb - ollama
Start order: Docker starts these services BEFORE our API. Careful: depends_on only controls start order - it doesn't wait until the databases are actually ready to accept connections.
environment: DATABASE_URL=postgresql://...@postgres:5432/ragdb
Connection strings use service names as hostnames! Inside Docker, "postgres" resolves to the PostgreSQL container's IP.
⚡ Quick Start Commands
# Start everything (first run downloads images ~5-10 min)
docker-compose up -d
# Download AI model into Ollama
docker-compose exec ollama ollama pull llama2
# Check if everything is running
docker-compose ps
# View API logs
docker-compose logs -f api
# Stop everything
docker-compose down
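To verify the AI services respond, a quick sketch from Python (uses Ollama's /api/tags endpoint and the ChromaDB client's heartbeat, both on their default ports):

import requests
import chromadb

# Ollama: list the models that have been pulled so far
print(requests.get("http://localhost:11434/api/tags").json())

# ChromaDB: heartbeat returns a timestamp if the server is alive
client = chromadb.HttpClient(host="localhost", port=8000)
print(client.heartbeat())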
03 FastAPI CRUD Operations
🎯 What is CRUD?
Almost every app needs these 4 operations to manage data:
Create
(POST)
Read
(GET)
Update
(PUT)
Delete
(DELETE)
from sqlalchemy import Column, Integer, String, Text, DateTime
from sqlalchemy.orm import declarative_base
from pydantic import BaseModel
from typing import Optional
from datetime import datetime

Base = declarative_base()

# Database Table Definition
class CodeSnippet(Base):
    __tablename__ = "code_snippets"

    id = Column(Integer, primary_key=True, index=True)
    title = Column(String(255), nullable=False)
    description = Column(Text)
    code = Column(Text, nullable=False)
    language = Column(String(50), default="python")
    tags = Column(String(255))
    created_at = Column(DateTime, default=datetime.utcnow)

# API Input Schema (what users send)
class SnippetCreate(BaseModel):
    title: str
    description: Optional[str] = None
    code: str
    language: str = "python"
    tags: Optional[str] = None

# API Output Schema (what API returns)
class SnippetResponse(BaseModel):
    id: int
    title: str
    description: Optional[str]
    code: str
    language: str
    tags: Optional[str]
    created_at: datetime

    class Config:
        from_attributes = True
📝 models.py Explained
🗄️ SQLAlchemy Model (Database Table)
class CodeSnippet(Base):
This Python class = a database table. Each instance = one row.
__tablename__ = "code_snippets"
Actual table name in PostgreSQL. Convention: lowercase_with_underscores
id = Column(Integer, primary_key=True, index=True)
primary_key: Unique ID, auto-increments (1,2,3...) | index: Faster lookups
title = Column(String(255), nullable=False)
String(255): Up to 255 chars | nullable=False: REQUIRED field
created_at = Column(DateTime, default=datetime.utcnow)
default: Auto-fills with current time when row is created!
✅ Pydantic Schemas (API Validation)
class SnippetCreate(BaseModel):
Validates data when creating. User doesn't provide id (auto-generated).
title: str
Required string field. No default = must provide!
description: Optional[str] = None
Optional: Can be string OR None | = None: Default if not provided
class Config: from_attributes = True
Magic! Lets Pydantic read directly from SQLAlchemy objects.
from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy.orm import Session
from database import get_db, engine
from models import Base, SnippetCreate, SnippetResponse, CodeSnippet

Base.metadata.create_all(bind=engine)  # Create tables!

app = FastAPI(title="RAG Snippets API")

# CREATE - Add new snippet
@app.post("/snippets/", response_model=SnippetResponse)
def create_snippet(snippet: SnippetCreate, db: Session = Depends(get_db)):
    db_snippet = CodeSnippet(**snippet.model_dump())
    db.add(db_snippet)
    db.commit()
    db.refresh(db_snippet)
    return db_snippet

# READ - Get all snippets
@app.get("/snippets/", response_model=list[SnippetResponse])
def read_snippets(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    return db.query(CodeSnippet).offset(skip).limit(limit).all()

# READ - Get one snippet
@app.get("/snippets/{snippet_id}", response_model=SnippetResponse)
def read_snippet(snippet_id: int, db: Session = Depends(get_db)):
    snippet = db.query(CodeSnippet).filter(CodeSnippet.id == snippet_id).first()
    if not snippet:
        raise HTTPException(status_code=404, detail="Not found")
    return snippet

# DELETE - Remove snippet
@app.delete("/snippets/{snippet_id}")
def delete_snippet(snippet_id: int, db: Session = Depends(get_db)):
    snippet = db.query(CodeSnippet).filter(CodeSnippet.id == snippet_id).first()
    if not snippet:
        raise HTTPException(status_code=404, detail="Not found")
    db.delete(snippet)
    db.commit()
    return {"message": "Deleted!"}
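One gap: the CRUD table above lists Update (PUT), but the code shows only Create, Read, and Delete. A sketch of the missing endpoint in the same style (an assumption, not from the original):

# UPDATE - Edit an existing snippet
@app.put("/snippets/{snippet_id}", response_model=SnippetResponse)
def update_snippet(snippet_id: int, snippet: SnippetCreate, db: Session = Depends(get_db)):
    db_snippet = db.query(CodeSnippet).filter(CodeSnippet.id == snippet_id).first()
    if not db_snippet:
        raise HTTPException(status_code=404, detail="Not found")
    for field, value in snippet.model_dump().items():
        setattr(db_snippet, field, value)  # overwrite each column with the new value
    db.commit()
    db.refresh(db_snippet)
    return db_snippet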
📝 main.py Explained
Base.metadata.create_all(bind=engine)
Magic line! Looks at all your models and creates actual database tables. Run once on startup.
@app.post("/snippets/", response_model=SnippetResponse)
@app.post: Handles POST requests | response_model: Auto-formats output to match SnippetResponse
snippet: SnippetCreate
FastAPI auto: 1) Reads JSON body 2) Validates against schema 3) Returns 422 if invalid 4) Passes to your function
db: Session = Depends(get_db)
Dependency Injection! Automatically creates fresh DB connection per request, cleans up after.
db.add() → db.commit() → db.refresh()
add: Stage for saving | commit: Actually save | refresh: Get auto-generated values (id, created_at)
raise HTTPException(status_code=404, detail="Not found")
Return error response. Common codes: 400 (bad request), 404 (not found), 500 (server error)
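Note that main.py imports get_db and engine from a database module not shown here. A minimal sketch of what it likely contains - this is the standard FastAPI + SQLAlchemy pattern, and the details are assumptions:

# database.py - assumed contents, not shown in the original
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

DATABASE_URL = os.getenv(
    "DATABASE_URL", "postgresql://raguser:ragpass123@localhost:5432/ragdb"
)

engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

def get_db():
    # Yield a fresh session per request; always close it afterwards
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()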
04 ChromaDB Integration
🔮 How Embeddings Work
ChromaDB stores text as embeddings - lists of numbers that represent meaning. Similar meanings = similar numbers!
🎭 Analogy: Organizing books not by title, but by what they're about. "Cooking pasta" is near "Italian recipes" even though words differ - same meaning!
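"Similar meanings = similar numbers" can be made concrete with cosine similarity, the usual way to compare embedding vectors. The 3-dimensional vectors here are toy values for illustration; real embeddings have hundreds of dimensions:

import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: "cooking pasta" and "Italian recipes" point in similar directions
cooking_pasta   = [0.9, 0.8, 0.1]
italian_recipes = [0.8, 0.9, 0.2]
docker_compose  = [0.1, 0.2, 0.9]

print(cosine_similarity(cooking_pasta, italian_recipes))  # ~0.99 -> very similar
print(cosine_similarity(cooking_pasta, docker_compose))   # ~0.30 -> not similar

Here's the service class that wraps ChromaDB for our app: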
import chromadb
import os

class ChromaService:
    def __init__(self):
        # Connect to ChromaDB server
        self.client = chromadb.HttpClient(
            host=os.getenv("CHROMA_HOST", "localhost"),
            port=8000
        )
        # Get or create collection
        self.collection = self.client.get_or_create_collection(
            name="code_snippets"
        )

    def add_snippet(self, snippet_id: int, title: str, code: str,
                    description: str = "", tags: str = ""):
        # Combine all text for embedding
        document = f"{title}\n{description}\n{code}"
        self.collection.add(
            documents=[document],                        # Text to embed
            metadatas=[{"title": title, "tags": tags}],  # Extra info
            ids=[f"snippet_{snippet_id}"]                # Unique ID
        )

    def search_snippets(self, query: str, n_results: int = 5):
        # Find similar documents
        results = self.collection.query(
            query_texts=[query],    # Your question
            n_results=n_results     # Top N matches
        )
        return results

    def delete_snippet(self, snippet_id: int):
        self.collection.delete(ids=[f"snippet_{snippet_id}"])
📝 ChromaDB Explained
self.client = chromadb.HttpClient(host=..., port=8000)
HttpClient: Connects to ChromaDB running in Docker. Uses service name from docker-compose!
self.collection = self.client.get_or_create_collection(name="code_snippets")
Collection: Like a database table. get_or_create: Gets existing or creates new. Safe to call multiple times!
self.collection.add(documents=[...], metadatas=[...], ids=[...])
documents: Text to embed | metadatas: Extra info (not embedded) | ids: Unique identifiers for updates/deletes
self.collection.query(query_texts=[query], n_results=5)
Query → Embedding → Find nearest vectors → Return top 5 most similar documents!
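Putting it together, usage looks roughly like this (a sketch; assumes ChromaDB is up and the class above is importable):

svc = ChromaService()

# Index a snippet
svc.add_snippet(
    snippet_id=1,
    title="FastAPI hello world",
    code="from fastapi import FastAPI\napp = FastAPI()",
    tags="fastapi,web",
)

# Search by meaning - matches even when the exact words differ
results = svc.search_snippets("how do I create a web endpoint?", n_results=3)
print(results["documents"][0])  # matching document texts for the first query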
05 LangChain RAG Chain
🔗 The RAG Pipeline
LangChain ties everything together: takes question → retrieves docs → formats prompt → calls AI → returns answer.
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import chromadb, os

class RAGService:
    def __init__(self):
        # 1. Initialize Ollama LLM
        self.llm = Ollama(
            model="llama2",
            base_url=f"http://{os.getenv('OLLAMA_HOST', 'localhost')}:11434"
        )
        # 2. Initialize embeddings
        self.embeddings = OllamaEmbeddings(
            model="llama2",
            base_url=f"http://{os.getenv('OLLAMA_HOST', 'localhost')}:11434"
        )
        # 3. Connect to ChromaDB
        self.vectorstore = Chroma(
            collection_name="code_snippets",
            embedding_function=self.embeddings,
            client=chromadb.HttpClient(
                host=os.getenv("CHROMA_HOST", "localhost"), port=8000
            )
        )
        # 4. Create retriever
        self.retriever = self.vectorstore.as_retriever(search_kwargs={"k": 3})
        # 5. Custom prompt template
        self.prompt = PromptTemplate(
            input_variables=["context", "question"],
            template="""Use these code snippets to answer:

Context: {context}

Question: {question}

Answer: """
        )
        # 6. Build the RAG chain
        self.chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.retriever,
            chain_type_kwargs={"prompt": self.prompt},
            return_source_documents=True
        )

    def query(self, question: str):
        result = self.chain.invoke({"query": question})
        return {
            "answer": result["result"],
            "sources": [doc.metadata for doc in result["source_documents"]]
        }
📝 LangChain Explained
self.llm = Ollama(model="llama2", base_url="...")
LangChain wrapper for Ollama. Specify model and server URL.
self.embeddings = OllamaEmbeddings(model="llama2")
Creates embeddings using Ollama. Same model = consistent results.
self.retriever = self.vectorstore.as_retriever(search_kwargs={"k": 3})
Converts vector store to "retriever". k=3: Return top 3 similar docs.
PromptTemplate(input_variables=["context", "question"], template="...")
{context}: Replaced with retrieved docs | {question}: User's question
RetrievalQA.from_chain_type(chain_type="stuff", return_source_documents=True)
stuff: Put all docs in prompt | return_source_documents: Include which docs were used (for citations!)
self.chain.invoke({"query": question})
Runs entire chain: Question → Retrieve → Format → Generate → Return answer + sources!
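To expose the chain through the API, a small endpoint is enough. This is a sketch, not from the original; app is the FastAPI instance from section 03:

from pydantic import BaseModel

class QueryRequest(BaseModel):
    question: str

rag = RAGService()  # built once at startup; reused across requests

@app.post("/query")
def query_knowledge_base(request: QueryRequest):
    # Runs retrieve -> augment -> generate and returns the answer plus sources
    return rag.query(request.question)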
06 JSON Playground
🎮 Test API Requests
See what JSON the API expects and returns! The example below walks through a typical request/response pair.
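Creating a snippet from Python (a sketch; assumes the API from section 03 is running on localhost:8080):

import requests

# Request: POST /snippets/ with the SnippetCreate JSON body
payload = {
    "title": "Hello FastAPI",
    "description": "Smallest possible app",
    "code": "from fastapi import FastAPI\napp = FastAPI()",
    "language": "python",
    "tags": "fastapi,demo",
}
r = requests.post("http://localhost:8080/snippets/", json=payload)

# Response 200 OK: the SnippetResponse JSON, now with id and created_at filled in
print(r.status_code)
print(r.json())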
07 Upload Training Snippets
Interactive panels: 📤 Add Snippet to ChromaDB · 📚 Stored Snippets · 🔍 Query Knowledge Base
📖 Quick Glossary
API
How programs talk to each other. Our FastAPI server is an API.
Embedding
Numbers representing text meaning. Similar = similar numbers.
Vector Database
DB for storing/searching embeddings. ChromaDB is one.
LLM
Large Language Model - AI that understands/generates text.
Container
Packaged app with everything needed to run. Docker creates these.
ORM
Object-Relational Mapping - use Python objects instead of SQL.