System Overview¶
Beyond Retrieval v2 is a production-grade Retrieval-Augmented Generation (RAG) platform designed for enterprise document intelligence. It provides end-to-end document ingestion, vector-based retrieval, AI-powered chat, and continuous quality evaluation — all within a multi-tenant notebook paradigm.
Architecture Diagram¶
```mermaid
graph TB
    subgraph Client
        Browser["Web Browser"]
    end
    subgraph Reverse Proxy
        Caddy["Caddy<br/>Auto HTTPS / Static Files / API Proxy"]
    end
    subgraph Application Layer
        Frontend["React SPA<br/>Vite 7 + Tailwind CSS 4"]
        Backend["FastAPI Backend<br/>Python 3.12"]
    end
    subgraph Database Layer
        Supabase["Supabase PostgreSQL 15"]
        pgvector["pgvector<br/>HNSW Index"]
        Storage["Object Storage<br/>Supabase / S3 / Local"]
    end
    subgraph AI Services
        OpenRouter["OpenRouter Gateway<br/>21+ embedding models<br/>23+ LLM models"]
        OpenAIDirect["OpenAI Direct<br/>text-embedding-3-small<br/>GPT-4o"]
        Ollama["Ollama Local<br/>nomic-embed-text<br/>qwen2.5:1.5b"]
        Docling["Docling Parser<br/>PDF + DOCX extraction"]
        MistralOCR["Mistral OCR<br/>PDF + DOCX OCR"]
    end
    subgraph Auth
        Clerk["Clerk<br/>RS256 JWT + JWKS"]
    end
    Browser -->|"HTTPS"| Caddy
    Caddy -->|"Static files /srv"| Frontend
    Caddy -->|"Reverse proxy /api/*"| Backend
    Backend -->|"SQL + RPC"| Supabase
    Supabase --- pgvector
    Backend -->|"File upload/download"| Storage
    Backend -->|"LLM inference + embeddings"| OpenRouter
    Backend -->|"LLM inference + embeddings"| OpenAIDirect
    Backend -->|"LLM inference + embeddings"| Ollama
    Backend -->|"Document parsing"| Docling
    Backend -->|"OCR extraction"| MistralOCR
    Backend -->|"JWT verification"| Clerk
```

Core Stack¶
| Layer | Technology |
|---|---|
| Frontend | React 19, Vite 7, Tailwind CSS 4 |
| Backend | FastAPI (Python 3.12), Pydantic AI |
| Database | PostgreSQL 15 (Supabase) with pgvector, pg_trgm, pg_net |
| Object Storage | Supabase Storage / S3-compatible / Local filesystem |
| Reverse Proxy | Caddy (automatic HTTPS via Let's Encrypt) |
| Authentication | Clerk (RS256 JWT with JWKS) + bypass mode for dev |
| AI Providers | OpenRouter, OpenAI Direct, Ollama (local) |
| Document Parsing | Docling, Mistral OCR |
| Containerization | Docker Compose (16 services across base + profiles) |
Key Capabilities¶
- Multi-notebook workspace — Each notebook has its own documents, settings, conversations, and vector store partition
- Hybrid search — Full-text search (BM25) and semantic vector search, fused via Reciprocal Rank Fusion
- Cache-first RAG — Exact-match (MD5 hash) and semantic (cosine similarity ≥ 0.97) cache lookups before retrieval
- Background LLM Judge — Automated response quality evaluation after every answer
- AI Enhancement — Contextual chunk enrichment with parallel processing
- Multi-provider LLM — OpenRouter, OpenAI, Ollama with dynamic API key resolution
- OneDrive integration — OAuth2 file import from SharePoint
- Role-based access — Admin / chat_only with notebook-level sharing via invite links
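The Reciprocal Rank Fusion step behind hybrid search can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual implementation; the `k = 60` constant is the conventional RRF default, assumed here.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with Reciprocal Rank Fusion.

    Each document scores the sum of 1 / (k + rank) over every list
    it appears in, so documents ranked well by both BM25 and vector
    search rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings for the same query from both retrievers:
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_c", "doc_a"]
print(rrf_fuse([bm25_hits, vector_hits]))  # → ['doc_b', 'doc_a', 'doc_c']
```

RRF needs only ranks, not scores, which is why it is a common choice for fusing BM25 and cosine-similarity results whose raw scores live on incomparable scales.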
Request Flow¶
```
HTTP Request
→ Caddy (TLS termination, security headers, compression)
→ FastAPI (CORS middleware, global error handler)
→ Auth Middleware (Clerk JWT verification / bypass mode)
→ Router (endpoint matching, request validation)
→ Dependencies (Supabase client resolution, access control check)
→ Service (business logic, external API calls)
→ Database (Supabase PostgreSQL via supabase-py)
```
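At the Service step, the cache-first lookup described under Key Capabilities (an exact MD5 match on the query text, then a semantic probe against cached query embeddings) could look roughly like this sketch. The function and field names are illustrative, not the platform's API.

```python
import hashlib
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def cache_lookup(query: str, query_emb: list[float],
                 cache: dict, threshold: float = 0.97):
    """Return a cached answer, or None to fall through to full retrieval."""
    key = hashlib.md5(query.encode("utf-8")).hexdigest()
    if key in cache:                          # exact hit: identical query text
        return cache[key]["answer"]
    best = max(cache.values(), default=None,
               key=lambda e: cosine(query_emb, e["embedding"]))
    if best and cosine(query_emb, best["embedding"]) >= threshold:
        return best["answer"]                 # semantic hit: near-duplicate query
    return None                               # miss: run the RAG pipeline
```

On a miss the service would proceed to hybrid retrieval and store the fresh answer (keyed by MD5 hash, alongside the query embedding) for future lookups.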
Learn More¶
- Backend Architecture — Services, routers, dependency injection
- Frontend Architecture — React pages, components, API client
- Database Schema — Tables, functions, indexes
- Infrastructure — Docker Compose, Caddy, deployment modes
- Security — Auth flow, authorization model, transport security