System Overview

Beyond Retrieval v2 is a production-grade Retrieval-Augmented Generation (RAG) platform designed for enterprise document intelligence. It provides end-to-end document ingestion, vector-based retrieval, AI-powered chat, and continuous quality evaluation — all within a multi-tenant notebook paradigm.


Architecture Diagram

graph TB
    subgraph Client
        Browser["Web Browser"]
    end

    subgraph "Reverse Proxy"
        Caddy["Caddy<br/>Auto HTTPS / Static Files / API Proxy"]
    end

    subgraph "Application Layer"
        Frontend["React SPA<br/>Vite 7 + Tailwind CSS 4"]
        Backend["FastAPI Backend<br/>Python 3.12"]
    end

    subgraph "Database Layer"
        Supabase["Supabase PostgreSQL 15"]
        pgvector["pgvector<br/>HNSW Index"]
        Storage["Object Storage<br/>Supabase / S3 / Local"]
    end

    subgraph "AI Services"
        OpenRouter["OpenRouter Gateway<br/>21+ embedding models<br/>23+ LLM models"]
        OpenAIDirect["OpenAI Direct<br/>text-embedding-3-small<br/>GPT-4o"]
        Ollama["Ollama Local<br/>nomic-embed-text<br/>qwen2.5:1.5b"]
        Docling["Docling Parser<br/>PDF + DOCX extraction"]
        MistralOCR["Mistral OCR<br/>PDF + DOCX OCR"]
    end

    subgraph Auth
        Clerk["Clerk<br/>RS256 JWT + JWKS"]
    end

    Browser -->|"HTTPS"| Caddy
    Caddy -->|"Static files /srv"| Frontend
    Caddy -->|"Reverse proxy /api/*"| Backend
    Backend -->|"SQL + RPC"| Supabase
    Supabase --- pgvector
    Backend -->|"File upload/download"| Storage
    Backend -->|"LLM inference + embeddings"| OpenRouter
    Backend -->|"LLM inference + embeddings"| OpenAIDirect
    Backend -->|"LLM inference + embeddings"| Ollama
    Backend -->|"Document parsing"| Docling
    Backend -->|"OCR extraction"| MistralOCR
    Backend -->|"JWT verification"| Clerk

Core Stack

Layer              Technology
-----------------  --------------------------------------------------------
Frontend           React 19, Vite 7, Tailwind CSS 4
Backend            FastAPI (Python 3.12), Pydantic AI
Database           PostgreSQL 15 (Supabase) with pgvector, pg_trgm, pg_net
Object Storage     Supabase Storage / S3-compatible / Local filesystem
Reverse Proxy      Caddy (automatic HTTPS via Let's Encrypt)
Authentication     Clerk (RS256 JWT with JWKS) + bypass mode for dev
AI Providers       OpenRouter, OpenAI Direct, Ollama (local)
Document Parsing   Docling, Mistral OCR
Containerization   Docker Compose (16 services across base + profiles)

Key Capabilities

  • Multi-notebook workspace — Each notebook has its own documents, settings, conversations, and vector store partition
  • Hybrid search — Full-text search (BM25) + semantic vector search + Reciprocal Rank Fusion
  • Cache-first RAG — Exact-match (MD5 of the query) and semantic (cosine similarity ≥ 0.97) cache lookups before retrieval
  • Background LLM Judge — Automated response quality evaluation after every answer
  • AI Enhancement — Contextual chunk enrichment with parallel processing
  • Multi-provider LLM — OpenRouter, OpenAI, Ollama with dynamic API key resolution
  • OneDrive integration — OAuth2 file import from SharePoint
  • Role-based access — Admin / chat_only with notebook-level sharing via invite links

Request Flow

HTTP Request
  → Caddy (TLS termination, security headers, compression)
    → FastAPI (CORS middleware, global error handler)
      → Auth Middleware (Clerk JWT verification / bypass mode)
        → Router (endpoint matching, request validation)
          → Dependencies (Supabase client resolution, access control check)
            → Service (business logic, external API calls)
              → Database (Supabase PostgreSQL via supabase-py)

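The layered flow above can be sketched in plain Python, with a dict standing in for an HTTP request and each layer either enriching or rejecting it. All names here are illustrative assumptions, not the project's real API.

```python
# Sketch of proxy → auth middleware → router/dependencies → service → database.

def auth_middleware(request: dict) -> dict:
    """Stand-in for Clerk JWT verification (RS256 signature checked via JWKS)."""
    header = request.get("authorization", "")
    if not header.startswith("Bearer "):
        raise PermissionError("401: missing or malformed bearer token")
    # A real check would validate signature, issuer, and expiry, then pull
    # the user id from the verified claims (or skip all of this in bypass mode).
    return {**request, "user_id": "user_123"}

def notebook_service(request: dict) -> dict:
    """Business logic; the real service would query Supabase via supabase-py."""
    return {"owner": request["user_id"], "notebooks": []}

def handle(request: dict) -> dict:
    # Router matching and dependency resolution collapsed into one call chain
    return notebook_service(auth_middleware(request))

resp = handle({"path": "/api/notebooks", "authorization": "Bearer <token>"})
```

In the actual FastAPI stack, `auth_middleware` corresponds to a middleware or dependency, and access-control checks run in the dependency layer before the service is invoked.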
Learn More