Features Overview

Beyond Retrieval v2 is a production-grade Retrieval-Augmented Generation (RAG) platform for enterprise document intelligence. It combines multi-format document ingestion, 10 retrieval strategies, AI-powered chat with citations, and continuous quality evaluation -- all within isolated, shareable notebook workspaces.

Unlike point solutions that handle one piece of the RAG pipeline, Beyond Retrieval v2 delivers the entire stack: parsing, chunking, embedding, retrieval, generation, caching, quality scoring, and collaboration -- ready to deploy on-premises or in the cloud.


Architecture

```mermaid
graph LR
    subgraph Clients
        Browser["Browser / API Client"]
    end

    subgraph "FastAPI Backend"
        API["REST API<br/>78+ endpoints"]
        Ingestion["Ingestion Pipeline"]
        RAG["RAG Engine"]
        Judge["LLM Judge"]
        Enhance["AI Enhancer"]
    end

    subgraph "Database Layer"
        Supabase["Supabase PostgreSQL 15<br/>pgvector + pg_trgm"]
        VectorStore["HNSW Vector Index"]
    end

    subgraph "Storage"
        SupaStorage["Supabase Storage"]
        S3["S3-Compatible"]
        LocalFS["Local Filesystem"]
    end

    subgraph "LLM Providers"
        OpenRouter["OpenRouter<br/>21+ embedding models<br/>23+ LLM models"]
        OpenAI["OpenAI Direct"]
        Ollama["Ollama (Local)"]
    end

    subgraph "Document Parsers"
        Docling["Docling Parser"]
        Mistral["Mistral OCR"]
    end

    Browser -->|"HTTPS / JSON"| API
    API --> Ingestion
    API --> RAG
    API --> Judge
    API --> Enhance
    Ingestion --> Docling
    Ingestion --> Mistral
    RAG --> OpenRouter
    RAG --> OpenAI
    RAG --> Ollama
    Ingestion --> SupaStorage
    Ingestion --> S3
    Ingestion --> LocalFS
    RAG --> Supabase
    Supabase --- VectorStore
    Ingestion --> Supabase
```

Feature Walkthrough

Notebook System

Notebooks are isolated workspaces that partition documents, conversations, settings, and vector embeddings. Each notebook operates independently, allowing teams to maintain separate knowledge bases for different projects, departments, or use cases.

| Capability | Detail |
|---|---|
| Isolated vector stores | Each notebook's embeddings are partitioned by notebook_id in metadata |
| Immutable embedding model | Chosen at creation, cannot be changed (prevents vector dimension mismatches) |
| Per-notebook LLM settings | Provider, model, temperature, retrieval strategy, language mode |
| Database type | Cloud (hosted Supabase) or Local (Docker Supabase) per notebook |
| Storage provider | Supabase, S3, local filesystem, or none -- per notebook |
| Status dashboard | Document counts, chunk stats, conversation metrics, enhancement progress |
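
The isolation rule in the first row can be sketched as a similarity query scoped to one notebook. The table and column names (`embeddings`, `metadata`) below are illustrative, not the actual schema:

```python
# Sketch: per-notebook isolation by filtering on notebook_id in chunk metadata.
# Schema names are hypothetical; only the filtering pattern is the point.
def build_similarity_query(notebook_id: str, top_k: int = 8) -> tuple[str, dict]:
    """Build a pgvector similarity query restricted to one notebook's chunks."""
    sql = """
        SELECT chunk_id, content, 1 - (embedding <=> %(query_vec)s) AS score
        FROM embeddings
        WHERE metadata->>'notebook_id' = %(notebook_id)s
        ORDER BY embedding <=> %(query_vec)s
        LIMIT %(top_k)s
    """
    return sql, {"notebook_id": notebook_id, "top_k": top_k}
```

Because every retrieval carries this filter, notebooks never see each other's vectors even though they share one physical index.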

Why Immutable Embedding Models?

Changing the embedding model after documents are ingested would create mixed-dimension vectors in the same index, breaking similarity search. Beyond Retrieval locks this choice at creation time to guarantee index consistency.


Document Management

Upload, parse, chunk, and embed documents through a multi-stage background pipeline. The system handles 8 file formats with configurable parsing and chunking strategies.

Supported file types: PDF, DOCX, DOC, MD, TXT, CSV, XLSX, XLS

| Stage | What Happens |
|---|---|
| Upload | Files stored in the configured storage provider |
| Parsing | Text extracted via Docling Parser or Mistral OCR |
| Cleaning | Unicode normalization, whitespace consolidation, dedup check |
| Chunking | Recursive, Docling Hybrid, or Agentic splitting |
| Image Description (Optional) | Base64 images replaced with AI text descriptions |
| Context Augmentation (Optional) | Chunks enriched with LLM-generated context |
| Embedding | Vectors generated via OpenRouter, OpenAI, or Ollama |
| Storage | Chunks + vectors + metadata inserted into PostgreSQL |
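
The stage sequence above, with the two optional stages toggled per notebook, can be sketched as follows; the `Stage` enum and `plan_stages` function are illustrative, not the platform's actual names:

```python
# Sketch of the ingestion stage sequence from the table above.
from enum import Enum

class Stage(Enum):
    UPLOAD = "upload"
    PARSING = "parsing"
    CLEANING = "cleaning"
    CHUNKING = "chunking"
    IMAGE_DESCRIPTION = "image_description"        # optional
    CONTEXT_AUGMENTATION = "context_augmentation"  # optional
    EMBEDDING = "embedding"
    STORAGE = "storage"

def plan_stages(describe_images: bool, augment_context: bool) -> list[Stage]:
    """Return the ordered stages for one ingestion run."""
    stages = [Stage.UPLOAD, Stage.PARSING, Stage.CLEANING, Stage.CHUNKING]
    if describe_images:
        stages.append(Stage.IMAGE_DESCRIPTION)
    if augment_context:
        stages.append(Stage.CONTEXT_AUGMENTATION)
    stages += [Stage.EMBEDDING, Stage.STORAGE]
    return stages
```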

Two parsers for different needs:

| Parser | Best For | Speed |
|---|---|---|
| Docling | Structured documents, table detection, DOCX native | 0.01-0.22s (DOCX), 4-182s (PDF) |
| Mistral OCR | Scanned documents, image-heavy PDFs | ~2-5s for any document |

Additional capabilities:

  • Re-ingestion -- Atomic cleanup of old data, then a fresh pipeline run with new settings
  • Batch re-ingestion -- Process multiple files independently in one request
  • OneDrive/SharePoint import -- Browse folders and import files via OAuth2

RAG Chat

The core RAG pipeline combines retrieval, generation, and quality evaluation into a single endpoint. Every response includes numbered citations, source metadata, and suggested follow-up questions.

Pipeline steps:

  1. Cache check (exact MD5 + semantic cosine)
  2. Query expansion (strategy-dependent)
  3. Hybrid retrieval (FTS + semantic + RRF)
  4. Chunk deduplication and ghost citation removal
  5. LLM generation with system prompt + persona + language
  6. Grounding score computation
  7. Background LLM Judge evaluation
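
The seven steps can be sketched as a single control flow; every helper here (`check_cache`, `expand_query`, and so on) is a hypothetical placeholder, not the platform's API:

```python
# Sketch of the chat pipeline's control flow; helpers is a dict of
# placeholder callables standing in for the real subsystems.
def answer(query: str, helpers: dict) -> str:
    cached = helpers["check_cache"](query)         # 1. exact MD5 + semantic
    if cached is not None:
        return cached
    expanded = helpers["expand_query"](query)      # 2. strategy-dependent
    chunks = helpers["retrieve"](expanded)         # 3. FTS + semantic + RRF
    chunks = helpers["dedupe"](chunks)             # 4. drop duplicates / ghosts
    response = helpers["generate"](query, chunks)  # 5. LLM generation
    helpers["score_grounding"](response, chunks)   # 6. grounding score
    helpers["schedule_judge"](query, response)     # 7. background judge
    return response
```

The cache check short-circuits everything else, which is what makes the pipeline "cache-first".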

10 retrieval strategies:

| Strategy | Type | Requires LLM |
|---|---|---|
| fusion | Reciprocal Rank Fusion | No |
| semantic-context | Semantic with context | No |
| semantic-rerank | Semantic with re-ranking | No |
| hybrid-rerank | Hybrid with re-ranking | No |
| expanded-hybrid | Query-expanded hybrid | Yes |
| multi-query | Multi-query expansion | Yes |
| advanced-multipath | Multi-strategy pipeline | Yes |
| agentic-sql | SQL-assisted retrieval | Yes |
| agent-router | Automatic strategy selection | Yes |
| agent-harness | Full agentic pipeline | Yes |
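
The `fusion` strategy is Reciprocal Rank Fusion, which merges several ranked result lists without needing an LLM. A minimal generic implementation (the constant k = 60 is the conventional RRF default, not necessarily the platform's value):

```python
# Reciprocal Rank Fusion: each list votes 1 / (k + rank) per document,
# and documents are re-ranked by their summed score.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked highly by both full-text and semantic search accumulates the largest score, which is why fusion is robust without re-ranking.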

6 personas:

| Persona | Style |
|---|---|
| professional | Formal and concise |
| funny | Witty and light-hearted |
| mentor | Educational, explains the "why" |
| storyteller | Narrative and engaging |
| clear | Simple, plain language |
| custom | User-provided instructions |

10 languages: English, German, Spanish, French, Italian, Portuguese, Dutch, Russian, Chinese, Japanese

Language modes:

  • Auto-detect -- Mirrors the language of the user's question
  • Manual -- Always responds in the configured language

Cache-first pipeline:

The system checks for cached responses before running full retrieval:

  1. Exact match -- MD5 hash of the query
  2. Semantic match -- Cosine similarity >= 0.97 threshold
  3. Cache adaptation -- Lightweight LLM call adjusts cached responses for different language/persona combinations while preserving citations

Cache settings: cache_enabled, cache_similarity_threshold (default 0.97), cache_max_age_days (default 7).

LLM Judge:

After every RAG response, a background task evaluates answer quality:

  • Scores responses on accuracy, groundedness, and answerability
  • Responses scoring >= 80 with grounded=true are cached for future queries
  • Results logged to response_quality_log table
  • Configurable via judge_enabled kill switch
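
The judge-gated caching rule reduces to a single predicate; the field names here are illustrative:

```python
# Only responses scoring >= 80 with grounded=True become cache entries,
# and the judge_enabled kill switch disables caching entirely.
def should_cache(judge_result: dict, judge_enabled: bool = True) -> bool:
    if not judge_enabled:
        return False
    return judge_result.get("score", 0) >= 80 and judge_result.get("grounded", False)
```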

AI Enhancement

The enhancement pipeline enriches document chunks with AI-generated contextual descriptions, improving retrieval accuracy for ambiguous or context-dependent queries.

Lifecycle:

```
pending --> processing --> success --> embedded (publish)
              |
              v
           failed -----> pending (reset)
```

| Feature | Detail |
|---|---|
| Parallel processing | asyncio.Semaphore with configurable per-notebook concurrency |
| Provider defaults | OpenRouter: 10 concurrent, OpenAI: 5, Ollama: 3 |
| Rate limit handling | Exponential backoff (10s x 2^attempt) with Retry-After support |
| Smart truncation | 12K char budget with 20/60/20 split for large documents |
| Publish safety | Aborts if any chunk is not in "success" status |
| Metadata repair | Recomputes line ranges for chunks with missing metadata |

Enhanced chunks use the format:

```
# Context
{AI-generated contextual description}

---

# Content
{Original chunk text}
```

Health Monitor

Automated data quality scoring with duplicate detection and cleanup tools.

Health score formula:

```
score = 100
      - (duplicates / total) * 40
      - (orphans / total) * 30
      + (enhanced / total) * 10
```

The score is clamped to the 0-100 range.
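
Expressed directly from the formula above, with the clamp applied:

```python
# Health score: duplicates and orphans penalize, enhancement rewards,
# and the result is clamped to [0, 100].
def health_score(total: int, duplicates: int, orphans: int, enhanced: int) -> float:
    if total == 0:
        return 100.0  # an empty notebook has nothing wrong with it
    score = (100
             - (duplicates / total) * 40
             - (orphans / total) * 30
             + (enhanced / total) * 10)
    return max(0.0, min(100.0, score))
```

Note the enhancement bonus can push the raw score above 100, which is why the upper clamp matters.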

| Detection | Method |
|---|---|
| Duplicates | Content hash comparison across chunks |
| Orphans | Embeddings without matching document records |
| Enhancement ratio | Percentage of chunks that have been AI-enhanced |

Cleanup tools:

  • Remove duplicates -- Keeps the oldest chunk in each duplicate group
  • Remove orphans -- Deletes embeddings without parent records
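
The "keep the oldest chunk per duplicate group" rule can be sketched as below; the record shape and the choice of SHA-256 as the content hash are assumptions:

```python
# Group chunks by content hash; everything except the oldest member of
# each group is eligible for removal.
import hashlib

def find_removable_duplicates(chunks: list[dict]) -> list[str]:
    """Return ids of all but the oldest chunk in each duplicate group."""
    seen: dict[str, dict] = {}
    removable: list[str] = []
    for chunk in sorted(chunks, key=lambda c: c["created_at"]):
        digest = hashlib.sha256(chunk["content"].encode()).hexdigest()
        if digest in seen:
            removable.append(chunk["id"])   # newer duplicate
        else:
            seen[digest] = chunk            # oldest wins
    return removable
```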

Multi-Provider LLM

Beyond Retrieval supports three LLM providers with per-notebook configuration and dynamic API key resolution.

| Provider | Models | Use Case |
|---|---|---|
| OpenRouter (default) | 23+ LLM models, 21+ embedding models | Production -- widest model selection |
| OpenAI Direct | GPT-4o, text-embedding-3-small | Direct OpenAI access |
| Ollama | qwen2.5:1.5b, nomic-embed-text | Local/private -- no data leaves your network |

Key resolution chain: User-saved DB key > Server .env key > empty string

Each notebook independently configures its provider, model, and temperature. Switching providers auto-adjusts model ID prefixes (e.g., openai/gpt-4o-mini for OpenRouter vs gpt-4o-mini for OpenAI Direct).
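
The resolution chain reads naturally as a function; the `{PROVIDER}_API_KEY` environment-variable naming convention is an assumption for illustration:

```python
# Key resolution: user-saved DB key > server .env key > empty string.
import os

def resolve_api_key(user_keys: dict[str, str], provider: str) -> str:
    db_key = user_keys.get(provider)                         # 1. user-saved DB key
    if db_key:
        return db_key
    env_key = os.environ.get(f"{provider.upper()}_API_KEY")  # 2. server .env key
    if env_key:
        return env_key
    return ""                                                # 3. empty string
```

Returning an empty string (rather than raising) lets the caller decide how to fail, which matches the HTTP 400 behavior described under "Missing API Key" below.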


Dynamic API Keys

Users can save their own API keys through the Global Settings UI. These keys take priority over server-configured keys and flow through the entire pipeline.

Where user keys are used:

  • Chat (LLM inference)
  • Retrieval (embedding queries)
  • Cache check (embedding for semantic lookup)
  • LLM Judge (quality evaluation)
  • AI Enhancement (chunk enrichment)

Key management endpoints:

  • GET /api/settings/api-keys -- List key status with masked preview
  • PUT /api/settings/api-keys -- Save or update keys
  • DELETE /api/settings/api-keys/{key_name} -- Remove key (falls back to server config)
  • POST /api/settings/api-keys/test -- Validate key against provider

Missing API Key

If no key is configured (DB or .env) for a non-Ollama provider, the system returns HTTP 400 with a clear error message before attempting any LLM call.


Notebook Sharing

Admins can generate invite links to grant other users access to specific notebooks.

| Feature | Detail |
|---|---|
| Invite links | Auto-generated codes with optional expiration |
| Access levels | chat_only for invited users, full admin for owners |
| Cross-database | Invite redemption searches across cloud and local databases |
| Access management | View who has access, revoke individual users, deactivate invite codes |

OneDrive Integration

Import files directly from SharePoint/OneDrive without manual download-upload cycles.

  • OAuth2 flow -- Direct authorization via httpx (no MSAL dependency)
  • Folder browsing -- Navigate SharePoint folder structure via Graph API
  • File import -- Download from OneDrive, upload to configured storage, trigger ingestion
  • Token management -- Fernet-encrypted tokens stored in the database

Authentication

Beyond Retrieval uses Supabase GoTrue for authentication with a bypass mode for local development.

| Mode | Behavior |
|---|---|
| Bypass (BYPASS_AUTH=true) | All requests succeed as dev-user -- no token needed |
| Production (BYPASS_AUTH=false) | HS256 JWT verified against GoTrue |

Authorization levels:

| Level | Access |
|---|---|
| None | Public endpoints (GET /api/health, GET /api/auth/config) |
| User | Any authenticated user (or dev-user in bypass mode) |
| Admin | User with admin role in app_metadata |

Comparison with Alternatives

| Capability | Beyond Retrieval v2 | NotebookLM | ChatGPT File Upload | LangChain + Pinecone (DIY) |
|---|---|---|---|---|
| Multi-notebook workspaces | Yes | Yes | No | Manual |
| File types | 8 formats | PDF, Docs | PDF, images, code | Manual per loader |
| Retrieval strategies | 10 configurable | 1 (fixed) | 1 (fixed) | Manual per chain |
| Personas & languages | 6 + 10 | No | Basic | Manual prompting |
| Cache-first RAG | MD5 + semantic | No | No | Manual |
| LLM Judge | Automatic background | No | No | Manual |
| AI Enhancement | Parallel chunk enrichment | No | No | Manual |
| Health monitoring | Automated scoring + cleanup | No | No | Manual |
| Multi-provider LLM | 3 providers, per-notebook | Google only | OpenAI only | Any (manual wiring) |
| On-premises option | Full Docker stack + Ollama | No | No | Partial |
| Notebook sharing | Invite links with access control | Share links | No | Manual |
| API-first | 78+ REST endpoints | No API | No API | Custom |
| Self-hosted | Yes | No | No | Yes |
| Dynamic API keys | User-level key management | N/A | N/A | Environment only |

Use Cases

Enterprise Knowledge Base

Centralize company policies, product documentation, and internal procedures. Employees ask questions in their preferred language and receive cited answers with source references.

Customer Support

Build a support knowledge base from help articles, FAQs, and product manuals. Support agents get instant answers with citations instead of searching through documents manually.

Legal Document Review

Upload contracts, regulations, and case law. Lawyers can query across thousands of pages and get grounded answers with precise citations pointing to relevant clauses.

Research Assistant

Aggregate academic papers, reports, and data sheets. Researchers query across their corpus with different retrieval strategies to find relevant passages.

Internal Wiki Replacement

Replace static wikis with a queryable knowledge base that understands context. Unlike keyword search, the RAG pipeline handles paraphrased questions, synonyms, and cross-document reasoning.


Getting Started

Ready to deploy? Follow the Quick Start guide to get Beyond Retrieval running in under 10 minutes with Docker Compose.

For API integration, see the API Overview and End-to-End Workflows.