Features Overview

Beyond Retrieval v2 is a production-grade Retrieval-Augmented Generation (RAG) platform for enterprise document intelligence. It combines multi-format document ingestion, 10 retrieval strategies, AI-powered chat with citations, and continuous quality evaluation -- all within isolated, shareable notebook workspaces.

Unlike point solutions that handle one piece of the RAG pipeline, Beyond Retrieval v2 delivers the entire stack: parsing, chunking, embedding, retrieval, generation, caching, quality scoring, and collaboration -- ready to deploy on-premises or in the cloud.


Architecture

```mermaid
graph LR
    subgraph Clients
        Browser["Browser / API Client"]
    end

    subgraph "FastAPI Backend"
        API["REST API<br/>78+ endpoints"]
        Ingestion["Ingestion Pipeline"]
        RAG["RAG Engine"]
        Judge["LLM Judge"]
        Enhance["AI Enhancer"]
    end

    subgraph "Database Layer"
        Supabase["Supabase PostgreSQL 15<br/>pgvector + pg_trgm"]
        VectorStore["HNSW Vector Index"]
    end

    subgraph "Storage"
        SupaStorage["Supabase Storage"]
        S3["S3-Compatible"]
        LocalFS["Local Filesystem"]
    end

    subgraph "LLM Providers"
        OpenRouter["OpenRouter<br/>21+ embedding models<br/>23+ LLM models"]
        OpenAI["OpenAI Direct"]
        Ollama["Ollama (Local)"]
    end

    subgraph "Document Parsers"
        Docling["Docling Parser"]
        Mistral["Mistral OCR"]
    end

    Browser -->|"HTTPS / JSON"| API
    API --> Ingestion
    API --> RAG
    API --> Judge
    API --> Enhance
    Ingestion --> Docling
    Ingestion --> Mistral
    RAG --> OpenRouter
    RAG --> OpenAI
    RAG --> Ollama
    Ingestion --> SupaStorage
    Ingestion --> S3
    Ingestion --> LocalFS
    RAG --> Supabase
    Supabase --- VectorStore
    Ingestion --> Supabase
```

Feature Walkthrough

Notebook System

Notebooks are isolated workspaces that partition documents, conversations, settings, and vector embeddings. Each notebook operates independently, allowing teams to maintain separate knowledge bases for different projects, departments, or use cases.

| Capability | Detail |
|---|---|
| Isolated vector stores | Each notebook's embeddings are partitioned by notebook_id in metadata |
| Immutable embedding model | Chosen at creation, cannot be changed (prevents vector dimension mismatches) |
| Per-notebook LLM settings | Provider, model, temperature, retrieval strategy, language mode |
| Database type | Cloud (hosted Supabase) or Local (Docker Supabase) per notebook |
| Storage provider | Supabase, S3, local filesystem, or none -- per notebook |
| Status dashboard | Document counts, chunk stats, conversation metrics, enhancement progress |
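
The isolation rule in the first row can be sketched as a similarity query scoped to one notebook. The table and column names (`embeddings`, `metadata`) below are illustrative, not the actual schema:

```python
# Sketch: per-notebook isolation by filtering on notebook_id in chunk metadata.
# Schema names are hypothetical; only the filtering pattern is the point.
def build_similarity_query(notebook_id: str, top_k: int = 8) -> tuple[str, dict]:
    """Build a pgvector similarity query restricted to one notebook's chunks."""
    sql = """
        SELECT chunk_id, content, 1 - (embedding <=> %(query_vec)s) AS score
        FROM embeddings
        WHERE metadata->>'notebook_id' = %(notebook_id)s
        ORDER BY embedding <=> %(query_vec)s
        LIMIT %(top_k)s
    """
    return sql, {"notebook_id": notebook_id, "top_k": top_k}
```

Because every retrieval carries this filter, notebooks never see each other's vectors even though they share one physical index.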

Why Immutable Embedding Models?

Changing the embedding model after documents are ingested would create mixed-dimension vectors in the same index, breaking similarity search. Beyond Retrieval locks this choice at creation time to guarantee index consistency.


Document Management

Upload, parse, chunk, and embed documents through a multi-stage background pipeline. The system handles 8 file formats with configurable parsing and chunking strategies.

Supported file types: PDF, DOCX, DOC, MD, TXT, CSV, XLSX, XLS

| Stage | What Happens |
|---|---|
| Upload | Files stored in the configured storage provider |
| Parsing | Text extracted via Docling Parser or Mistral OCR |
| Cleaning | Unicode normalization, whitespace consolidation, dedup check |
| Chunking | Recursive, Docling Hybrid, or Agentic splitting |
| Image Description (Optional) | Base64 images replaced with AI text descriptions |
| Context Augmentation (Optional) | Chunks enriched with LLM-generated context |
| Embedding | Vectors generated via OpenRouter, OpenAI, or Ollama |
| Storage | Chunks + vectors + metadata inserted into PostgreSQL |
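
The stage sequence above, with the two optional stages toggled per notebook, can be sketched as follows; the `Stage` enum and `plan_stages` function are illustrative, not the platform's actual names:

```python
# Sketch of the ingestion stage sequence from the table above.
from enum import Enum

class Stage(Enum):
    UPLOAD = "upload"
    PARSING = "parsing"
    CLEANING = "cleaning"
    CHUNKING = "chunking"
    IMAGE_DESCRIPTION = "image_description"        # optional
    CONTEXT_AUGMENTATION = "context_augmentation"  # optional
    EMBEDDING = "embedding"
    STORAGE = "storage"

def plan_stages(describe_images: bool, augment_context: bool) -> list[Stage]:
    """Return the ordered stages for one ingestion run."""
    stages = [Stage.UPLOAD, Stage.PARSING, Stage.CLEANING, Stage.CHUNKING]
    if describe_images:
        stages.append(Stage.IMAGE_DESCRIPTION)
    if augment_context:
        stages.append(Stage.CONTEXT_AUGMENTATION)
    stages += [Stage.EMBEDDING, Stage.STORAGE]
    return stages
```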

Two parsers for different needs:

| Parser | Best For | Speed |
|---|---|---|
| Docling | Structured documents, table detection, DOCX native | 0.01-0.22s (DOCX), 4-182s (PDF) |
| Mistral OCR | Scanned documents, image-heavy PDFs | ~2-5s for any document |

Additional capabilities:

  • Re-ingestion -- Atomic cleanup of old data, then a fresh pipeline run with new settings
  • Batch re-ingestion -- Process multiple files independently in one request
  • OneDrive/SharePoint import -- Browse folders and import files via OAuth2

RAG Chat

The core RAG pipeline combines retrieval, generation, and quality evaluation into a single endpoint. Every response includes numbered citations, source metadata, and suggested follow-up questions.

Pipeline steps:

  1. Cache check (exact MD5 + semantic cosine)
  2. Query expansion (strategy-dependent)
  3. Hybrid retrieval (FTS + semantic + RRF)
  4. Chunk deduplication and ghost citation removal
  5. LLM generation with system prompt + persona + language
  6. Grounding score computation
  7. Background LLM Judge evaluation
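
The seven steps can be sketched as a single control flow; every helper here (`check_cache`, `expand_query`, and so on) is a hypothetical placeholder, not the platform's API:

```python
# Sketch of the chat pipeline's control flow; helpers is a dict of
# placeholder callables standing in for the real subsystems.
def answer(query: str, helpers: dict) -> str:
    cached = helpers["check_cache"](query)         # 1. exact MD5 + semantic
    if cached is not None:
        return cached
    expanded = helpers["expand_query"](query)      # 2. strategy-dependent
    chunks = helpers["retrieve"](expanded)         # 3. FTS + semantic + RRF
    chunks = helpers["dedupe"](chunks)             # 4. drop duplicates / ghosts
    response = helpers["generate"](query, chunks)  # 5. LLM generation
    helpers["score_grounding"](response, chunks)   # 6. grounding score
    helpers["schedule_judge"](query, response)     # 7. background judge
    return response
```

The cache check short-circuits everything else, which is what makes the pipeline "cache-first".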

10 retrieval strategies:

| Strategy | Type | Requires LLM |
|---|---|---|
| fusion | Reciprocal Rank Fusion | No |
| semantic-context | Semantic with context | No |
| semantic-rerank | Semantic with re-ranking | No |
| hybrid-rerank | Hybrid with re-ranking | No |
| expanded-hybrid | Query-expanded hybrid | Yes |
| multi-query | Multi-query expansion | Yes |
| advanced-multipath | Multi-strategy pipeline | Yes |
| agentic-sql | SQL-assisted retrieval | Yes |
| agent-router | Automatic strategy selection | Yes |
| agent-harness | Full agentic pipeline | Yes |
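
The `fusion` strategy is Reciprocal Rank Fusion, which merges several ranked result lists without needing an LLM. A minimal generic implementation (the constant k = 60 is the conventional RRF default, not necessarily the platform's value):

```python
# Reciprocal Rank Fusion: each list votes 1 / (k + rank) per document,
# and documents are re-ranked by their summed score.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked highly by both full-text and semantic search accumulates the largest score, which is why fusion is robust without re-ranking.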

6 personas:

| Persona | Style |
|---|---|
| professional | Formal and concise |
| funny | Witty and light-hearted |
| mentor | Educational, explains the "why" |
| storyteller | Narrative and engaging |
| clear | Simple, plain language |
| custom | User-provided instructions |

10 languages: English, German, Spanish, French, Italian, Portuguese, Dutch, Russian, Chinese, Japanese

Language modes:

  • Auto-detect -- Mirrors the language of the user's question
  • Manual -- Always responds in the configured language

Cache-first pipeline:

The system checks for cached responses before running full retrieval:

  1. Exact match -- MD5 hash of the query
  2. Semantic match -- Cosine similarity >= 0.97 threshold
  3. Cache adaptation -- Lightweight LLM call adjusts cached responses for different language/persona combinations while preserving citations

Cache settings: cache_enabled, cache_similarity_threshold (default 0.97), cache_max_age_days (default 7).

LLM Judge:

After every RAG response, a background task evaluates answer quality:

  • Scores responses on accuracy, groundedness, and answerability
  • Responses scoring >= 80 with grounded=true are cached for future queries
  • Results logged to response_quality_log table
  • Configurable via judge_enabled kill switch
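
The judge-gated caching rule reduces to a single predicate; the field names here are illustrative:

```python
# Only responses scoring >= 80 with grounded=True become cache entries,
# and the judge_enabled kill switch disables caching entirely.
def should_cache(judge_result: dict, judge_enabled: bool = True) -> bool:
    if not judge_enabled:
        return False
    return judge_result.get("score", 0) >= 80 and judge_result.get("grounded", False)
```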

AI Enhancement

The enhancement pipeline enriches document chunks with AI-generated contextual descriptions, improving retrieval accuracy for ambiguous or context-dependent queries.

Lifecycle:

```
pending --> processing --> success --> embedded (publish)
              |
              v
           failed -----> pending (reset)
```

| Feature | Detail |
|---|---|
| Parallel processing | asyncio.Semaphore with configurable per-notebook concurrency |
| Provider defaults | OpenRouter: 10 concurrent, OpenAI: 5, Ollama: 3 |
| Rate limit handling | Exponential backoff (10s x 2^attempt) with Retry-After support |
| Smart truncation | 12K char budget with 20/60/20 split for large documents |
| Publish safety | Aborts if any chunk is not in "success" status |
| Metadata repair | Recomputes line ranges for chunks with missing metadata |

Enhanced chunks use the format:

```
# Context
{AI-generated contextual description}

---

# Content
{Original chunk text}
```

Health Monitor

Automated data quality scoring with duplicate detection and cleanup tools.

Health score formula:

```
score = 100
      - (duplicates / total) * 40
      - (orphans / total) * 30
      + (enhanced / total) * 10
```

The score is clamped to the 0-100 range.
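
Expressed directly from the formula above, with the clamp applied:

```python
# Health score: duplicates and orphans penalize, enhancement rewards,
# and the result is clamped to [0, 100].
def health_score(total: int, duplicates: int, orphans: int, enhanced: int) -> float:
    if total == 0:
        return 100.0  # an empty notebook has nothing wrong with it
    score = (100
             - (duplicates / total) * 40
             - (orphans / total) * 30
             + (enhanced / total) * 10)
    return max(0.0, min(100.0, score))
```

Note the enhancement bonus can push the raw score above 100, which is why the upper clamp matters.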

| Detection | Method |
|---|---|
| Duplicates | Content hash comparison across chunks |
| Orphans | Embeddings without matching document records |
| Enhancement ratio | Percentage of chunks that have been AI-enhanced |

Cleanup tools:

  • Remove duplicates -- Keeps the oldest chunk in each duplicate group
  • Remove orphans -- Deletes embeddings without parent records
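
The "keep the oldest chunk per duplicate group" rule can be sketched as below; the record shape and the choice of SHA-256 as the content hash are assumptions:

```python
# Group chunks by content hash; everything except the oldest member of
# each group is eligible for removal.
import hashlib

def find_removable_duplicates(chunks: list[dict]) -> list[str]:
    """Return ids of all but the oldest chunk in each duplicate group."""
    seen: dict[str, dict] = {}
    removable: list[str] = []
    for chunk in sorted(chunks, key=lambda c: c["created_at"]):
        digest = hashlib.sha256(chunk["content"].encode()).hexdigest()
        if digest in seen:
            removable.append(chunk["id"])   # newer duplicate
        else:
            seen[digest] = chunk            # oldest wins
    return removable
```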

Multi-Provider LLM

Beyond Retrieval supports three LLM providers with per-notebook configuration and dynamic API key resolution.

| Provider | Models | Use Case |
|---|---|---|
| OpenRouter (default) | 23+ LLM models, 21+ embedding models | Production -- widest model selection |
| OpenAI Direct | GPT-4o, text-embedding-3-small | Direct OpenAI access |
| Ollama | qwen2.5:1.5b, nomic-embed-text | Local/private -- no data leaves your network |

Key resolution chain: User-saved DB key > Server .env key > empty string

Each notebook independently configures its provider, model, and temperature. Switching providers auto-adjusts model ID prefixes (e.g., openai/gpt-4o-mini for OpenRouter vs gpt-4o-mini for OpenAI Direct).
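
The resolution chain reads naturally as a function; the `{PROVIDER}_API_KEY` environment-variable naming convention is an assumption for illustration:

```python
# Key resolution: user-saved DB key > server .env key > empty string.
import os

def resolve_api_key(user_keys: dict[str, str], provider: str) -> str:
    db_key = user_keys.get(provider)                         # 1. user-saved DB key
    if db_key:
        return db_key
    env_key = os.environ.get(f"{provider.upper()}_API_KEY")  # 2. server .env key
    if env_key:
        return env_key
    return ""                                                # 3. empty string
```

Returning an empty string (rather than raising) lets the caller decide how to fail, which matches the HTTP 400 behavior described under "Missing API Key" below.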


Dynamic API Keys

Users can save their own API keys through the Global Settings UI. These keys take priority over server-configured keys and flow through the entire pipeline.

Where user keys are used:

  • Chat (LLM inference)
  • Retrieval (embedding queries)
  • Cache check (embedding for semantic lookup)
  • LLM Judge (quality evaluation)
  • AI Enhancement (chunk enrichment)

Key management endpoints:

  • GET /api/settings/api-keys -- List key status with masked preview
  • PUT /api/settings/api-keys -- Save or update keys
  • DELETE /api/settings/api-keys/{key_name} -- Remove key (falls back to server config)
  • POST /api/settings/api-keys/test -- Validate key against provider

Missing API Key

If no key is configured (DB or .env) for a non-Ollama provider, the system returns HTTP 400 with a clear error message before attempting any LLM call.


Notebook Sharing

Admins can generate invite links to grant other users access to specific notebooks.

| Feature | Detail |
|---|---|
| Invite links | Auto-generated codes with optional expiration |
| Access levels | chat_only for invited users, full admin for owners |
| Cross-database | Invite redemption searches across cloud and local databases |
| Access management | View who has access, revoke individual users, deactivate invite codes |

OneDrive Integration

Import files directly from SharePoint/OneDrive without manual download-upload cycles.

  • OAuth2 flow -- Direct authorization via httpx (no MSAL dependency)
  • Folder browsing -- Navigate SharePoint folder structure via Graph API
  • File import -- Download from OneDrive, upload to configured storage, trigger ingestion
  • Token management -- Fernet-encrypted tokens stored in the database

Authentication

Beyond Retrieval uses Supabase GoTrue for authentication with a bypass mode for local development.

| Mode | Behavior |
|---|---|
| Bypass (BYPASS_AUTH=true) | All requests succeed as dev-user -- no token needed |
| Production (BYPASS_AUTH=false) | HS256 JWT verified against GoTrue |

Authorization levels:

| Level | Access |
|---|---|
| None | Public endpoints (GET /api/health, GET /api/auth/config) |
| User | Any authenticated user (or dev-user in bypass mode) |
| Admin | User with admin role in app_metadata |

Comparison with Alternatives

| Capability | Beyond Retrieval v2 | NotebookLM | ChatGPT File Upload | LangChain + Pinecone (DIY) |
|---|---|---|---|---|
| Multi-notebook workspaces | Yes | Yes | No | Manual |
| File types | 8 formats | PDF, Docs | PDF, images, code | Manual per loader |
| Retrieval strategies | 10 configurable | 1 (fixed) | 1 (fixed) | Manual per chain |
| Personas & languages | 6 + 10 | No | Basic | Manual prompting |
| Cache-first RAG | MD5 + semantic | No | No | Manual |
| LLM Judge | Automatic background | No | No | Manual |
| AI Enhancement | Parallel chunk enrichment | No | No | Manual |
| Health monitoring | Automated scoring + cleanup | No | No | Manual |
| Multi-provider LLM | 3 providers, per-notebook | Google only | OpenAI only | Any (manual wiring) |
| On-premises option | Full Docker stack + Ollama | No | No | Partial |
| Notebook sharing | Invite links with access control | Share links | No | Manual |
| API-first | 78+ REST endpoints | No API | No API | Custom |
| Self-hosted | Yes | No | No | Yes |
| Dynamic API keys | User-level key management | N/A | N/A | Environment only |

Use Cases

Enterprise Knowledge Base

Centralize company policies, product documentation, and internal procedures. Employees ask questions in their preferred language and receive cited answers with source references.

Customer Support

Build a support knowledge base from help articles, FAQs, and product manuals. Support agents get instant answers with citations instead of searching through documents manually.

Legal Document Review

Upload contracts, regulations, and case law. Lawyers can query across thousands of pages and get grounded answers with precise citations pointing to relevant clauses.

Research Assistant

Aggregate academic papers, reports, and data sheets. Researchers query across their corpus with different retrieval strategies to find relevant passages.

Internal Wiki Replacement

Replace static wikis with a queryable knowledge base that understands context. Unlike keyword search, the RAG pipeline handles paraphrased questions, synonyms, and cross-document reasoning.


Getting Started

Ready to deploy? Follow the Quick Start guide to get Beyond Retrieval running in under 10 minutes with Docker Compose.

For API integration, see the API Overview and End-to-End Workflows.