Features Overview¶
Beyond Retrieval v2 is a production-grade Retrieval-Augmented Generation (RAG) platform for enterprise document intelligence. It combines multi-format document ingestion, 10 retrieval strategies, AI-powered chat with citations, and continuous quality evaluation -- all within isolated, shareable notebook workspaces.
Unlike point solutions that handle one piece of the RAG pipeline, Beyond Retrieval v2 delivers the entire stack: parsing, chunking, embedding, retrieval, generation, caching, quality scoring, and collaboration -- ready to deploy on-premises or in the cloud.
Architecture¶
```mermaid
graph LR
    subgraph Clients
        Browser["Browser / API Client"]
    end
    subgraph "FastAPI Backend"
        API["REST API<br/>78+ endpoints"]
        Ingestion["Ingestion Pipeline"]
        RAG["RAG Engine"]
        Judge["LLM Judge"]
        Enhance["AI Enhancer"]
    end
    subgraph "Database Layer"
        Supabase["Supabase PostgreSQL 15<br/>pgvector + pg_trgm"]
        VectorStore["HNSW Vector Index"]
    end
    subgraph "Storage"
        SupaStorage["Supabase Storage"]
        S3["S3-Compatible"]
        LocalFS["Local Filesystem"]
    end
    subgraph "LLM Providers"
        OpenRouter["OpenRouter<br/>21+ embedding models<br/>23+ LLM models"]
        OpenAI["OpenAI Direct"]
        Ollama["Ollama (Local)"]
    end
    subgraph "Document Parsers"
        Docling["Docling Parser"]
        Mistral["Mistral OCR"]
    end
    Browser -->|"HTTPS / JSON"| API
    API --> Ingestion
    API --> RAG
    API --> Judge
    API --> Enhance
    Ingestion --> Docling
    Ingestion --> Mistral
    RAG --> OpenRouter
    RAG --> OpenAI
    RAG --> Ollama
    Ingestion --> SupaStorage
    Ingestion --> S3
    Ingestion --> LocalFS
    RAG --> Supabase
    Supabase --- VectorStore
    Ingestion --> Supabase
```

Feature Walkthrough¶
Notebook System¶
Notebooks are isolated workspaces that partition documents, conversations, settings, and vector embeddings. Each notebook operates independently, allowing teams to maintain separate knowledge bases for different projects, departments, or use cases.
| Capability | Detail |
|---|---|
| Isolated vector stores | Each notebook's embeddings are partitioned by `notebook_id` in metadata |
| Immutable embedding model | Chosen at creation, cannot be changed (prevents vector dimension mismatches) |
| Per-notebook LLM settings | Provider, model, temperature, retrieval strategy, language mode |
| Database type | Cloud (hosted Supabase) or Local (Docker Supabase) per notebook |
| Storage provider | Supabase, S3, local filesystem, or none -- per notebook |
| Status dashboard | Document counts, chunk stats, conversation metrics, enhancement progress |
Why Immutable Embedding Models?
Changing the embedding model after documents are ingested would create mixed-dimension vectors in the same index, breaking similarity search. Beyond Retrieval locks this choice at creation time to guarantee index consistency.
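To make the dimension problem concrete, here is a minimal sketch (plain Python, not the product's code) of why similarity search breaks when a query vector from a new model meets index entries produced by the old one; the model names in the comments are only illustrative examples:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity is only defined for vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# A 1536-dim query vector cannot be scored against a 768-dim index entry:
query = [0.1] * 1536   # e.g. a 1536-dim embedding model
stale = [0.1] * 768    # e.g. a 768-dim embedding model
try:
    cosine_similarity(query, stale)
except ValueError as e:
    print(e)  # dimension mismatch: 1536 vs 768
```

Locking the model at notebook creation guarantees every vector in the index shares one dimension.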
Document Management¶
Upload, parse, chunk, and embed documents through a multi-stage background pipeline. The system handles 8 file formats with configurable parsing and chunking strategies.
Supported file types: PDF, DOCX, DOC, MD, TXT, CSV, XLSX, XLS
| Stage | What Happens |
|---|---|
| Upload | Files stored in configured storage provider |
| Parsing | Text extracted via Docling Parser or Mistral OCR |
| Cleaning | Unicode normalization, whitespace consolidation, dedup check |
| Chunking | Recursive, Docling Hybrid, or Agentic splitting |
| Image Description | (Optional) Base64 images replaced with AI text descriptions |
| Context Augmentation | (Optional) Chunks enriched with LLM-generated context |
| Embedding | Vectors generated via OpenRouter, OpenAI, or Ollama |
| Storage | Chunks + vectors + metadata inserted into PostgreSQL |
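The Cleaning stage can be sketched as follows; the function names and the exact normalization form (NFKC) are illustrative assumptions, not the product's actual implementation:

```python
import hashlib
import re
import unicodedata

def clean_text(raw: str) -> str:
    """Normalize Unicode and collapse runs of whitespace (illustrative)."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text

def content_hash(text: str) -> str:
    """Stable fingerprint for the dedup check (MD5 assumed here)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Two byte-different uploads normalize to the same content, so the
# dedup check catches the second one before it is re-embedded:
doc_a = "Quarterly  report\u00a0\n\n2024"
doc_b = "Quarterly report 2024"
assert content_hash(clean_text(doc_a)) == content_hash(clean_text(doc_b))
```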
Two parsers for different needs:
| Parser | Best For | Speed |
|---|---|---|
| Docling | Structured documents, table detection, DOCX native | 0.01-0.22s (DOCX), 4-182s (PDF) |
| Mistral OCR | Scanned documents, image-heavy PDFs | ~2-5s for any document |
Additional capabilities:
- Re-ingestion -- Atomic cleanup of old data + re-pipeline with new settings
- Batch re-ingestion -- Process multiple files independently in one request
- OneDrive/SharePoint import -- Browse folders and import files via OAuth2
RAG Chat¶
The core RAG pipeline combines retrieval, generation, and quality evaluation into a single endpoint. Every response includes numbered citations, source metadata, and suggested follow-up questions.
Pipeline steps:
- Cache check (exact MD5 + semantic cosine)
- Query expansion (strategy-dependent)
- Hybrid retrieval (FTS + semantic + RRF)
- Chunk deduplication and ghost citation removal
- LLM generation with system prompt + persona + language
- Grounding score computation
- Background LLM Judge evaluation
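The hybrid retrieval step merges the full-text and semantic result lists with Reciprocal Rank Fusion. A minimal sketch, using the conventional smoothing constant k = 60 (an assumption; the platform's value is not documented here):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fts = ["chunk-7", "chunk-2", "chunk-9"]        # full-text search order
semantic = ["chunk-2", "chunk-7", "chunk-4"]   # vector search order
fused = rrf_fuse([fts, semantic])
# chunk-7 and chunk-2 each appear at ranks 1 and 2, so they tie at the top,
# ahead of chunks that appear in only one list
```

Because RRF only looks at ranks, not raw scores, it fuses FTS and vector results without any score normalization.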
10 retrieval strategies:
| Strategy | Type | Requires LLM |
|---|---|---|
| `fusion` | Reciprocal Rank Fusion | No |
| `semantic-context` | Semantic with context | No |
| `semantic-rerank` | Semantic with re-ranking | No |
| `hybrid-rerank` | Hybrid with re-ranking | No |
| `expanded-hybrid` | Query-expanded hybrid | Yes |
| `multi-query` | Multi-query expansion | Yes |
| `advanced-multipath` | Multi-strategy pipeline | Yes |
| `agentic-sql` | SQL-assisted retrieval | Yes |
| `agent-router` | Automatic strategy selection | Yes |
| `agent-harness` | Full agentic pipeline | Yes |
6 personas:
| Persona | Style |
|---|---|
| `professional` | Formal and concise |
| `funny` | Witty and light-hearted |
| `mentor` | Educational, explains the "why" |
| `storyteller` | Narrative and engaging |
| `clear` | Simple, plain language |
| `custom` | User-provided instructions |
10 languages: English, German, Spanish, French, Italian, Portuguese, Dutch, Russian, Chinese, Japanese
Language modes:
- Auto-detect -- Mirrors the language of the user's question
- Manual -- Always responds in the configured language
Cache-first pipeline:
The system checks for cached responses before running full retrieval:
- Exact match -- MD5 hash of the query
- Semantic match -- Cosine similarity >= 0.97 threshold
- Cache adaptation -- Lightweight LLM call adjusts cached responses for different language/persona combinations while preserving citations
Cache settings: `cache_enabled`, `cache_similarity_threshold` (default 0.97), `cache_max_age_days` (default 7).
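The cache-first flow can be sketched roughly like this; the helper names, the query normalization, and the in-memory cache shapes are hypothetical stand-ins for the real database-backed cache:

```python
import hashlib
import math

CACHE_SIMILARITY_THRESHOLD = 0.97  # default cache_similarity_threshold

def query_hash(query: str) -> str:
    """Exact-match key: MD5 of the (assumed) normalized query text."""
    return hashlib.md5(query.strip().lower().encode("utf-8")).hexdigest()

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cache_lookup(query, query_vec, exact_cache, semantic_cache):
    """exact_cache: {md5: response}; semantic_cache: [(embedding, response), ...]."""
    hit = exact_cache.get(query_hash(query))
    if hit is not None:
        return hit  # exact MD5 match
    best = max(semantic_cache, key=lambda e: cosine(query_vec, e[0]), default=None)
    if best is not None and cosine(query_vec, best[0]) >= CACHE_SIMILARITY_THRESHOLD:
        return best[1]  # semantic match above threshold
    return None  # miss: fall through to full retrieval
```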
LLM Judge:
After every RAG response, a background task evaluates answer quality:
- Scores responses on accuracy, groundedness, and answerability
- Responses scoring >= 80 with `grounded=true` are cached for future queries
- Results logged to the `response_quality_log` table
- Configurable via the `judge_enabled` kill switch
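The caching gate the judge applies reduces to a few lines; the dictionary shape of the judge result is an assumption for illustration:

```python
JUDGE_CACHE_THRESHOLD = 80  # minimum score before a response may be cached

def should_cache(judge_result: dict, judge_enabled: bool = True) -> bool:
    """Only well-scored, grounded answers become cache entries (sketch)."""
    if not judge_enabled:
        return False  # judge_enabled kill switch disables caching via the judge
    return (
        judge_result.get("score", 0) >= JUDGE_CACHE_THRESHOLD
        and judge_result.get("grounded", False)
    )

assert should_cache({"score": 85, "grounded": True})
assert not should_cache({"score": 85, "grounded": False})
assert not should_cache({"score": 79, "grounded": True})
```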
AI Enhancement¶
The enhancement pipeline enriches document chunks with AI-generated contextual descriptions, improving retrieval accuracy for ambiguous or context-dependent queries.
Lifecycle:
| Feature | Detail |
|---|---|
| Parallel processing | `asyncio.Semaphore` with configurable per-notebook concurrency |
| Provider defaults | OpenRouter: 10 concurrent, OpenAI: 5, Ollama: 3 |
| Rate limit handling | Exponential backoff (10s x 2^attempt) with Retry-After support |
| Smart truncation | 12K char budget with 20/60/20 split for large documents |
| Publish safety | Aborts if any chunk is not in "success" status |
| Metadata repair | Recomputes line ranges for chunks with missing metadata |
Enhanced chunks store the AI-generated context together with the original chunk text.
Health Monitor¶
Automated data quality scoring with duplicate detection and cleanup tools.
Health score formula:
The score is capped between 0 and 100.
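Since the exact formula is not spelled out here, the following is only a plausible sketch; the penalty weights are assumptions, while the 0-100 cap matches the documented behavior:

```python
def health_score(total_chunks: int, duplicates: int, orphans: int) -> int:
    """Illustrative health score; the penalty weights are assumptions."""
    if total_chunks == 0:
        return 100  # an empty notebook has nothing wrong with it
    dup_penalty = 100 * duplicates / total_chunks
    orphan_penalty = 100 * orphans / total_chunks
    score = 100 - dup_penalty - orphan_penalty
    return max(0, min(100, round(score)))  # capped between 0 and 100

# 10 duplicates and 2 orphans out of 200 chunks:
health_score(total_chunks=200, duplicates=10, orphans=2)  # 94
```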
| Detection | Method |
|---|---|
| Duplicates | Content hash comparison across chunks |
| Orphans | Embeddings without matching document records |
| Enhancement ratio | Percentage of chunks that have been AI-enhanced |
Cleanup tools:
- Remove duplicates -- Keeps the oldest chunk in each duplicate group
- Remove orphans -- Deletes embeddings without parent records
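The duplicate-removal rule (keep the oldest chunk per group) can be sketched as a grouping pass; the chunk dictionary shape here is hypothetical:

```python
from collections import defaultdict

def remove_duplicates(chunks: list[dict]) -> list[dict]:
    """Group chunks by content hash and keep only the oldest in each group."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for chunk in chunks:
        groups[chunk["content_hash"]].append(chunk)
    return [min(group, key=lambda c: c["created_at"]) for group in groups.values()]

chunks = [
    {"id": 1, "content_hash": "abc", "created_at": "2024-01-01"},
    {"id": 2, "content_hash": "abc", "created_at": "2024-03-01"},
    {"id": 3, "content_hash": "def", "created_at": "2024-02-01"},
]
kept = remove_duplicates(chunks)
# ids 1 and 3 survive; id 2 is the newer copy of "abc" and is removed
```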
Multi-Provider LLM¶
Beyond Retrieval supports three LLM providers with per-notebook configuration and dynamic API key resolution.
| Provider | Models | Use Case |
|---|---|---|
| OpenRouter (default) | 23+ LLM models, 21+ embedding models | Production -- widest model selection |
| OpenAI Direct | GPT-4o, text-embedding-3-small | Direct OpenAI access |
| Ollama | qwen2.5:1.5b, nomic-embed-text | Local/private -- no data leaves your network |
Key resolution chain: User-saved DB key > Server `.env` key > empty string
Each notebook independently configures its provider, model, and temperature. Switching providers auto-adjusts model ID prefixes (e.g., `openai/gpt-4o-mini` for OpenRouter vs `gpt-4o-mini` for OpenAI Direct).
Dynamic API Keys¶
Users can save their own API keys through the Global Settings UI. These keys take priority over server-configured keys and flow through the entire pipeline.
Where user keys are used:
- Chat (LLM inference)
- Retrieval (embedding queries)
- Cache check (embedding for semantic lookup)
- LLM Judge (quality evaluation)
- AI Enhancement (chunk enrichment)
Key management endpoints:
- `GET /api/settings/api-keys` -- List key status with masked preview
- `PUT /api/settings/api-keys` -- Save or update keys
- `DELETE /api/settings/api-keys/{key_name}` -- Remove key (falls back to server config)
- `POST /api/settings/api-keys/test` -- Validate key against provider
Missing API Key
If no key is configured (DB or .env) for a non-Ollama provider, the system returns HTTP 400 with a clear error message before attempting any LLM call.
Notebook Sharing¶
Admins can generate invite links to grant other users access to specific notebooks.
| Feature | Detail |
|---|---|
| Invite links | Auto-generated codes with optional expiration |
| Access levels | chat_only for invited users, full admin for owners |
| Cross-database | Invite redemption searches across cloud and local databases |
| Access management | View who has access, revoke individual users, deactivate invite codes |
OneDrive Integration¶
Import files directly from SharePoint/OneDrive without manual download-upload cycles.
- OAuth2 flow -- Direct authorization via httpx (no MSAL dependency)
- Folder browsing -- Navigate SharePoint folder structure via Graph API
- File import -- Download from OneDrive, upload to configured storage, trigger ingestion
- Token management -- Fernet-encrypted tokens stored in the database
Authentication¶
Beyond Retrieval uses Supabase GoTrue for authentication with a bypass mode for local development.
| Mode | Behavior |
|---|---|
| Bypass (`BYPASS_AUTH=true`) | All requests succeed as dev-user -- no token needed |
| Production (`BYPASS_AUTH=false`) | HS256 JWT verified against GoTrue |
Authorization levels:
| Level | Access |
|---|---|
| None | Public endpoints (GET /api/health, GET /api/auth/config) |
| User | Any authenticated user (or dev-user in bypass mode) |
| Admin | User with admin role in app_metadata |
Comparison with Alternatives¶
| Capability | Beyond Retrieval v2 | NotebookLM | ChatGPT File Upload | LangChain + Pinecone (DIY) |
|---|---|---|---|---|
| Multi-notebook workspaces | Yes | Yes | No | Manual |
| File types | 8 formats | PDF, Docs | PDF, images, code | Manual per loader |
| Retrieval strategies | 10 configurable | 1 (fixed) | 1 (fixed) | Manual per chain |
| Personas & languages | 6 + 10 | No | Basic | Manual prompting |
| Cache-first RAG | MD5 + semantic | No | No | Manual |
| LLM Judge | Automatic background | No | No | Manual |
| AI Enhancement | Parallel chunk enrichment | No | No | Manual |
| Health monitoring | Automated scoring + cleanup | No | No | Manual |
| Multi-provider LLM | 3 providers, per-notebook | Google only | OpenAI only | Any (manual wiring) |
| On-premises option | Full Docker stack + Ollama | No | No | Partial |
| Notebook sharing | Invite links with access control | Share links | No | Manual |
| API-first | 78+ REST endpoints | No API | No API | Custom |
| Self-hosted | Yes | No | No | Yes |
| Dynamic API keys | User-level key management | N/A | N/A | Environment only |
Use Cases¶
Enterprise Knowledge Base¶
Centralize company policies, product documentation, and internal procedures. Employees ask questions in their preferred language and receive cited answers with source references.
Customer Support¶
Build a support knowledge base from help articles, FAQs, and product manuals. Support agents get instant answers with citations instead of searching through documents manually.
Legal Document Analysis¶
Upload contracts, regulations, and case law. Lawyers can query across thousands of pages and get grounded answers with precise citations pointing to relevant clauses.
Research Assistant¶
Aggregate academic papers, reports, and data sheets. Researchers query across their corpus with different retrieval strategies to find relevant passages.
Internal Wiki Replacement¶
Replace static wikis with a queryable knowledge base that understands context. Unlike keyword search, the RAG pipeline handles paraphrased questions, synonyms, and cross-document reasoning.
Getting Started¶
Ready to deploy? Follow the Quick Start guide to get Beyond Retrieval running in under 10 minutes with Docker Compose.
For API integration, see the API Overview and End-to-End Workflows.