Skip to content

Database Schema

Beyond Retrieval v2 uses PostgreSQL 15 via Supabase with three extensions: pgvector, pg_trgm, and pg_net.


Entity Relationship Diagram

erDiagram
    notebook {
        UUID notebook_id PK
        TEXT notebook_title
        TEXT notebook_description
        TEXT icon
        TEXT db_type
        TEXT storage_provider
        TEXT user_id
        INTEGER number_of_documents
        TIMESTAMPTZ created_at
        TIMESTAMPTZ updated_at
    }

    notebook_settings {
        UUID notebook_id PK_FK
        TEXT system_prompt_retrieval
        VARCHAR embedding_model
        VARCHAR inference_provider
        VARCHAR inference_model
        DECIMAL inference_temperature
        VARCHAR active_strategy_id
        JSONB strategies_config
        BOOLEAN enable_multimodal_processing
        TEXT language_mode
        INTEGER enhancement_concurrency
        JSONB ingestion_settings
        BOOLEAN judge_enabled
        TIMESTAMPTZ created_at
    }

    notebook_file_jobs {
        UUID job_id PK
        UUID notebook_id FK
        TEXT file_id
        TEXT file_name
        TEXT status
        TEXT error_description
        JSONB ingestion_settings
        TIMESTAMPTZ created_at
    }

    documents {
        BIGSERIAL id PK
        TEXT content
        JSONB metadata
        VECTOR embedding
        TSVECTOR fts
    }

    document_records {
        UUID id PK
        TEXT file_id
        TEXT file_name
        UUID notebook_id FK
        TIMESTAMPTZ created_at
    }

    contextual_retrieval_table {
        TEXT chunk_id PK
        UUID job_id FK
        TEXT file_id
        TEXT original_chunk
        TEXT enhanced_chunk
        TEXT status
        UUID notebook_id FK
        JSONB original_metadata
    }

    chat_conversations {
        UUID conversation_id PK
        UUID notebook_id FK
        TEXT user_id
        TEXT title
        BOOLEAN is_pinned
        INTEGER message_count
        TIMESTAMPTZ last_message_at
    }

    chat_history {
        UUID id PK
        UUID conversation_id FK
        UUID notebook_id FK
        VARCHAR role
        TEXT content
        JSONB citations
        JSONB run_metadata
    }

    query_cache {
        UUID id PK
        UUID notebook_id FK
        TEXT query_text
        TEXT query_hash
        TEXT response
        VECTOR embedding
        TIMESTAMPTZ created_at
    }

    notebook_access {
        UUID id PK
        UUID notebook_id FK
        TEXT user_id
        TEXT access_type
    }

    notebook_invites {
        UUID id PK
        UUID notebook_id FK
        TEXT invite_code
        TIMESTAMPTZ expires_at
    }

    user_api_keys {
        UUID id PK
        TEXT user_id
        TEXT key_name
        TEXT key_value
    }

    notebook ||--|| notebook_settings : "1:1 config"
    notebook ||--o{ notebook_file_jobs : "has jobs"
    notebook ||--o{ document_records : "has files"
    notebook ||--o{ contextual_retrieval_table : "has chunks"
    notebook ||--o{ chat_conversations : "has conversations"
    notebook ||--o{ query_cache : "has cache"
    notebook ||--o{ notebook_access : "has access grants"
    notebook ||--o{ notebook_invites : "has invites"
    chat_conversations ||--o{ chat_history : "has messages"
    notebook_file_jobs ||--o{ contextual_retrieval_table : "produces chunks"

PostgreSQL Extensions

Extension Purpose
pgvector VECTOR type, HNSW index for semantic similarity search
pg_trgm Trigram-based fuzzy text matching
pg_net Asynchronous HTTP requests from within PostgreSQL

Key Indexes

  • HNSW vector index on query_cache.embedding (vector_cosine_ops) for semantic cache lookups
  • GIN index on documents.fts for full-text search
  • GIN index on documents.metadata for JSONB containment queries
  • Composite unique on query_cache(notebook_id, query_hash) for exact cache dedup
  • Composite unique on notebook_access(notebook_id, user_id)
  • Composite unique on user_api_keys(user_id, key_name)

Key Database Functions (30+)

Function Purpose
hybrid_search_v2 FTS + semantic + RRF fusion search
match_documents Pure semantic vector similarity search
match_documents_ollama Semantic search for Ollama embeddings
create_conversation Atomic conversation creation with validation
delete_conversation Cascade delete with message count reporting
rename_conversation Notebook-isolated conversation rename
toggle_conversation_pin Pin/unpin conversation
get_user_conversations Paginated conversation listing
update_conversation_stats Trigger: auto-update message_count on INSERT
delete_notebook Cascade delete all notebook data
delete_file_from_notebook Clean up file across 6 tables
submit_feedback Upsert feedback for a message
cleanup_expired_cache TTL-based cache eviction

Schema Migrations

The primary schema is at db/migrations/001_initial_schema.sql. Additional migrations:

File Purpose
001_initial_schema.sql Core schema (17+ tables, 30+ functions, 3 extensions)
002_notebook_access.sql Sharing and access control tables
009_create_user_api_keys_table.sql Per-user API key storage
010_search_playground_history.sql Search history

Local mode: start_services.py copies the schema to docker/supabase/init-scripts/99-schema.sql for auto-execution on first container startup.

Cloud mode: Apply migrations manually via the Supabase SQL Editor.