
End-to-End Workflows

This guide walks through four complete workflows that demonstrate the core capabilities of Beyond Retrieval v2. Each workflow includes sequence diagrams and tabbed code examples in curl, Python, and JavaScript.

Prerequisites

All examples assume the API is running at http://localhost:8000. In bypass auth mode (default for development), no token is required. For production, include Authorization: Bearer <token> in all requests.
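In production, the token is typically built once and reused for every request. A minimal sketch of constructing that header; the `API_TOKEN` environment variable name is an illustrative assumption, not part of the API:

```python
import os

# Read the issued access token; API_TOKEN is a placeholder name.
token = os.environ.get("API_TOKEN")

# In bypass auth mode (token unset) no header is sent; in production every
# request carries the bearer token, e.g. via httpx.Client(headers=headers).
headers = {"Authorization": f"Bearer {token}"} if token else {}
print(headers)
```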


Workflow 1: Upload, Ingest, Chat, and Citations

The basic happy path: create a notebook, upload a document, ingest it, start a conversation, ask a question, and get a cited RAG response.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as FastAPI Backend
    participant Storage as Object Storage
    participant Parser as Docling / Mistral OCR
    participant Embedder as Embedding API
    participant DB as Supabase PostgreSQL
    participant LLM as LLM Provider

    Note over Client,LLM: Step 1 — Create Notebook
    Client->>API: POST /api/notebooks/
    API->>DB: Insert notebook + settings
    DB-->>API: notebook_id
    API-->>Client: 201 Created

    Note over Client,LLM: Step 2 — Upload Document
    Client->>API: POST /api/notebooks/{id}/documents/upload
    API->>Storage: Store file
    Storage-->>API: storage_path
    API-->>Client: file_id, file_name, storage_path

    Note over Client,LLM: Step 3 — Ingest Document
    Client->>API: POST /api/notebooks/{id}/documents/ingest
    API->>DB: Create job (status: pending)
    API-->>Client: job_id (immediate)
    API->>Parser: Parse document (background)
    Parser-->>API: Extracted text
    API->>API: Chunk text
    API->>Embedder: Generate embeddings
    Embedder-->>API: Vectors
    API->>DB: Insert chunks + vectors
    API->>DB: Update job (status: success)

    Note over Client,LLM: Step 4 — Poll Until Ready
    Client->>API: GET /api/notebooks/{id}/documents/{file_id}/stage
    API-->>Client: status: success

    Note over Client,LLM: Step 5 — Create Conversation
    Client->>API: POST /api/notebooks/{id}/conversations
    API->>DB: Insert conversation
    API-->>Client: conversation_id

    Note over Client,LLM: Step 6 — Ask a Question (RAG)
    Client->>API: POST /api/notebooks/{id}/conversations/{conv}/messages
    API->>DB: Check cache (MD5 + semantic)
    API->>DB: Hybrid search (FTS + semantic + RRF)
    DB-->>API: Ranked chunks
    API->>LLM: Generate answer with context
    LLM-->>API: Response with citations
    API->>DB: Save messages
    API-->>Client: assistant_message + citations

    Note over Client,LLM: Step 7 — Submit Feedback
    Client->>API: POST /api/notebooks/{id}/messages/{msg}/feedback
    API->>DB: Upsert feedback
    API-->>Client: 200 OK

Step 1: Create a Notebook

NOTEBOOK_ID=$(python3 -c "import uuid; print(uuid.uuid4())")

curl -s -X POST http://localhost:8000/api/notebooks/ \
  -H "Content-Type: application/json" \
  -d "{
    \"notebook_id\": \"$NOTEBOOK_ID\",
    \"notebook_title\": \"Customer Support KB\",
    \"embedding_model\": \"openai/text-embedding-3-small\",
    \"user_id\": \"dev-user\"
  }" | jq .
import httpx
import uuid

BASE = "http://localhost:8000/api"

notebook_id = str(uuid.uuid4())
resp = httpx.post(f"{BASE}/notebooks/", json={
    "notebook_id": notebook_id,
    "notebook_title": "Customer Support KB",
    "embedding_model": "openai/text-embedding-3-small",
    "user_id": "dev-user",
})
notebook = resp.json()["data"]
print(f"Created notebook: {notebook['notebook_id']}")
const BASE = "http://localhost:8000/api";

const notebookId = crypto.randomUUID();
const resp = await fetch(`${BASE}/notebooks/`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    notebook_id: notebookId,
    notebook_title: "Customer Support KB",
    embedding_model: "openai/text-embedding-3-small",
    user_id: "dev-user",
  }),
});
const { data: notebook } = await resp.json();
console.log(`Created notebook: ${notebook.notebook_id}`);

Step 2: Upload a Document

UPLOAD=$(curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/upload" \
  -F "files=@handbook.pdf")
FILE_ID=$(echo "$UPLOAD" | jq -r '.data[0].file_id')
FILE_NAME=$(echo "$UPLOAD" | jq -r '.data[0].file_name')
FILE_PATH=$(echo "$UPLOAD" | jq -r '.data[0].storage_path')
echo "Uploaded: $FILE_NAME ($FILE_ID)"
with open("handbook.pdf", "rb") as f:
    resp = httpx.post(
        f"{BASE}/notebooks/{notebook_id}/documents/upload",
        files={"files": ("handbook.pdf", f, "application/pdf")},
    )
uploaded = resp.json()["data"][0]
file_id = uploaded["file_id"]
file_name = uploaded["file_name"]
file_path = uploaded["storage_path"]
print(f"Uploaded: {file_name} ({file_id})")
// fileBlob is a Blob/File containing handbook.pdf, e.g. from a file
// input in the browser or read from disk in Node.
const formData = new FormData();
formData.append("files", fileBlob, "handbook.pdf");

const uploadResp = await fetch(
  `${BASE}/notebooks/${notebookId}/documents/upload`,
  { method: "POST", body: formData }
);
const { data: [uploaded] } = await uploadResp.json();
const { file_id: fileId, file_name: fileName, storage_path: filePath } = uploaded;
console.log(`Uploaded: ${fileName} (${fileId})`);

Step 3: Trigger Ingestion

curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/ingest" \
  -H "Content-Type: application/json" \
  -d "{
    \"files\": [{\"file_id\": \"$FILE_ID\", \"file_name\": \"$FILE_NAME\", \"file_path\": \"$FILE_PATH\"}],
    \"settings\": {
      \"parser\": \"Docling Parser\",
      \"chunking_strategy\": \"Recursive Chunking\",
      \"chunk_size\": 1000,
      \"chunk_overlap\": 200
    },
    \"notebook_name\": \"Customer Support KB\"
  }" | jq .
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/documents/ingest",
    json={
        "files": [{"file_id": file_id, "file_name": file_name, "file_path": file_path}],
        "settings": {
            "parser": "Docling Parser",
            "chunking_strategy": "Recursive Chunking",
            "chunk_size": 1000,
            "chunk_overlap": 200,
        },
        "notebook_name": "Customer Support KB",
    },
)
jobs = resp.json()["data"]["jobs"]
print(f"Started {len(jobs)} ingestion jobs")
const ingestResp = await fetch(
  `${BASE}/notebooks/${notebookId}/documents/ingest`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      files: [{ file_id: fileId, file_name: fileName, file_path: filePath }],
      settings: {
        parser: "Docling Parser",
        chunking_strategy: "Recursive Chunking",
        chunk_size: 1000,
        chunk_overlap: 200,
      },
      notebook_name: "Customer Support KB",
    }),
  }
);
const { data: { jobs } } = await ingestResp.json();
console.log(`Started ${jobs.length} ingestion jobs`);

Step 4: Poll Until Ingestion Completes

while true; do
  STATUS=$(curl -s "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$FILE_ID/stage" \
    | jq -r '.data.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "success" ] || [ "$STATUS" = "error" ]; then break; fi
  sleep 3
done
import time

while True:
    resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/documents/{file_id}/stage")
    status = resp.json()["data"]["status"]
    print(f"Status: {status}")
    if status in ("success", "error"):
        break
    time.sleep(3)
let status;
do {
  const stageResp = await fetch(
    `${BASE}/notebooks/${notebookId}/documents/${fileId}/stage`
  );
  status = (await stageResp.json()).data.status;
  console.log(`Status: ${status}`);
  if (status !== "success" && status !== "error") {
    await new Promise((r) => setTimeout(r, 3000));
  }
} while (status !== "success" && status !== "error");

Step 5: Create a Conversation

CONV=$(curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/conversations" \
  -H "Content-Type: application/json" \
  -d '{"title": "Support Questions", "chat_mode": "rag"}')
CONV_ID=$(echo "$CONV" | jq -r '.data.conversation_id')
echo "Conversation: $CONV_ID"
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/conversations",
    json={"title": "Support Questions", "chat_mode": "rag"},
)
conversation_id = resp.json()["data"]["conversation_id"]
print(f"Conversation: {conversation_id}")
const convResp = await fetch(
  `${BASE}/notebooks/${notebookId}/conversations`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ title: "Support Questions", chat_mode: "rag" }),
  }
);
const conversationId = (await convResp.json()).data.conversation_id;
console.log(`Conversation: ${conversationId}`);

Step 6: Ask a Question

curl -s -X POST \
  "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/conversations/$CONV_ID/messages" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "What is the refund policy?",
    "strategy_id": "fusion",
    "persona": "professional",
    "language": "en"
  }' | jq '{
    answer: .data.assistant_message.content,
    citations: [.data.assistant_message.citations[] | {id: .citation_id, file: .metadata.file_name}],
    cached: .data.assistant_message.run_metadata.is_cached,
    strategy: .data.assistant_message.run_metadata.strategy_id
  }'
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/conversations/{conversation_id}/messages",
    json={
        "content": "What is the refund policy?",
        "strategy_id": "fusion",
        "persona": "professional",
        "language": "en",
    },
    timeout=60.0,
)
result = resp.json()["data"]
assistant = result["assistant_message"]
message_id = assistant["id"]

print(f"Answer: {assistant['content'][:200]}...")
print(f"Strategy: {assistant['run_metadata']['strategy_id']}")
print(f"Cached: {assistant['run_metadata']['is_cached']}")
for cite in assistant["citations"]:
    print(f"  [{cite['citation_id']}] {cite['metadata'].get('file_name')} "
          f"(similarity: {cite.get('similarity', 'N/A')})")
const msgResp = await fetch(
  `${BASE}/notebooks/${notebookId}/conversations/${conversationId}/messages`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      content: "What is the refund policy?",
      strategy_id: "fusion",
      persona: "professional",
      language: "en",
    }),
  }
);
const { data: { assistant_message: assistant } } = await msgResp.json();
const messageId = assistant.id;

console.log(`Answer: ${assistant.content.slice(0, 200)}...`);
console.log(`Strategy: ${assistant.run_metadata.strategy_id}`);
console.log(`Cached: ${assistant.run_metadata.is_cached}`);
for (const cite of assistant.citations) {
  console.log(`  [${cite.citation_id}] ${cite.metadata?.file_name}`);
}

Step 7: Submit Feedback

MSG_ID="<message_id from step 6>"
curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/messages/$MSG_ID/feedback" \
  -H "Content-Type: application/json" \
  -d '{"is_positive": true, "feedback_text": "Accurate answer with correct citations"}' \
  | jq .
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/messages/{message_id}/feedback",
    json={"is_positive": True, "feedback_text": "Accurate answer with correct citations"},
)
print(f"Feedback submitted: {resp.json()['data']}")
const fbResp = await fetch(
  `${BASE}/notebooks/${notebookId}/messages/${messageId}/feedback`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      is_positive: true,
      feedback_text: "Accurate answer with correct citations",
    }),
  }
);
console.log("Feedback submitted:", (await fbResp.json()).data);

Workflow 2: Enhance Chunks, Publish, and Verify

AI enhancement enriches document chunks with contextual descriptions, improving retrieval accuracy. This workflow enhances a file's chunks, publishes them to the vector store, and verifies the results with a search.
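To make "contextual descriptions" concrete, here is an illustrative sketch of the general pattern (the exact text the service generates is not shown in this guide, so this layout is an assumption): a short LLM-written description of where a chunk sits in its document is prepended to the chunk before it is re-embedded.

```python
# Hypothetical example of a contextually enhanced chunk; the real
# service's output format may differ.
chunk = "Refunds are issued within 14 days of purchase for annual plans."
context = (
    "This passage comes from the Billing > Refunds section of the "
    "Customer Support KB handbook."
)

# The enhanced text is what gets embedded and published in Step 4 below.
enhanced = f"{context}\n\n{chunk}"
print(enhanced)
```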

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as FastAPI Backend
    participant LLM as LLM Provider
    participant Embedder as Embedding API
    participant DB as Supabase PostgreSQL

    Note over Client,DB: Step 1 — List Files for Enhancement
    Client->>API: GET /enhance/files
    API->>DB: Query contextual_retrieval_table
    API->>DB: Auto-populate new files
    API-->>Client: files with chunk counts

    Note over Client,DB: Step 2 — Start Enhancement
    Client->>API: POST /enhance {file_ids: [...]}
    API-->>Client: 200 OK (processing started)
    loop For each pending chunk (parallel)
        API->>LLM: Enhance chunk with context
        LLM-->>API: Enhanced text
        API->>DB: Update chunk status
    end

    Note over Client,DB: Step 3 — Poll Progress
    loop Every 4 seconds
        Client->>API: GET /enhance/status?file_id=...
        API-->>Client: progress_pct, all_terminated
    end

    Note over Client,DB: Step 4 — Publish to Vector Store
    Client->>API: POST /enhance/publish {file_id: ...}
    API->>DB: Delete old vectors for file
    API->>Embedder: Embed enhanced chunks
    Embedder-->>API: New vectors
    API->>DB: Insert enhanced vectors
    API->>DB: Mark chunks as "embedded"
    API-->>Client: published_count

    Note over Client,DB: Step 5 — Verify with Search
    Client->>API: POST /retrieval/retrieve
    API->>DB: Hybrid search (now with enhanced chunks)
    DB-->>API: Results
    API-->>Client: Ranked chunks

Step 1: List Files Available for Enhancement

curl -s "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/files" | jq '.data[] | {
  file_id, file_name, total_chunks, pending, success, embedded
}'
resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/enhance/files")
files = resp.json()["data"]
for f in files:
    print(f"{f['file_name']}: {f['total_chunks']} chunks "
          f"(pending={f['pending']}, success={f['success']}, embedded={f['embedded']})")
file_id = files[0]["file_id"]
file_name = files[0]["file_name"]
const filesResp = await fetch(
  `${BASE}/notebooks/${notebookId}/enhance/files`
);
const { data: enhanceFiles } = await filesResp.json();
for (const f of enhanceFiles) {
  console.log(`${f.file_name}: ${f.total_chunks} chunks ` +
    `(pending=${f.pending}, success=${f.success}, embedded=${f.embedded})`);
}
const targetFileId = enhanceFiles[0].file_id;
const targetFileName = enhanceFiles[0].file_name;

Step 2: Start Enhancement

curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance" \
  -H "Content-Type: application/json" \
  -d "{\"file_ids\": [\"$FILE_ID\"]}" | jq .
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/enhance",
    json={"file_ids": [file_id]},
)
print(f"Enhancement started: {resp.json()['data']}")
const enhanceResp = await fetch(`${BASE}/notebooks/${notebookId}/enhance`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ file_ids: [targetFileId] }),
});
console.log("Enhancement started:", (await enhanceResp.json()).data);

Step 3: Poll Until Enhancement Completes

while true; do
  RESULT=$(curl -s "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/status?file_id=$FILE_ID")
  PROGRESS=$(echo "$RESULT" | jq '.data.progress_pct')
  DONE=$(echo "$RESULT" | jq '.data.all_terminated')
  echo "Progress: ${PROGRESS}%"
  [ "$DONE" = "true" ] && break
  sleep 4
done
echo "Enhancement complete!"
import time

while True:
    resp = httpx.get(
        f"{BASE}/notebooks/{notebook_id}/enhance/status",
        params={"file_id": file_id},
    )
    status = resp.json()["data"]
    print(f"Progress: {status['progress_pct']:.1f}% "
          f"(success={status['success']}, failed={status['failed']})")
    if status["all_terminated"]:
        break
    time.sleep(4)

print("Enhancement complete!")
let enhanceDone = false;
while (!enhanceDone) {
  const statusResp = await fetch(
    `${BASE}/notebooks/${notebookId}/enhance/status?file_id=${targetFileId}`
  );
  const enhStatus = (await statusResp.json()).data;
  console.log(`Progress: ${enhStatus.progress_pct.toFixed(1)}% ` +
    `(success=${enhStatus.success}, failed=${enhStatus.failed})`);
  enhanceDone = enhStatus.all_terminated;
  if (!enhanceDone) await new Promise((r) => setTimeout(r, 4000));
}
console.log("Enhancement complete!");

Step 4: Publish Enhanced Chunks

curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/publish" \
  -H "Content-Type: application/json" \
  -d "{
    \"file_id\": \"$FILE_ID\",
    \"file_name\": \"$FILE_NAME\",
    \"notebook_title\": \"Customer Support KB\"
  }" | jq .
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/enhance/publish",
    json={
        "file_id": file_id,
        "file_name": file_name,
        "notebook_title": "Customer Support KB",
    },
    timeout=120.0,
)
result = resp.json()["data"]
print(f"Published {result['published_count']} enhanced chunks")
const publishResp = await fetch(
  `${BASE}/notebooks/${notebookId}/enhance/publish`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      file_id: targetFileId,
      file_name: targetFileName,
      notebook_title: "Customer Support KB",
    }),
  }
);
const publishResult = (await publishResp.json()).data;
console.log(`Published ${publishResult.published_count} enhanced chunks`);

Step 5: Verify with a Search

curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/retrieval/retrieve" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the refund policy?",
    "strategy_id": "fusion",
    "top_k": 5
  }' | jq '.data | {
    total_results,
    execution_time_ms,
    chunks: [.chunks[] | {rank, score, content: (.content[:100] + "...")}]
  }'
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/retrieval/retrieve",
    json={
        "query": "What is the refund policy?",
        "strategy_id": "fusion",
        "top_k": 5,
    },
)
result = resp.json()["data"]
print(f"Found {result['total_results']} chunks in {result['execution_time_ms']}ms")
for chunk in result["chunks"]:
    print(f"  [{chunk['rank']}] score={chunk['score']:.3f}: {chunk['content'][:80]}...")
const searchResp = await fetch(
  `${BASE}/notebooks/${notebookId}/retrieval/retrieve`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: "What is the refund policy?",
      strategy_id: "fusion",
      top_k: 5,
    }),
  }
);
const searchResult = (await searchResp.json()).data;
console.log(`Found ${searchResult.total_results} chunks in ${searchResult.execution_time_ms}ms`);
for (const chunk of searchResult.chunks) {
  console.log(`  [${chunk.rank}] score=${chunk.score.toFixed(3)}: ${chunk.content.slice(0, 80)}...`);
}

Workflow 3: Share a Notebook and Collaborate

Create an invite link, share it with a colleague, and let them chat with the notebook's documents using their own conversations.

Sequence Diagram

sequenceDiagram
    participant Admin
    participant API as FastAPI Backend
    participant DB as Supabase PostgreSQL
    participant InvitedUser as Invited User

    Note over Admin,InvitedUser: Step 1 — Admin Creates Invite
    Admin->>API: POST /notebooks/{id}/invites
    API->>DB: Insert invite record
    API-->>Admin: invite_code, invite_url

    Note over Admin,InvitedUser: Step 2 — Admin Shares Link
    Admin->>InvitedUser: Share invite_url

    Note over Admin,InvitedUser: Step 3 — User Previews Invite
    InvitedUser->>API: GET /invite/{code}
    API->>DB: Look up invite
    API-->>InvitedUser: notebook_title, is_valid

    Note over Admin,InvitedUser: Step 4 — User Redeems Invite
    InvitedUser->>API: POST /invite/{code}/redeem
    API->>DB: Insert notebook_access (chat_only)
    API-->>InvitedUser: notebook_id, access_level

    Note over Admin,InvitedUser: Step 5 — User Creates Conversation
    InvitedUser->>API: POST /notebooks/{id}/conversations
    API-->>InvitedUser: conversation_id

    Note over Admin,InvitedUser: Step 6 — User Chats with Documents
    InvitedUser->>API: POST /notebooks/{id}/conversations/{conv}/messages
    API-->>InvitedUser: RAG response with citations

    Note over Admin,InvitedUser: Step 7 — Admin Reviews Access
    Admin->>API: GET /notebooks/{id}/access
    API-->>Admin: List of users with access levels

Step 1: Create an Invite Link (as Admin)

INVITE=$(curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/invites" \
  -H "Content-Type: application/json" \
  -d '{"expires_at": "2026-04-01T00:00:00Z"}')
INVITE_CODE=$(echo "$INVITE" | jq -r '.data.invite_code')
INVITE_URL=$(echo "$INVITE" | jq -r '.data.invite_url')
echo "Share this link: $INVITE_URL"
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/invites",
    json={"expires_at": "2026-04-01T00:00:00Z"},
)
invite = resp.json()["data"]
invite_code = invite["invite_code"]
print(f"Share this link: {invite['invite_url']}")
const inviteResp = await fetch(
  `${BASE}/notebooks/${notebookId}/invites`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ expires_at: "2026-04-01T00:00:00Z" }),
  }
);
const invite = (await inviteResp.json()).data;
console.log(`Share this link: ${invite.invite_url}`);

Step 2: Preview the Invite (as Invited User)

curl -s "http://localhost:8000/api/invite/$INVITE_CODE" | jq .
# { "data": { "notebook_title": "Customer Support KB", "is_valid": true } }
resp = httpx.get(f"{BASE}/invite/{invite_code}")
info = resp.json()["data"]
if info["is_valid"]:
    print(f"Invite for notebook: {info['notebook_title']}")
else:
    print(f"Invalid invite: {info['reason']}")
const previewResp = await fetch(`${BASE}/invite/${invite.invite_code}`);
const inviteInfo = (await previewResp.json()).data;
if (inviteInfo.is_valid) {
  console.log(`Invite for notebook: ${inviteInfo.notebook_title}`);
} else {
  console.log(`Invalid invite: ${inviteInfo.reason}`);
}

Step 3: Redeem the Invite

curl -s -X POST "http://localhost:8000/api/invite/$INVITE_CODE/redeem" | jq .
# { "data": { "notebook_id": "...", "notebook_title": "...", "access_level": "chat_only" } }
resp = httpx.post(f"{BASE}/invite/{invite_code}/redeem")
access = resp.json()["data"]
print(f"Gained {access['access_level']} access to: {access['notebook_title']}")
const redeemResp = await fetch(
  `${BASE}/invite/${invite.invite_code}/redeem`,
  { method: "POST" }
);
const access = (await redeemResp.json()).data;
console.log(`Gained ${access.access_level} access to: ${access.notebook_title}`);

Step 4: Chat as the Invited User

# Create conversation
CONV=$(curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/conversations" \
  -H "Content-Type: application/json" \
  -d '{"title": "My Questions"}')
CONV_ID=$(echo "$CONV" | jq -r '.data.conversation_id')

# Ask a question
curl -s -X POST \
  "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/conversations/$CONV_ID/messages" \
  -H "Content-Type: application/json" \
  -d '{"content": "How do I request a refund?", "persona": "clear", "language": "en"}' \
  | jq '.data.assistant_message.content'
# Create conversation as invited user
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/conversations",
    json={"title": "My Questions"},
)
conv_id = resp.json()["data"]["conversation_id"]

# Ask a question
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/conversations/{conv_id}/messages",
    json={"content": "How do I request a refund?", "persona": "clear", "language": "en"},
    timeout=60.0,
)
print(resp.json()["data"]["assistant_message"]["content"])
// Create conversation as invited user
const newConvResp = await fetch(
  `${BASE}/notebooks/${notebookId}/conversations`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ title: "My Questions" }),
  }
);
const newConvId = (await newConvResp.json()).data.conversation_id;

// Ask a question
const chatResp = await fetch(
  `${BASE}/notebooks/${notebookId}/conversations/${newConvId}/messages`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      content: "How do I request a refund?",
      persona: "clear",
      language: "en",
    }),
  }
);
console.log((await chatResp.json()).data.assistant_message.content);

Step 5: Admin Reviews Access

curl -s "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/access" \
  | jq '.data[] | {user_id, access_level, granted_at}'
resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/access")
for record in resp.json()["data"]:
    print(f"  {record['user_id']}: {record['access_level']} "
          f"(granted {record['granted_at']})")
const accessResp = await fetch(`${BASE}/notebooks/${notebookId}/access`);
for (const record of (await accessResp.json()).data) {
  console.log(`  ${record.user_id}: ${record.access_level} (granted ${record.granted_at})`);
}

Workflow 4: Monitor Health, Cleanup, and Reingest

Maintain data quality by checking notebook health, cleaning up duplicates, reingesting problematic files, and verifying the health score improved.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as FastAPI Backend
    participant DB as Supabase PostgreSQL
    participant Parser as Docling / Mistral OCR
    participant Embedder as Embedding API

    Note over Client,Embedder: Step 1 — Run Health Check
    Client->>API: GET /notebooks/{id}/health
    API->>DB: Detect duplicates (content hash)
    API->>DB: Detect orphans (missing records)
    API->>DB: Count enhanced chunks
    API-->>Client: health_score, duplicates, orphans

    Note over Client,Embedder: Step 2 — Cleanup Duplicates
    Client->>API: POST /notebooks/{id}/health/cleanup
    API->>DB: Delete duplicate chunks (keep oldest)
    API-->>Client: removed_count, new_health_score

    Note over Client,Embedder: Step 3 — Check for Errored Files
    Client->>API: GET /notebooks/{id}/documents/errors
    API-->>Client: files with errors

    Note over Client,Embedder: Step 4 — Reingest Problematic Files
    Client->>API: POST /notebooks/{id}/documents/{file}/reingest
    API->>DB: Atomic cleanup (6 tables)
    API->>DB: Verify cleanup complete
    API->>Parser: Re-parse with new settings
    Parser-->>API: Extracted text
    API->>Embedder: Re-embed chunks
    Embedder-->>API: New vectors
    API->>DB: Insert new chunks + vectors
    API-->>Client: job_id, cleanup_summary

    Note over Client,Embedder: Step 5 — Verify Health Improved
    Client->>API: GET /notebooks/{id}/health
    API-->>Client: Updated health_score

Step 1: Run a Health Check

curl -s "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/health" | jq '{
  health_score: .data.health_score,
  total_chunks: .data.total_chunks,
  duplicates: .data.duplicate_count,
  orphans: .data.orphaned_count,
  enhanced: .data.enhanced_count
}'
resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/health")
health = resp.json()["data"]
print(f"Health Score: {health['health_score']}/100")
print(f"Total Chunks: {health['total_chunks']}")
print(f"Duplicates:   {health['duplicate_count']}")
print(f"Orphans:      {health['orphaned_count']}")
print(f"Enhanced:     {health['enhanced_count']}")

if health["duplicate_count"] > 0:
    print("\nDuplicate groups:")
    for group in health["duplicate_groups"]:
        print(f"  Hash {group['content_hash'][:16]}...: {group['count']} copies")
const healthResp = await fetch(`${BASE}/notebooks/${notebookId}/health`);
const health = (await healthResp.json()).data;
console.log(`Health Score: ${health.health_score}/100`);
console.log(`Total Chunks: ${health.total_chunks}`);
console.log(`Duplicates:   ${health.duplicate_count}`);
console.log(`Orphans:      ${health.orphaned_count}`);
console.log(`Enhanced:     ${health.enhanced_count}`);

if (health.duplicate_count > 0) {
  console.log("\nDuplicate groups:");
  for (const group of health.duplicate_groups) {
    console.log(`  Hash ${group.content_hash.slice(0, 16)}...: ${group.count} copies`);
  }
}

Step 2: Cleanup Duplicates

curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/health/cleanup" | jq .
resp = httpx.post(f"{BASE}/notebooks/{notebook_id}/health/cleanup")
result = resp.json()["data"]
print(f"Removed {result['removed_count']} duplicate chunks")
print(f"New health score: {result['new_health_score']}/100")
print(f"Remaining chunks: {result['new_total_chunks']}")
const cleanupResp = await fetch(
  `${BASE}/notebooks/${notebookId}/health/cleanup`,
  { method: "POST" }
);
const cleanup = (await cleanupResp.json()).data;
console.log(`Removed ${cleanup.removed_count} duplicate chunks`);
console.log(`New health score: ${cleanup.new_health_score}/100`);
console.log(`Remaining chunks: ${cleanup.new_total_chunks}`);

Step 3: Check for Errored Files

curl -s "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/errors" \
  | jq '.data[] | {file_id, file_name, error_message, error_stage}'
resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/documents/errors")
errors = resp.json()["data"]
if not errors:
    print("No errored files!")
else:
    for err in errors:
        print(f"  {err['file_name']}: {err['error_message']} (stage: {err['error_stage']})")
    error_file_id = errors[0]["file_id"]
const errorsResp = await fetch(
  `${BASE}/notebooks/${notebookId}/documents/errors`
);
const errors = (await errorsResp.json()).data;
let errorFileId;
if (errors.length === 0) {
  console.log("No errored files!");
} else {
  for (const err of errors) {
    console.log(`  ${err.file_name}: ${err.error_message} (stage: ${err.error_stage})`);
  }
  // Keep the first errored file for the reingest step below
  errorFileId = errors[0].file_id;
}

Step 4: Reingest with Different Settings

ERROR_FILE_ID="<file_id from step 3>"
curl -s -X POST \
  "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$ERROR_FILE_ID/reingest" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "parser": "Mistral OCR",
      "chunking_strategy": "Recursive Chunking",
      "chunk_size": 800,
      "chunk_overlap": 150
    }
  }' | jq '{
    job_id: .data.job_id,
    status: .data.status,
    docs_cleaned: .data.cleanup_summary.documents_deleted,
    chunks_cleaned: .data.cleanup_summary.chunks_deleted
  }'
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/documents/{error_file_id}/reingest",
    json={
        "settings": {
            "parser": "Mistral OCR",
            "chunking_strategy": "Recursive Chunking",
            "chunk_size": 800,
            "chunk_overlap": 150,
        }
    },
)
result = resp.json()["data"]
print(f"Reingest job: {result['job_id']}")
print(f"Status: {result['status']}")
summary = result["cleanup_summary"]
print(f"Cleaned: {summary['documents_deleted']} docs, "
      f"{summary.get('chunks_deleted', 0)} chunks")
const reingestResp = await fetch(
  `${BASE}/notebooks/${notebookId}/documents/${errorFileId}/reingest`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      settings: {
        parser: "Mistral OCR",
        chunking_strategy: "Recursive Chunking",
        chunk_size: 800,
        chunk_overlap: 150,
      },
    }),
  }
);
const reingest = (await reingestResp.json()).data;
console.log(`Reingest job: ${reingest.job_id}`);
console.log(`Status: ${reingest.status}`);
console.log(`Cleaned: ${reingest.cleanup_summary.documents_deleted} docs`);

Batch Re-Ingestion

To reingest multiple files at once, use POST /documents/reingest-batch with an array of file_ids. Each file is processed independently; one failure does not block the others.
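A sketch of the batch call, assuming the endpoint lives under the same notebook prefix and accepts the same settings object as the single-file reingest above (both are assumptions; check the API reference for the exact schema):

```python
# Hypothetical request body for POST .../documents/reingest-batch;
# field names mirror the single-file reingest call and are assumptions.
file_ids = ["<file_id_1>", "<file_id_2>"]

payload = {
    "file_ids": file_ids,
    "settings": {
        "parser": "Docling Parser",
        "chunking_strategy": "Recursive Chunking",
        "chunk_size": 1000,
        "chunk_overlap": 200,
    },
}

# httpx.post(f"{BASE}/notebooks/{notebook_id}/documents/reingest-batch", json=payload)
print(len(payload["file_ids"]), "files queued for batch reingest")
```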

Step 5: Verify Health Improved

# Wait for reingestion to complete
sleep 10

curl -s "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/health" \
  | jq '{health_score: .data.health_score, total_chunks: .data.total_chunks}'
import time

# Wait for reingestion to finish
while True:
    resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/documents/{error_file_id}/stage")
    stage = resp.json()["data"]
    if stage["status"] in ("success", "error"):
        break
    time.sleep(3)

# Check health again
resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/health")
new_health = resp.json()["data"]
print(f"Updated Health Score: {new_health['health_score']}/100")
print(f"Total Chunks: {new_health['total_chunks']}")
print(f"Duplicates: {new_health['duplicate_count']}")
// Wait for reingestion to finish
let reingestStatus;
do {
  const stageResp = await fetch(
    `${BASE}/notebooks/${notebookId}/documents/${errorFileId}/stage`
  );
  reingestStatus = (await stageResp.json()).data.status;
  if (reingestStatus !== "success" && reingestStatus !== "error") {
    await new Promise((r) => setTimeout(r, 3000));
  }
} while (reingestStatus !== "success" && reingestStatus !== "error");

// Check health again
const newHealthResp = await fetch(`${BASE}/notebooks/${notebookId}/health`);
const newHealth = (await newHealthResp.json()).data;
console.log(`Updated Health Score: ${newHealth.health_score}/100`);
console.log(`Total Chunks: ${newHealth.total_chunks}`);
console.log(`Duplicates: ${newHealth.duplicate_count}`);

Next Steps