End-to-End Workflows¶
This guide walks through four complete workflows that demonstrate the core capabilities of Beyond Retrieval v2. Each workflow includes sequence diagrams and tabbed code examples in curl, Python, and JavaScript.
Prerequisites
All examples assume the API is running at http://localhost:8000. In bypass-auth mode (the default for development), no token is required. For production, include an Authorization: Bearer <token> header in every request.
Workflow 1: Upload, Ingest, Chat, and Citations¶
The basic happy path: create a notebook, upload a document, ingest it, start a conversation, ask a question, and get a cited RAG response.
Sequence Diagram¶
sequenceDiagram
participant Client
participant API as FastAPI Backend
participant Storage as Object Storage
participant Parser as Docling / Mistral OCR
participant Embedder as Embedding API
participant DB as Supabase PostgreSQL
participant LLM as LLM Provider
Note over Client,LLM: Step 1 — Create Notebook
Client->>API: POST /api/notebooks/
API->>DB: Insert notebook + settings
DB-->>API: notebook_id
API-->>Client: 201 Created
Note over Client,LLM: Step 2 — Upload Document
Client->>API: POST /api/notebooks/{id}/documents/upload
API->>Storage: Store file
Storage-->>API: storage_path
API-->>Client: file_id, file_name, storage_path
Note over Client,LLM: Step 3 — Ingest Document
Client->>API: POST /api/notebooks/{id}/documents/ingest
API->>DB: Create job (status: pending)
API-->>Client: job_id (immediate)
API->>Parser: Parse document (background)
Parser-->>API: Extracted text
API->>API: Chunk text
API->>Embedder: Generate embeddings
Embedder-->>API: Vectors
API->>DB: Insert chunks + vectors
API->>DB: Update job (status: success)
Note over Client,LLM: Step 4 — Poll Until Ready
Client->>API: GET /api/notebooks/{id}/documents/{file_id}/stage
API-->>Client: status: success
Note over Client,LLM: Step 5 — Create Conversation
Client->>API: POST /api/notebooks/{id}/conversations
API->>DB: Insert conversation
API-->>Client: conversation_id
Note over Client,LLM: Step 6 — Ask a Question (RAG)
Client->>API: POST /api/notebooks/{id}/conversations/{conv}/messages
API->>DB: Check cache (MD5 + semantic)
API->>DB: Hybrid search (FTS + semantic + RRF)
DB-->>API: Ranked chunks
API->>LLM: Generate answer with context
LLM-->>API: Response with citations
API->>DB: Save messages
API-->>Client: assistant_message + citations
Note over Client,LLM: Step 7 — Submit Feedback
Client->>API: POST /api/notebooks/{id}/messages/{msg}/feedback
API->>DB: Upsert feedback
API-->>Client: 200 OK
Step 1: Create a Notebook¶
NOTEBOOK_ID=$(python3 -c "import uuid; print(uuid.uuid4())")
curl -s -X POST http://localhost:8000/api/notebooks/ \
-H "Content-Type: application/json" \
-d "{
\"notebook_id\": \"$NOTEBOOK_ID\",
\"notebook_title\": \"Customer Support KB\",
\"embedding_model\": \"openai/text-embedding-3-small\",
\"user_id\": \"dev-user\"
}" | jq .
import httpx
import uuid

BASE = "http://localhost:8000/api"
notebook_id = str(uuid.uuid4())

resp = httpx.post(f"{BASE}/notebooks/", json={
    "notebook_id": notebook_id,
    "notebook_title": "Customer Support KB",
    "embedding_model": "openai/text-embedding-3-small",
    "user_id": "dev-user",
})
notebook = resp.json()["data"]
print(f"Created notebook: {notebook['notebook_id']}")
const BASE = "http://localhost:8000/api";
const notebookId = crypto.randomUUID();
const resp = await fetch(`${BASE}/notebooks/`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
notebook_id: notebookId,
notebook_title: "Customer Support KB",
embedding_model: "openai/text-embedding-3-small",
user_id: "dev-user",
}),
});
const { data: notebook } = await resp.json();
console.log(`Created notebook: ${notebook.notebook_id}`);
Step 2: Upload a Document¶
UPLOAD=$(curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/upload" \
-F "files=@handbook.pdf")
FILE_ID=$(echo "$UPLOAD" | jq -r '.data[0].file_id')
FILE_NAME=$(echo "$UPLOAD" | jq -r '.data[0].file_name')
FILE_PATH=$(echo "$UPLOAD" | jq -r '.data[0].storage_path')
echo "Uploaded: $FILE_NAME ($FILE_ID)"
with open("handbook.pdf", "rb") as f:
    resp = httpx.post(
        f"{BASE}/notebooks/{notebook_id}/documents/upload",
        files={"files": ("handbook.pdf", f, "application/pdf")},
    )
uploaded = resp.json()["data"][0]
file_id = uploaded["file_id"]
file_name = uploaded["file_name"]
file_path = uploaded["storage_path"]
print(f"Uploaded: {file_name} ({file_id})")
const formData = new FormData();
formData.append("files", fileBlob, "handbook.pdf");
const uploadResp = await fetch(
`${BASE}/notebooks/${notebookId}/documents/upload`,
{ method: "POST", body: formData }
);
const { data: [uploaded] } = await uploadResp.json();
const { file_id: fileId, file_name: fileName, storage_path: filePath } = uploaded;
console.log(`Uploaded: ${fileName} (${fileId})`);
Step 3: Trigger Ingestion¶
curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/ingest" \
-H "Content-Type: application/json" \
-d "{
\"files\": [{\"file_id\": \"$FILE_ID\", \"file_name\": \"$FILE_NAME\", \"file_path\": \"$FILE_PATH\"}],
\"settings\": {
\"parser\": \"Docling Parser\",
\"chunking_strategy\": \"Recursive Chunking\",
\"chunk_size\": 1000,
\"chunk_overlap\": 200
},
\"notebook_name\": \"Customer Support KB\"
}" | jq .
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/documents/ingest",
    json={
        "files": [{"file_id": file_id, "file_name": file_name, "file_path": file_path}],
        "settings": {
            "parser": "Docling Parser",
            "chunking_strategy": "Recursive Chunking",
            "chunk_size": 1000,
            "chunk_overlap": 200,
        },
        "notebook_name": "Customer Support KB",
    },
)
jobs = resp.json()["data"]["jobs"]
print(f"Started {len(jobs)} ingestion jobs")
const ingestResp = await fetch(
`${BASE}/notebooks/${notebookId}/documents/ingest`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
files: [{ file_id: fileId, file_name: fileName, file_path: filePath }],
settings: {
parser: "Docling Parser",
chunking_strategy: "Recursive Chunking",
chunk_size: 1000,
chunk_overlap: 200,
},
notebook_name: "Customer Support KB",
}),
}
);
const { data: { jobs } } = await ingestResp.json();
console.log(`Started ${jobs.length} ingestion jobs`);
Step 4: Poll Until Ingestion Completes¶
let status;
do {
  const stageResp = await fetch(
    `${BASE}/notebooks/${notebookId}/documents/${fileId}/stage`
  );
  status = (await stageResp.json()).data.status;
  console.log(`Status: ${status}`);
  if (status !== "success" && status !== "error") {
    await new Promise((r) => setTimeout(r, 3000));
  }
} while (status !== "success" && status !== "error");
Step 5: Create a Conversation¶
const convResp = await fetch(
`${BASE}/notebooks/${notebookId}/conversations`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ title: "Support Questions", chat_mode: "rag" }),
}
);
const conversationId = (await convResp.json()).data.conversation_id;
console.log(`Conversation: ${conversationId}`);
Step 6: Ask a Question¶
curl -s -X POST \
"http://localhost:8000/api/notebooks/$NOTEBOOK_ID/conversations/$CONV_ID/messages" \
-H "Content-Type: application/json" \
-d '{
"content": "What is the refund policy?",
"strategy_id": "fusion",
"persona": "professional",
"language": "en"
}' | jq '{
answer: .data.assistant_message.content,
citations: [.data.assistant_message.citations[] | {id: .citation_id, file: .metadata.file_name}],
cached: .data.assistant_message.run_metadata.is_cached,
strategy: .data.assistant_message.run_metadata.strategy_id
}'
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/conversations/{conversation_id}/messages",
    json={
        "content": "What is the refund policy?",
        "strategy_id": "fusion",
        "persona": "professional",
        "language": "en",
    },
    timeout=60.0,
)
result = resp.json()["data"]
assistant = result["assistant_message"]
message_id = assistant["id"]
print(f"Answer: {assistant['content'][:200]}...")
print(f"Strategy: {assistant['run_metadata']['strategy_id']}")
print(f"Cached: {assistant['run_metadata']['is_cached']}")
for cite in assistant["citations"]:
    print(f"  [{cite['citation_id']}] {cite['metadata'].get('file_name')} "
          f"(similarity: {cite.get('similarity', 'N/A')})")
const msgResp = await fetch(
`${BASE}/notebooks/${notebookId}/conversations/${conversationId}/messages`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
content: "What is the refund policy?",
strategy_id: "fusion",
persona: "professional",
language: "en",
}),
}
);
const { data: { assistant_message: assistant } } = await msgResp.json();
const messageId = assistant.id;
console.log(`Answer: ${assistant.content.slice(0, 200)}...`);
console.log(`Strategy: ${assistant.run_metadata.strategy_id}`);
console.log(`Cached: ${assistant.run_metadata.is_cached}`);
for (const cite of assistant.citations) {
  console.log(`  [${cite.citation_id}] ${cite.metadata?.file_name}`);
}
Step 7: Submit Feedback¶
const fbResp = await fetch(
`${BASE}/notebooks/${notebookId}/messages/${messageId}/feedback`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
is_positive: true,
feedback_text: "Accurate answer with correct citations",
}),
}
);
console.log("Feedback submitted:", (await fbResp.json()).data);
Workflow 2: Enhance Chunks, Publish, and Verify¶
AI enhancement enriches document chunks with contextual descriptions, improving retrieval accuracy. This workflow enhances a file's chunks, publishes them to the vector store, and verifies the results with a search.
Sequence Diagram¶
sequenceDiagram
participant Client
participant API as FastAPI Backend
participant LLM as LLM Provider
participant Embedder as Embedding API
participant DB as Supabase PostgreSQL
Note over Client,DB: Step 1 — List Files for Enhancement
Client->>API: GET /enhance/files
API->>DB: Query contextual_retrieval_table
API->>DB: Auto-populate new files
API-->>Client: files with chunk counts
Note over Client,DB: Step 2 — Start Enhancement
Client->>API: POST /enhance {file_ids: [...]}
API-->>Client: 200 OK (processing started)
loop For each pending chunk (parallel)
API->>LLM: Enhance chunk with context
LLM-->>API: Enhanced text
API->>DB: Update chunk status
end
Note over Client,DB: Step 3 — Poll Progress
loop Every 4 seconds
Client->>API: GET /enhance/status?file_id=...
API-->>Client: progress_pct, all_terminated
end
Note over Client,DB: Step 4 — Publish to Vector Store
Client->>API: POST /enhance/publish {file_id: ...}
API->>DB: Delete old vectors for file
API->>Embedder: Embed enhanced chunks
Embedder-->>API: New vectors
API->>DB: Insert enhanced vectors
API->>DB: Mark chunks as "embedded"
API-->>Client: published_count
Note over Client,DB: Step 5 — Verify with Search
Client->>API: POST /retrieval/retrieve
API->>DB: Hybrid search (now with enhanced chunks)
DB-->>API: Results
API-->>Client: Ranked chunks
Step 1: List Files Available for Enhancement¶
resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/enhance/files")
files = resp.json()["data"]
for f in files:
    print(f"{f['file_name']}: {f['total_chunks']} chunks "
          f"(pending={f['pending']}, success={f['success']}, embedded={f['embedded']})")
file_id = files[0]["file_id"]
file_name = files[0]["file_name"]
const filesResp = await fetch(
`${BASE}/notebooks/${notebookId}/enhance/files`
);
const { data: enhanceFiles } = await filesResp.json();
for (const f of enhanceFiles) {
  console.log(`${f.file_name}: ${f.total_chunks} chunks ` +
    `(pending=${f.pending}, success=${f.success}, embedded=${f.embedded})`);
}
const targetFileId = enhanceFiles[0].file_id;
const targetFileName = enhanceFiles[0].file_name;
Step 2: Start Enhancement¶
Step 3: Poll Until Enhancement Completes¶
while true; do
  RESULT=$(curl -s "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/status?file_id=$FILE_ID")
  PROGRESS=$(echo "$RESULT" | jq '.data.progress_pct')
  DONE=$(echo "$RESULT" | jq '.data.all_terminated')
  echo "Progress: ${PROGRESS}%"
  [ "$DONE" = "true" ] && break
  sleep 4
done
echo "Enhancement complete!"
import time

while True:
    resp = httpx.get(
        f"{BASE}/notebooks/{notebook_id}/enhance/status",
        params={"file_id": file_id},
    )
    status = resp.json()["data"]
    print(f"Progress: {status['progress_pct']:.1f}% "
          f"(success={status['success']}, failed={status['failed']})")
    if status["all_terminated"]:
        break
    time.sleep(4)
print("Enhancement complete!")
let enhanceDone = false;
while (!enhanceDone) {
  const statusResp = await fetch(
    `${BASE}/notebooks/${notebookId}/enhance/status?file_id=${targetFileId}`
  );
  const enhStatus = (await statusResp.json()).data;
  console.log(`Progress: ${enhStatus.progress_pct.toFixed(1)}% ` +
    `(success=${enhStatus.success}, failed=${enhStatus.failed})`);
  enhanceDone = enhStatus.all_terminated;
  if (!enhanceDone) await new Promise((r) => setTimeout(r, 4000));
}
console.log("Enhancement complete!");
Step 4: Publish Enhanced Chunks¶
const publishResp = await fetch(
`${BASE}/notebooks/${notebookId}/enhance/publish`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
file_id: targetFileId,
file_name: targetFileName,
notebook_title: "Customer Support KB",
}),
}
);
const publishResult = (await publishResp.json()).data;
console.log(`Published ${publishResult.published_count} enhanced chunks`);
Step 5: Verify with a Search¶
curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/retrieval/retrieve" \
-H "Content-Type: application/json" \
-d '{
"query": "What is the refund policy?",
"strategy_id": "fusion",
"top_k": 5
}' | jq '.data | {
total_results,
execution_time_ms,
chunks: [.chunks[] | {rank, score, content: (.content[:100] + "...")}]
}'
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/retrieval/retrieve",
    json={
        "query": "What is the refund policy?",
        "strategy_id": "fusion",
        "top_k": 5,
    },
)
result = resp.json()["data"]
print(f"Found {result['total_results']} chunks in {result['execution_time_ms']}ms")
for chunk in result["chunks"]:
    print(f"  [{chunk['rank']}] score={chunk['score']:.3f}: {chunk['content'][:80]}...")
const searchResp = await fetch(
`${BASE}/notebooks/${notebookId}/retrieval/retrieve`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
query: "What is the refund policy?",
strategy_id: "fusion",
top_k: 5,
}),
}
);
const searchResult = (await searchResp.json()).data;
console.log(`Found ${searchResult.total_results} chunks in ${searchResult.execution_time_ms}ms`);
for (const chunk of searchResult.chunks) {
  console.log(`  [${chunk.rank}] score=${chunk.score.toFixed(3)}: ${chunk.content.slice(0, 80)}...`);
}
Workflow 3: Share a Notebook and Collaborate¶
Create an invite link, share it with a colleague, and let them chat with the notebook's documents using their own conversations.
Sequence Diagram¶
sequenceDiagram
participant Admin
participant API as FastAPI Backend
participant DB as Supabase PostgreSQL
participant InvitedUser as Invited User
Note over Admin,InvitedUser: Step 1 — Admin Creates Invite
Admin->>API: POST /notebooks/{id}/invites
API->>DB: Insert invite record
API-->>Admin: invite_code, invite_url
Note over Admin,InvitedUser: Step 2 — Admin Shares Link
Admin->>InvitedUser: Share invite_url
Note over Admin,InvitedUser: Step 3 — User Previews Invite
InvitedUser->>API: GET /invite/{code}
API->>DB: Look up invite
API-->>InvitedUser: notebook_title, is_valid
Note over Admin,InvitedUser: Step 4 — User Redeems Invite
InvitedUser->>API: POST /invite/{code}/redeem
API->>DB: Insert notebook_access (chat_only)
API-->>InvitedUser: notebook_id, access_level
Note over Admin,InvitedUser: Step 5 — User Creates Conversation
InvitedUser->>API: POST /notebooks/{id}/conversations
API-->>InvitedUser: conversation_id
Note over Admin,InvitedUser: Step 6 — User Chats with Documents
InvitedUser->>API: POST /notebooks/{id}/conversations/{conv}/messages
API-->>InvitedUser: RAG response with citations
Note over Admin,InvitedUser: Step 7 — Admin Reviews Access
Admin->>API: GET /notebooks/{id}/access
API-->>Admin: List of users with access levels
Step 1: Create an Invite Link¶
INVITE=$(curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/invites" \
-H "Content-Type: application/json" \
-d '{"expires_at": "2026-04-01T00:00:00Z"}')
INVITE_CODE=$(echo "$INVITE" | jq -r '.data.invite_code')
INVITE_URL=$(echo "$INVITE" | jq -r '.data.invite_url')
echo "Share this link: $INVITE_URL"
const inviteResp = await fetch(
`${BASE}/notebooks/${notebookId}/invites`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ expires_at: "2026-04-01T00:00:00Z" }),
}
);
const invite = (await inviteResp.json()).data;
console.log(`Share this link: ${invite.invite_url}`);
Step 2: Preview the Invite (as Invited User)¶
Step 3: Redeem the Invite¶
Step 4: Chat as the Invited User¶
# Create conversation
CONV=$(curl -s -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/conversations" \
-H "Content-Type: application/json" \
-d '{"title": "My Questions"}')
CONV_ID=$(echo "$CONV" | jq -r '.data.conversation_id')
# Ask a question
curl -s -X POST \
"http://localhost:8000/api/notebooks/$NOTEBOOK_ID/conversations/$CONV_ID/messages" \
-H "Content-Type: application/json" \
-d '{"content": "How do I request a refund?", "persona": "clear", "language": "en"}' \
| jq '.data.assistant_message.content'
# Create conversation as invited user
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/conversations",
    json={"title": "My Questions"},
)
conv_id = resp.json()["data"]["conversation_id"]

# Ask a question
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/conversations/{conv_id}/messages",
    json={"content": "How do I request a refund?", "persona": "clear", "language": "en"},
    timeout=60.0,
)
print(resp.json()["data"]["assistant_message"]["content"])
// Create conversation as invited user
const newConvResp = await fetch(
`${BASE}/notebooks/${notebookId}/conversations`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ title: "My Questions" }),
}
);
const newConvId = (await newConvResp.json()).data.conversation_id;
// Ask a question
const chatResp = await fetch(
`${BASE}/notebooks/${notebookId}/conversations/${newConvId}/messages`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
content: "How do I request a refund?",
persona: "clear",
language: "en",
}),
}
);
console.log((await chatResp.json()).data.assistant_message.content);
Step 5: Admin Reviews Access¶
Workflow 4: Monitor Health, Cleanup, and Reingest¶
Maintain data quality by checking notebook health, cleaning up duplicates, reingesting problematic files, and verifying the health score improved.
Sequence Diagram¶
sequenceDiagram
participant Client
participant API as FastAPI Backend
participant DB as Supabase PostgreSQL
participant Parser as Docling / Mistral OCR
participant Embedder as Embedding API
Note over Client,Embedder: Step 1 — Run Health Check
Client->>API: GET /notebooks/{id}/health
API->>DB: Detect duplicates (content hash)
API->>DB: Detect orphans (missing records)
API->>DB: Count enhanced chunks
API-->>Client: health_score, duplicates, orphans
Note over Client,Embedder: Step 2 — Cleanup Duplicates
Client->>API: POST /notebooks/{id}/health/cleanup
API->>DB: Delete duplicate chunks (keep oldest)
API-->>Client: removed_count, new_health_score
Note over Client,Embedder: Step 3 — Check for Errored Files
Client->>API: GET /notebooks/{id}/documents/errors
API-->>Client: files with errors
Note over Client,Embedder: Step 4 — Reingest Problematic Files
Client->>API: POST /notebooks/{id}/documents/{file}/reingest
API->>DB: Atomic cleanup (6 tables)
API->>DB: Verify cleanup complete
API->>Parser: Re-parse with new settings
Parser-->>API: Extracted text
API->>Embedder: Re-embed chunks
Embedder-->>API: New vectors
API->>DB: Insert new chunks + vectors
API-->>Client: job_id, cleanup_summary
Note over Client,Embedder: Step 5 — Verify Health Improved
Client->>API: GET /notebooks/{id}/health
API-->>Client: Updated health_score
Step 1: Run a Health Check¶
resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/health")
health = resp.json()["data"]
print(f"Health Score: {health['health_score']}/100")
print(f"Total Chunks: {health['total_chunks']}")
print(f"Duplicates: {health['duplicate_count']}")
print(f"Orphans: {health['orphaned_count']}")
print(f"Enhanced: {health['enhanced_count']}")

if health["duplicate_count"] > 0:
    print("\nDuplicate groups:")
    for group in health["duplicate_groups"]:
        print(f"  Hash {group['content_hash'][:16]}...: {group['count']} copies")
const healthResp = await fetch(`${BASE}/notebooks/${notebookId}/health`);
const health = (await healthResp.json()).data;
console.log(`Health Score: ${health.health_score}/100`);
console.log(`Total Chunks: ${health.total_chunks}`);
console.log(`Duplicates: ${health.duplicate_count}`);
console.log(`Orphans: ${health.orphaned_count}`);
console.log(`Enhanced: ${health.enhanced_count}`);
if (health.duplicate_count > 0) {
  console.log("\nDuplicate groups:");
  for (const group of health.duplicate_groups) {
    console.log(`  Hash ${group.content_hash.slice(0, 16)}...: ${group.count} copies`);
  }
}
Step 2: Cleanup Duplicates¶
const cleanupResp = await fetch(
`${BASE}/notebooks/${notebookId}/health/cleanup`,
{ method: "POST" }
);
const cleanup = (await cleanupResp.json()).data;
console.log(`Removed ${cleanup.removed_count} duplicate chunks`);
console.log(`New health score: ${cleanup.new_health_score}/100`);
console.log(`Remaining chunks: ${cleanup.new_total_chunks}`);
Step 3: Check for Errored Files¶
const errorsResp = await fetch(
`${BASE}/notebooks/${notebookId}/documents/errors`
);
const errors = (await errorsResp.json()).data;
if (errors.length === 0) {
  console.log("No errored files!");
} else {
  for (const err of errors) {
    console.log(`  ${err.file_name}: ${err.error_message} (stage: ${err.error_stage})`);
  }
}
Step 4: Reingest with Different Settings¶
ERROR_FILE_ID="<file_id from step 3>"
curl -s -X POST \
"http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$ERROR_FILE_ID/reingest" \
-H "Content-Type: application/json" \
-d '{
"settings": {
"parser": "Mistral OCR",
"chunking_strategy": "Recursive Chunking",
"chunk_size": 800,
"chunk_overlap": 150
}
}' | jq '{
job_id: .data.job_id,
status: .data.status,
docs_cleaned: .data.cleanup_summary.documents_deleted,
chunks_cleaned: .data.cleanup_summary.chunks_deleted
}'
resp = httpx.post(
    f"{BASE}/notebooks/{notebook_id}/documents/{error_file_id}/reingest",
    json={
        "settings": {
            "parser": "Mistral OCR",
            "chunking_strategy": "Recursive Chunking",
            "chunk_size": 800,
            "chunk_overlap": 150,
        }
    },
)
result = resp.json()["data"]
print(f"Reingest job: {result['job_id']}")
print(f"Status: {result['status']}")
summary = result["cleanup_summary"]
print(f"Cleaned: {summary['documents_deleted']} docs, "
      f"{summary.get('chunks_deleted', 0)} chunks")
const reingestResp = await fetch(
`${BASE}/notebooks/${notebookId}/documents/${errorFileId}/reingest`,
{
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
settings: {
parser: "Mistral OCR",
chunking_strategy: "Recursive Chunking",
chunk_size: 800,
chunk_overlap: 150,
},
}),
}
);
const reingest = (await reingestResp.json()).data;
console.log(`Reingest job: ${reingest.job_id}`);
console.log(`Status: ${reingest.status}`);
console.log(`Cleaned: ${reingest.cleanup_summary.documents_deleted} docs`);
Batch Re-Ingestion
To reingest multiple files at once, use POST /documents/reingest-batch with an array of file_ids. Each file is processed independently -- one failure does not block others.
Step 5: Verify Health Improved¶
import time
# Wait for reingestion to finish
while True:
    resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/documents/{error_file_id}/stage")
    stage = resp.json()["data"]
    if stage["status"] in ("success", "error"):
        break
    time.sleep(3)
# Check health again
resp = httpx.get(f"{BASE}/notebooks/{notebook_id}/health")
new_health = resp.json()["data"]
print(f"Updated Health Score: {new_health['health_score']}/100")
print(f"Total Chunks: {new_health['total_chunks']}")
print(f"Duplicates: {new_health['duplicate_count']}")
// Wait for reingestion to finish
let reingestStatus;
do {
  const stageResp = await fetch(
    `${BASE}/notebooks/${notebookId}/documents/${errorFileId}/stage`
  );
  reingestStatus = (await stageResp.json()).data.status;
  if (reingestStatus !== "success" && reingestStatus !== "error") {
    await new Promise((r) => setTimeout(r, 3000));
  }
} while (reingestStatus !== "success" && reingestStatus !== "error");
// Check health again
const newHealthResp = await fetch(`${BASE}/notebooks/${notebookId}/health`);
const newHealth = (await newHealthResp.json()).data;
console.log(`Updated Health Score: ${newHealth.health_score}/100`);
console.log(`Total Chunks: ${newHealth.total_chunks}`);
console.log(`Duplicates: ${newHealth.duplicate_count}`);
Next Steps¶
- API Overview -- Full endpoint reference for all 78+ endpoints
- Features Overview -- Feature-by-feature walkthrough of the platform
- Ingestion Pipeline -- Deep dive into document processing stages
- User Guide -- Step-by-step guide for the web UI