Documents

Documents are files uploaded to a notebook for parsing, chunking, and vector embedding. The ingestion pipeline runs as a background task after upload.

Base path: /api/notebooks/{notebook_id}/documents

Supported file types: PDF, DOCX, DOC, MD, TXT, CSV, XLSX, XLS


POST /api/notebooks/{notebook_id}/documents/upload

Upload one or more files to storage. Each file is stored at documents/{notebook_id}/{file_id}/{safe_name}. Returns metadata ready to be passed to the /ingest endpoint.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |
| Content-Type | `multipart/form-data` |

Body:

Multipart form data with field name files containing one or more files.

Status: 200 OK

```json
{
  "success": true,
  "data": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "handbook.pdf",
      "file_type": "PDF",
      "storage_path": "notebook-id/f1a2b3c4/handbook.pdf",
      "size": 245760
    }
  ]
}
```

Errors:

| Code | Cause |
| --- | --- |
| 400 | No files provided |
| 401 | Invalid or missing token |
| 403 | Non-admin user |
| 502 | Storage upload failed |

```bash
curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "files=@handbook.pdf" \
  -F "files=@faq.docx"
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
with open("handbook.pdf", "rb") as f:
    response = httpx.post(
        f"http://localhost:8000/api/notebooks/{notebook_id}/documents/upload",
        headers={"Authorization": f"Bearer {token}"},
        files={"files": ("handbook.pdf", f, "application/pdf")},
    )
uploaded = response.json()["data"]
print(f"Uploaded {len(uploaded)} files")
```

GET /api/notebooks/{notebook_id}/documents/sources

List available files from storage for this notebook.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |

Status: 200 OK

```json
{
  "success": true,
  "data": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "handbook.pdf",
      "file_path": "notebook-id/f1a2b3c4/handbook.pdf",
      "size": 245760
    }
  ]
}
```

Errors:

| Code | Cause |
| --- | --- |
| 401 | Invalid or missing token |
| 403 | Non-admin user |

```bash
curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/sources \
  -H "Authorization: Bearer $TOKEN"
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/sources",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json()["data"])
```

POST /api/notebooks/{notebook_id}/documents/ingest

Trigger the ingestion pipeline for uploaded files. Creates job records synchronously, then runs parsing, chunking, and embedding as a background task.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |
| Content-Type | `application/json` |

Body:

```json
{
  "files": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "handbook.pdf",
      "file_path": "notebook-id/f1a2b3c4/handbook.pdf"
    }
  ],
  "settings": {
    "parser": "Docling Parser",
    "chunking_strategy": "Recursive Chunking",
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "enable_contextual_retrieval": false,
    "enable_multimodal_processing": false
  },
  "notebook_name": "Customer Support KB",
  "inference_provider": "openrouter",
  "inference_model": "openai/gpt-4o-mini",
  "inference_temperature": 0.4
}
```

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `files` | array | Yes | -- | Files to ingest (from upload response) |
| `files[].file_id` | string | Yes | -- | File UUID |
| `files[].file_name` | string | Yes | -- | Original file name |
| `files[].file_path` | string | Yes | -- | Storage path |
| `settings` | object | No | defaults | Ingestion configuration |
| `settings.parser` | string | No | "Docling Parser" | "Docling Parser" or "Mistral OCR" |
| `settings.chunking_strategy` | string | No | "Recursive Chunking" | "Recursive Chunking" or "Agentic Chunking" |
| `settings.chunk_size` | integer | No | 1000 | Target chunk size in characters |
| `settings.chunk_overlap` | integer | No | 200 | Overlap between chunks |
| `settings.enable_contextual_retrieval` | boolean | No | false | Enable context augmentation |
| `settings.enable_multimodal_processing` | boolean | No | false | Enable image description |
| `notebook_name` | string | No | -- | Notebook title (for metadata) |
| `inference_provider` | string | No | -- | LLM provider for context augmentation |
| `inference_model` | string | No | -- | LLM model for context augmentation |
| `inference_temperature` | float | No | -- | Temperature for context augmentation |

Status: 200 OK

```json
{
  "success": true,
  "data": {
    "jobs": [
      {
        "job_id": "j1a2b3c4-...",
        "file_id": "f1a2b3c4-..."
      }
    ]
  }
}
```

Errors:

| Code | Cause |
| --- | --- |
| 400 | No files provided |
| 401 | Invalid or missing token |
| 403 | Non-admin user |

```bash
curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/ingest \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [{"file_id": "f1a2b3c4", "file_name": "handbook.pdf", "file_path": "nb/f1a2b3c4/handbook.pdf"}],
    "settings": {"parser": "Docling Parser", "chunking_strategy": "Recursive Chunking"},
    "notebook_name": "Customer Support KB"
  }'
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/ingest",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "files": [
            {
                "file_id": "f1a2b3c4",
                "file_name": "handbook.pdf",
                "file_path": "nb/f1a2b3c4/handbook.pdf",
            }
        ],
        "settings": {
            "parser": "Docling Parser",
            "chunking_strategy": "Recursive Chunking",
        },
        "notebook_name": "Customer Support KB",
    },
)
jobs = response.json()["data"]["jobs"]
print(f"Started {len(jobs)} ingestion jobs")
```

POST /api/notebooks/{notebook_id}/documents/{file_id}/reingest

Re-ingest a document: performs atomic cleanup of all old data, then re-runs the pipeline. Rejects files that are currently being processed.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |
| Content-Type | `application/json` |

Body:

```json
{
  "settings": {
    "parser": "Mistral OCR",
    "chunking_strategy": "Recursive Chunking",
    "chunk_size": 800
  }
}
```

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `settings` | object | No | -- | New ingestion settings (same schema as ingest) |

Status: 200 OK

```json
{
  "success": true,
  "data": {
    "job_id": "j1a2b3c4-...",
    "status": "reprocessing",
    "cleanup_summary": {
      "documents_deleted": 45,
      "records_deleted": 1,
      "chunks_deleted": 12,
      "raw_data_deleted": 1,
      "cache_deleted": 3,
      "jobs_deleted": 1
    }
  }
}
```

Errors:

| Code | Cause |
| --- | --- |
| 401 | Invalid or missing token |
| 403 | Non-admin user |
| 409 | File currently processing, or storage provider is "none" |

```bash
curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$FILE_ID/reingest \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"settings": {"parser": "Mistral OCR"}}'
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
file_id = "f1a2b3c4-..."
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/{file_id}/reingest",
    headers={"Authorization": f"Bearer {token}"},
    json={"settings": {"parser": "Mistral OCR"}},
)
print(response.json()["data"])
```

POST /api/notebooks/{notebook_id}/documents/reingest-batch

Batch re-ingest multiple files. Each file is cleaned up and re-ingested independently; one file's failure does not block the others.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |
| Content-Type | `application/json` |

Body:

```json
{
  "file_ids": ["f1a2b3c4-...", "f5e6d7c8-..."],
  "settings": {
    "parser": "Docling Parser"
  }
}
```

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `file_ids` | array | Yes | -- | List of file IDs to re-ingest |
| `settings` | object | No | -- | Ingestion settings (applied to all) |

Status: 200 OK

```json
{
  "success": true,
  "data": {
    "results": [
      {
        "file_id": "f1a2b3c4-...",
        "job_id": "j1a2b3c4-...",
        "status": "reprocessing",
        "cleanup_summary": { "documents_deleted": 45 }
      },
      {
        "file_id": "f5e6d7c8-...",
        "status": "failed",
        "error": "File is currently being processed"
      }
    ],
    "total": 2,
    "succeeded": 1,
    "failed": 1
  }
}
```

Errors:

| Code | Cause |
| --- | --- |
| 400 | No file_ids provided |
| 401 | Invalid or missing token |
| 403 | Non-admin user |
| 409 | Storage provider is "none" |

```bash
curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/reingest-batch \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_ids": ["f1a2b3c4", "f5e6d7c8"], "settings": {"parser": "Docling Parser"}}'
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/reingest-batch",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "file_ids": ["f1a2b3c4", "f5e6d7c8"],
        "settings": {"parser": "Docling Parser"},
    },
)
result = response.json()["data"]
print(f"Succeeded: {result['succeeded']}, Failed: {result['failed']}")
```
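Because the response reports a status per file, callers usually split the results before deciding what to retry. A small client-side helper (hypothetical, matching the response shape shown above):

```python
def partition_reingest_results(results):
    """Split reingest-batch results into (reprocessing_ids, failed_ids)."""
    ok = [r["file_id"] for r in results if r.get("status") == "reprocessing"]
    failed = [r["file_id"] for r in results if r.get("status") == "failed"]
    return ok, failed
```

The failed list can then be resubmitted later, once the files in question are no longer being processed.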

GET /api/notebooks/{notebook_id}/documents/

List all documents with their current status.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |

Status: 200 OK

```json
{
  "success": true,
  "data": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "handbook.pdf",
      "status": "success",
      "chunk_count": 45,
      "created_at": "2026-02-25T10:00:00Z"
    }
  ]
}
```

Errors:

| Code | Cause |
| --- | --- |
| 401 | Invalid or missing token |
| 403 | Non-admin user |

```bash
curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/ \
  -H "Authorization: Bearer $TOKEN"
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/",
    headers={"Authorization": f"Bearer {token}"},
)
docs = response.json()["data"]
for doc in docs:
    print(f"{doc['file_name']}: {doc['status']}")
```

DELETE /api/notebooks/{notebook_id}/documents/{file_id}

Delete a document and all related data (vectors, records, enhanced chunks, raw data, cache entries).

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |

Status: 200 OK

```json
{
  "success": true,
  "data": {
    "success": true
  }
}
```

Errors:

| Code | Cause |
| --- | --- |
| 401 | Invalid or missing token |
| 403 | Non-admin user |

```bash
curl -X DELETE http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$FILE_ID \
  -H "Authorization: Bearer $TOKEN"
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
file_id = "f1a2b3c4-..."
response = httpx.delete(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/{file_id}",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json()["data"])
```

POST /api/notebooks/{notebook_id}/documents/delete-batch

Delete multiple documents in a single request.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |
| Content-Type | `application/json` |

Body:

```json
{
  "file_ids": ["f1a2b3c4-...", "f5e6d7c8-..."]
}
```

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `file_ids` | array | Yes | -- | List of file IDs to delete |

Status: 200 OK

```json
{
  "success": true,
  "data": {
    "success": true,
    "deleted": 2
  }
}
```

Errors:

| Code | Cause |
| --- | --- |
| 400 | No file_ids provided |
| 401 | Invalid or missing token |
| 403 | Non-admin user |

```bash
curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/delete-batch \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_ids": ["f1a2b3c4", "f5e6d7c8"]}'
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/delete-batch",
    headers={"Authorization": f"Bearer {token}"},
    json={"file_ids": ["f1a2b3c4", "f5e6d7c8"]},
)
print(f"Deleted: {response.json()['data']['deleted']}")
```

GET /api/notebooks/{notebook_id}/documents/settings

Get the ingestion settings used for a specific file.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |

Query Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `file_id` | string | Yes | File ID to get settings for |

Status: 200 OK

```json
{
  "success": true,
  "data": {
    "parser": "Docling Parser",
    "chunking_strategy": "Recursive Chunking",
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "enable_contextual_retrieval": false,
    "enable_multimodal_processing": false
  }
}
```

Errors:

| Code | Cause |
| --- | --- |
| 401 | Invalid or missing token |
| 403 | Non-admin user |

```bash
curl "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/settings?file_id=$FILE_ID" \
  -H "Authorization: Bearer $TOKEN"
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/settings",
    headers={"Authorization": f"Bearer {token}"},
    params={"file_id": "f1a2b3c4"},
)
print(response.json()["data"])
```

GET /api/notebooks/{notebook_id}/documents/context-state

Get the contextual retrieval state for the notebook, showing which files have been through context augmentation.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |

Status: 200 OK

```json
{
  "success": true,
  "data": {
    "files": [
      {
        "file_id": "f1a2b3c4-...",
        "total_chunks": 45,
        "enhanced_chunks": 30,
        "status": "partial"
      }
    ]
  }
}
```

Errors:

| Code | Cause |
| --- | --- |
| 401 | Invalid or missing token |
| 403 | Non-admin user |

```bash
curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/context-state \
  -H "Authorization: Bearer $TOKEN"
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/context-state",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json()["data"])
```

GET /api/notebooks/{notebook_id}/documents/{file_id}/stage

Get the current workflow stage for a file (useful for tracking ingestion progress).

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |

Status: 200 OK

```json
{
  "success": true,
  "data": {
    "file_id": "f1a2b3c4-...",
    "status": "Processing",
    "stage": "embedding",
    "progress": 75
  }
}
```

Errors:

| Code | Cause |
| --- | --- |
| 401 | Invalid or missing token |
| 403 | Non-admin user |
| 404 | File job not found |

```bash
curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$FILE_ID/stage \
  -H "Authorization: Bearer $TOKEN"
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
file_id = "f1a2b3c4-..."
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/{file_id}/stage",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json()["data"])
```

GET /api/notebooks/{notebook_id}/documents/errors

List all files with ingestion errors in this notebook.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |

Status: 200 OK

```json
{
  "success": true,
  "data": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "corrupted.pdf",
      "error_message": "Failed to parse PDF",
      "error_stage": "parsing"
    }
  ]
}
```

Errors:

| Code | Cause |
| --- | --- |
| 401 | Invalid or missing token |
| 403 | Non-admin user |

```bash
curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/errors \
  -H "Authorization: Bearer $TOKEN"
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/errors",
    headers={"Authorization": f"Bearer {token}"},
)
errors = response.json()["data"]
for err in errors:
    print(f"{err['file_name']}: {err['error_message']}")
```

POST /api/notebooks/{notebook_id}/documents/{file_id}/mark-error

Manually mark a file as errored with a custom error message and stage.

Auth: Admin

Headers:

| Header | Value |
| --- | --- |
| Authorization | `Bearer <token>` |
| Content-Type | `application/json` |

Body:

```json
{
  "error_message": "Manual abort: file too large",
  "error_stage": "upload"
}
```

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `error_message` | string | Yes | -- | Error description |
| `error_stage` | string | Yes | -- | Pipeline stage where error occurred |

Status: 200 OK

```json
{
  "success": true,
  "data": {
    "file_id": "f1a2b3c4-...",
    "status": "error",
    "error_message": "Manual abort: file too large"
  }
}
```

Errors:

| Code | Cause |
| --- | --- |
| 401 | Invalid or missing token |
| 403 | Non-admin user |

```bash
curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$FILE_ID/mark-error \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"error_message": "Manual abort", "error_stage": "upload"}'
```

```python
import httpx

token = "<admin-jwt>"  # placeholder: any valid admin token
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
file_id = "f1a2b3c4-..."
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/{file_id}/mark-error",
    headers={"Authorization": f"Bearer {token}"},
    json={"error_message": "Manual abort", "error_stage": "upload"},
)
print(response.json()["data"])
```