Documents¶

Documents are files uploaded to a notebook for parsing, chunking, and vector embedding. The ingestion pipeline runs as a background task after upload.

Base path: /api/notebooks/{notebook_id}/documents

Supported file types: PDF, DOCX, DOC, MD, TXT, CSV, XLSX, XLS

POST /api/notebooks/{notebook_id}/documents/upload¶

Upload one or more files to storage. Each file is stored at documents/{notebook_id}/{file_id}/{safe_name}. Returns metadata ready to be passed to the /ingest endpoint.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`
`Content-Type`	`multipart/form-data`

Body:

Multipart form data with field name files containing one or more files.

Status: 200 OK

{
  "success": true,
  "data": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "handbook.pdf",
      "file_type": "PDF",
      "storage_path": "notebook-id/f1a2b3c4/handbook.pdf",
      "size": 245760
    }
  ]
}

Code	Cause
`400`	No files provided
`401`	Invalid or missing token
`403`	Non-admin user
`502`	Storage upload failed

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "files=@handbook.pdf" \
  -F "files=@faq.docx"

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
with open("handbook.pdf", "rb") as f:
    response = httpx.post(
        f"http://localhost:8000/api/notebooks/{notebook_id}/documents/upload",
        headers={"Authorization": f"Bearer {token}"},
        files={"files": ("handbook.pdf", f, "application/pdf")},
    )
uploaded = response.json()["data"]
print(f"Uploaded {len(uploaded)} files")

GET /api/notebooks/{notebook_id}/documents/sources¶

List available files from storage for this notebook.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`

Status: 200 OK

{
  "success": true,
  "data": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "handbook.pdf",
      "file_path": "notebook-id/f1a2b3c4/handbook.pdf",
      "size": 245760
    }
  ]
}

Code	Cause
`401`	Invalid or missing token
`403`	Non-admin user

curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/sources \
  -H "Authorization: Bearer $TOKEN"

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/sources",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json()["data"])

POST /api/notebooks/{notebook_id}/documents/ingest¶

Trigger the ingestion pipeline for uploaded files. Creates job records synchronously, then runs parsing, chunking, and embedding as a background task.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`
`Content-Type`	`application/json`

Body:

{
  "files": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "handbook.pdf",
      "file_path": "notebook-id/f1a2b3c4/handbook.pdf"
    }
  ],
  "settings": {
    "parser": "Docling Parser",
    "chunking_strategy": "Recursive Chunking",
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "enable_contextual_retrieval": false,
    "enable_multimodal_processing": false
  },
  "notebook_name": "Customer Support KB",
  "inference_provider": "openrouter",
  "inference_model": "openai/gpt-4o-mini",
  "inference_temperature": 0.4
}

Field	Type	Required	Default	Description
`files`	array	Yes	--	Files to ingest (from upload response)
`files[].file_id`	string	Yes	--	File UUID
`files[].file_name`	string	Yes	--	Original file name
`files[].file_path`	string	Yes	--	Storage path
`settings`	object	No	defaults	Ingestion configuration
`settings.parser`	string	No	`"Docling Parser"`	`"Docling Parser"` or `"Mistral OCR"`
`settings.chunking_strategy`	string	No	`"Recursive Chunking"`	`"Recursive Chunking"` or `"Agentic Chunking"`
`settings.chunk_size`	integer	No	`1000`	Target chunk size in characters
`settings.chunk_overlap`	integer	No	`200`	Overlap between chunks
`settings.enable_contextual_retrieval`	boolean	No	`false`	Enable context augmentation
`settings.enable_multimodal_processing`	boolean	No	`false`	Enable image description
`notebook_name`	string	No	--	Notebook title (for metadata)
`inference_provider`	string	No	--	LLM provider for context augmentation
`inference_model`	string	No	--	LLM model for context augmentation
`inference_temperature`	float	No	--	Temperature for context augmentation

Status: 200 OK

{
  "success": true,
  "data": {
    "jobs": [
      {
        "job_id": "j1a2b3c4-...",
        "file_id": "f1a2b3c4-..."
      }
    ]
  }
}

Code	Cause
`400`	No files provided
`401`	Invalid or missing token
`403`	Non-admin user

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/ingest \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [{"file_id": "f1a2b3c4", "file_name": "handbook.pdf", "file_path": "nb/f1a2b3c4/handbook.pdf"}],
    "settings": {"parser": "Docling Parser", "chunking_strategy": "Recursive Chunking"},
    "notebook_name": "Customer Support KB"
  }'

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/ingest",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "files": [
            {
                "file_id": "f1a2b3c4",
                "file_name": "handbook.pdf",
                "file_path": "nb/f1a2b3c4/handbook.pdf",
            }
        ],
        "settings": {
            "parser": "Docling Parser",
            "chunking_strategy": "Recursive Chunking",
        },
        "notebook_name": "Customer Support KB",
    },
)
jobs = response.json()["data"]["jobs"]
print(f"Started {len(jobs)} ingestion jobs")

POST /api/notebooks/{notebook_id}/documents/{file_id}/reingest¶

Re-ingest a document: performs atomic cleanup of all old data, then re-runs the pipeline. Rejects files that are currently being processed.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`
`Content-Type`	`application/json`

Body:

{
  "settings": {
    "parser": "Mistral OCR",
    "chunking_strategy": "Recursive Chunking",
    "chunk_size": 800
  }
}

Field	Type	Required	Default	Description
`settings`	object	No	--	New ingestion settings (same schema as ingest)

Status: 200 OK

{
  "success": true,
  "data": {
    "job_id": "j1a2b3c4-...",
    "status": "reprocessing",
    "cleanup_summary": {
      "documents_deleted": 45,
      "records_deleted": 1,
      "chunks_deleted": 12,
      "raw_data_deleted": 1,
      "cache_deleted": 3,
      "jobs_deleted": 1
    }
  }
}

Code	Cause
`401`	Invalid or missing token
`403`	Non-admin user
`409`	File currently processing, or storage provider is `"none"`

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$FILE_ID/reingest \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"settings": {"parser": "Mistral OCR"}}'

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
file_id = "f1a2b3c4-..."
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/{file_id}/reingest",
    headers={"Authorization": f"Bearer {token}"},
    json={"settings": {"parser": "Mistral OCR"}},
)
print(response.json()["data"])

POST /api/notebooks/{notebook_id}/documents/reingest-batch¶

Batch re-ingest multiple files. Each file is cleaned up and re-ingested independently -- one file's failure does not block others.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`
`Content-Type`	`application/json`

Body:

{
  "file_ids": ["f1a2b3c4-...", "f5e6d7c8-..."],
  "settings": {
    "parser": "Docling Parser"
  }
}

Field	Type	Required	Default	Description
`file_ids`	array	Yes	--	List of file IDs to re-ingest
`settings`	object	No	--	Ingestion settings (applied to all)

Status: 200 OK

{
  "success": true,
  "data": {
    "results": [
      {
        "file_id": "f1a2b3c4-...",
        "job_id": "j1a2b3c4-...",
        "status": "reprocessing",
        "cleanup_summary": { "documents_deleted": 45 }
      },
      {
        "file_id": "f5e6d7c8-...",
        "status": "failed",
        "error": "File is currently being processed"
      }
    ],
    "total": 2,
    "succeeded": 1,
    "failed": 1
  }
}

Code	Cause
`400`	No `file_ids` provided
`401`	Invalid or missing token
`403`	Non-admin user
`409`	Storage provider is `"none"`

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/reingest-batch \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_ids": ["f1a2b3c4", "f5e6d7c8"], "settings": {"parser": "Docling Parser"}}'

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/reingest-batch",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "file_ids": ["f1a2b3c4", "f5e6d7c8"],
        "settings": {"parser": "Docling Parser"},
    },
)
result = response.json()["data"]
print(f"Succeeded: {result['succeeded']}, Failed: {result['failed']}")

GET /api/notebooks/{notebook_id}/documents/¶

List all documents with their current status.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`

Status: 200 OK

{
  "success": true,
  "data": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "handbook.pdf",
      "status": "success",
      "chunk_count": 45,
      "created_at": "2026-02-25T10:00:00Z"
    }
  ]
}

Code	Cause
`401`	Invalid or missing token
`403`	Non-admin user

curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/ \
  -H "Authorization: Bearer $TOKEN"

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/",
    headers={"Authorization": f"Bearer {token}"},
)
docs = response.json()["data"]
for doc in docs:
    print(f"{doc['file_name']}: {doc['status']}")

DELETE /api/notebooks/{notebook_id}/documents/{file_id}¶

Delete a document and all related data (vectors, records, enhanced chunks, raw data, cache entries).

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`

Status: 200 OK

{
  "success": true,
  "data": {
    "success": true
  }
}

Code	Cause
`401`	Invalid or missing token
`403`	Non-admin user

curl -X DELETE http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$FILE_ID \
  -H "Authorization: Bearer $TOKEN"

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
file_id = "f1a2b3c4-..."
response = httpx.delete(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/{file_id}",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json()["data"])

POST /api/notebooks/{notebook_id}/documents/delete-batch¶

Delete multiple documents in a single request.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`
`Content-Type`	`application/json`

Body:

{
  "file_ids": ["f1a2b3c4-...", "f5e6d7c8-..."]
}

Field	Type	Required	Default	Description
`file_ids`	array	Yes	--	List of file IDs to delete

Status: 200 OK

{
  "success": true,
  "data": {
    "success": true,
    "deleted": 2
  }
}

Code	Cause
`400`	No `file_ids` provided
`401`	Invalid or missing token
`403`	Non-admin user

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/delete-batch \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_ids": ["f1a2b3c4", "f5e6d7c8"]}'

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/delete-batch",
    headers={"Authorization": f"Bearer {token}"},
    json={"file_ids": ["f1a2b3c4", "f5e6d7c8"]},
)
print(f"Deleted: {response.json()['data']['deleted']}")

GET /api/notebooks/{notebook_id}/documents/settings¶

Get the ingestion settings used for a specific file.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`

Query Parameters:

Parameter	Type	Required	Description
`file_id`	string	Yes	File ID to get settings for

Status: 200 OK

{
  "success": true,
  "data": {
    "parser": "Docling Parser",
    "chunking_strategy": "Recursive Chunking",
    "chunk_size": 1000,
    "chunk_overlap": 200,
    "enable_contextual_retrieval": false,
    "enable_multimodal_processing": false
  }
}

Code	Cause
`401`	Invalid or missing token
`403`	Non-admin user

curl "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/settings?file_id=$FILE_ID" \
  -H "Authorization: Bearer $TOKEN"

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/settings",
    headers={"Authorization": f"Bearer {token}"},
    params={"file_id": "f1a2b3c4"},
)
print(response.json()["data"])

GET /api/notebooks/{notebook_id}/documents/context-state¶

Get the contextual retrieval state for the notebook -- shows which files have been through context augmentation.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`

Status: 200 OK

{
  "success": true,
  "data": {
    "files": [
      {
        "file_id": "f1a2b3c4-...",
        "total_chunks": 45,
        "enhanced_chunks": 30,
        "status": "partial"
      }
    ]
  }
}

Code	Cause
`401`	Invalid or missing token
`403`	Non-admin user

curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/context-state \
  -H "Authorization: Bearer $TOKEN"

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/context-state",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json()["data"])

GET /api/notebooks/{notebook_id}/documents/{file_id}/stage¶

Get the current workflow stage for a file (useful for tracking ingestion progress).

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`

Status: 200 OK

{
  "success": true,
  "data": {
    "file_id": "f1a2b3c4-...",
    "status": "Processing",
    "stage": "embedding",
    "progress": 75
  }
}

Code	Cause
`401`	Invalid or missing token
`403`	Non-admin user
`404`	File job not found

curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$FILE_ID/stage \
  -H "Authorization: Bearer $TOKEN"

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
file_id = "f1a2b3c4-..."
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/{file_id}/stage",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json()["data"])

GET /api/notebooks/{notebook_id}/documents/errors¶

List all files with ingestion errors in this notebook.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`

Status: 200 OK

{
  "success": true,
  "data": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "corrupted.pdf",
      "error_message": "Failed to parse PDF",
      "error_stage": "parsing"
    }
  ]
}

Code	Cause
`401`	Invalid or missing token
`403`	Non-admin user

curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/errors \
  -H "Authorization: Bearer $TOKEN"

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/errors",
    headers={"Authorization": f"Bearer {token}"},
)
errors = response.json()["data"]
for err in errors:
    print(f"{err['file_name']}: {err['error_message']}")

POST /api/notebooks/{notebook_id}/documents/{file_id}/mark-error¶

Manually mark a file as errored with a custom error message and stage.

Auth: Admin

RequestResponseErrorscurl ExamplePython Example

Headers:

Header	Value
`Authorization`	`Bearer <token>`
`Content-Type`	`application/json`

Body:

{
  "error_message": "Manual abort: file too large",
  "error_stage": "upload"
}

Field	Type	Required	Default	Description
`error_message`	string	Yes	--	Error description
`error_stage`	string	Yes	--	Pipeline stage where error occurred

Status: 200 OK

{
  "success": true,
  "data": {
    "file_id": "f1a2b3c4-...",
    "status": "error",
    "error_message": "Manual abort: file too large"
  }
}

Code	Cause
`401`	Invalid or missing token
`403`	Non-admin user

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/documents/$FILE_ID/mark-error \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"error_message": "Manual abort", "error_stage": "upload"}'

import httpx

notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
file_id = "f1a2b3c4-..."
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/documents/{file_id}/mark-error",
    headers={"Authorization": f"Bearer {token}"},
    json={"error_message": "Manual abort", "error_stage": "upload"},
)
print(response.json()["data"])