AI Enhancement

The enhancement pipeline enriches document chunks with AI-generated contextual descriptions. Each chunk moves through a lifecycle: pending -> processing -> success -> embedded (via publish). A chunk that fails during processing is marked failed and can be reset to pending.

Base path: /api/notebooks/{notebook_id}/enhance


Enhancement Lifecycle

pending --> processing --> success --> embedded (publish)
   ^               |
   |               v
   +<--(reset)-- failed
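
The lifecycle can also be read as a small state machine. The sketch below is illustrative only: ALLOWED and can_transition are hypothetical names, and the transitions are inferred from the lifecycle description, not taken from the service's code.

```python
# Hypothetical transition table inferred from the documented lifecycle.
ALLOWED = {
    "pending": {"processing"},
    "processing": {"success", "failed"},
    "success": {"embedded"},   # reached via publish
    "failed": {"pending"},     # reached via reset
    "embedded": set(),         # no outgoing transition is documented
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the documented lifecycle allows current -> target."""
    return target in ALLOWED.get(current, set())
```

For example, can_transition("failed", "pending") holds, while a failed chunk cannot go straight to embedded.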

Publish Safety

The publish endpoint aborts if ANY chunk for the file is not in "success" status. All chunks must complete enhancement before publishing.
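
A client can check this rule before calling publish. The following is a minimal sketch, assuming the per-file entry has the shape returned by GET /api/notebooks/{notebook_id}/enhance/files; is_publishable is a hypothetical helper, and the guard against files with zero chunks is an assumption rather than documented behavior.

```python
def is_publishable(file_entry: dict) -> bool:
    """True when every chunk of the file is in "success" status."""
    total = file_entry["total_chunks"]
    # Assumption: a file with no chunks has nothing to publish.
    return total > 0 and file_entry["success"] == total


ready = {"total_chunks": 45, "pending": 0, "processing": 0,
         "success": 45, "failed": 0, "embedded": 0}
blocked = {"total_chunks": 45, "pending": 0, "processing": 0,
           "success": 43, "failed": 2, "embedded": 0}

print(is_publishable(ready), is_publishable(blocked))
```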


GET /api/notebooks/{notebook_id}/enhance/files

List all files with enhancement status. Auto-populates new files from the documents table on each call.

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>

Status: 200 OK

{
  "success": true,
  "data": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "handbook.pdf",
      "total_chunks": 45,
      "pending": 0,
      "processing": 0,
      "success": 45,
      "failed": 0,
      "embedded": 0
    }
  ]
}
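
Errors:

Code  Cause
401   Invalid or missing token
403   Non-admin user

Example (curl):

curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/files \
  -H "Authorization: Bearer $TOKEN"

Example (Python):

import os

import httpx

# Read the bearer token from the environment, mirroring $TOKEN in the curl example.
token = os.environ["TOKEN"]
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"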
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/files",
    headers={"Authorization": f"Bearer {token}"},
)
files = response.json()["data"]
for f in files:
    print(f"{f['file_name']}: {f['success']}/{f['total_chunks']} enhanced")

GET /api/notebooks/{notebook_id}/enhance/files/{file_id}/chunks

List the chunks of a file, one page at a time. Chunk text is truncated for display; use the chunk detail endpoint for the full text.

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>

Query Parameters:

Parameter  Type     Required  Default  Description
limit      integer  No        200      Results per page (1-1000)
offset     integer  No        0        Pagination offset

Status: 200 OK

{
  "success": true,
  "data": [
    {
      "chunk_id": "chunk-1234-...",
      "file_id": "f1a2b3c4-...",
      "status": "success",
      "original_chunk": "The refund policy states that...",
      "enhanced_chunk": "# Context\nThis chunk describes...\n\n---\n\n# Content\nThe refund policy states that..."
    }
  ]
}

Errors:

Code  Cause
401   Invalid or missing token
403   Non-admin user

Example (curl):

curl "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/files/$FILE_ID/chunks?limit=50&offset=0" \
  -H "Authorization: Bearer $TOKEN"

Example (Python):

import httpx

# Assumes notebook_id, file_id, and token are already defined.
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/files/{file_id}/chunks",
    headers={"Authorization": f"Bearer {token}"},
    params={"limit": 50, "offset": 0},
)
chunks = response.json()["data"]
print(f"Got {len(chunks)} chunks")
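
To read every chunk of a large file, the limit and offset parameters can be combined into a paging loop. This is a sketch under one assumption that the response format does not confirm: a page shorter than limit marks the end of the data. iter_chunks and fetch_page are hypothetical names; fetch_page stands in for whatever function performs the GET request and returns the "data" list.

```python
from typing import Callable, Iterator


def iter_chunks(fetch_page: Callable[[int, int], list], limit: int = 200) -> Iterator[dict]:
    """Yield every chunk by walking limit/offset pages until a short page."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # assumed end-of-data signal
            break
        offset += limit


# With httpx, fetch_page could wrap the request shown above, for example:
#   lambda limit, offset: httpx.get(url, headers=headers,
#       params={"limit": limit, "offset": offset}).json()["data"]
```

Passing the fetcher in as a callable keeps the paging logic independent of the HTTP client, which also makes it easy to exercise with an in-memory list.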

GET /api/notebooks/{notebook_id}/enhance/chunks/{chunk_id}

Get full chunk detail for the preview panel (no truncation).

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>

Status: 200 OK

{
  "success": true,
  "data": {
    "chunk_id": "chunk-1234-...",
    "file_id": "f1a2b3c4-...",
    "status": "success",
    "original_chunk": "Full original text...",
    "enhanced_chunk": "# Context\n...\n\n---\n\n# Content\n..."
  }
}
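
Errors:

Code  Cause
401   Invalid or missing token
403   Non-admin user
404   Chunk not found

Example (curl):

curl http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/chunks/$CHUNK_ID \
  -H "Authorization: Bearer $TOKEN"

Example (Python):

import httpx

# Assumes notebook_id, chunk_id, and token are already defined.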
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/chunks/{chunk_id}",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json()["data"]["enhanced_chunk"])
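
In the sample responses, enhanced_chunk holds a "# Context" section and a "# Content" section separated by a "---" line. The sketch below splits the two, assuming that layout holds for every enhanced chunk; split_enhanced_chunk is a hypothetical helper, not part of the API.

```python
def split_enhanced_chunk(enhanced: str) -> tuple[str, str]:
    """Split an enhanced chunk into (context, content).

    Assumes the layout from the sample response:
    "# Context\n<context>\n\n---\n\n# Content\n<content>"
    """
    context_part, sep, content_part = enhanced.partition("\n\n---\n\n")
    if not sep:
        raise ValueError("separator between context and content not found")
    context = context_part.removeprefix("# Context\n").strip()
    content = content_part.removeprefix("# Content\n").strip()
    return context, content


sample = (
    "# Context\nThis chunk describes the refund policy."
    "\n\n---\n\n"
    "# Content\nThe refund policy states that refunds are possible."
)
context, content = split_enhanced_chunk(sample)
print(context)
print(content)
```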

GET /api/notebooks/{notebook_id}/enhance/count

Aggregate chunk counts (notebook-wide or per-file).

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>

Query Parameters:

Parameter  Type    Required  Default  Description
file_id    string  No        null     Filter to a specific file

Status: 200 OK

{
  "success": true,
  "data": {
    "total": 340,
    "pending": 100,
    "processing": 5,
    "success": 200,
    "failed": 3,
    "embedded": 32
  }
}
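
Errors:

Code  Cause
401   Invalid or missing token
403   Non-admin user

Example (curl):

curl "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/count?file_id=$FILE_ID" \
  -H "Authorization: Bearer $TOKEN"

Example (Python):

import httpx

# Assumes notebook_id, file_id, and token are already defined.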
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/count",
    headers={"Authorization": f"Bearer {token}"},
    params={"file_id": file_id},
)
counts = response.json()["data"]
print(f"Total: {counts['total']}, Success: {counts['success']}")

POST /api/notebooks/{notebook_id}/enhance

Start the enhancement pipeline for files or specific chunks. Validates that enhanceable chunks exist, then kicks off background processing.

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>
Content-Type   application/json

Body:

{
  "file_ids": ["f1a2b3c4-..."],
  "chunk_ids": null
}

Field      Type   Required  Default  Description
file_ids   array  No        null     File IDs to enhance (file-level)
chunk_ids  array  No        null     Specific chunk IDs to enhance (chunk-level)

File vs Chunk Level

Provide file_ids to enhance all pending chunks in those files, or chunk_ids to enhance specific chunks.

Status: 200 OK

{
  "success": true,
  "data": [
    {
      "file_id": "f1a2b3c4-...",
      "file_name": "handbook.pdf",
      "total_chunks": 45,
      "pending": 45,
      "processing": 0,
      "success": 0,
      "failed": 0,
      "embedded": 0
    }
  ]
}

Errors:

Code  Cause
401   Invalid or missing token
403   Non-admin user
404   No enhanceable chunks found

Example (curl):

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_ids": ["f1a2b3c4"]}'

Example (Python):

import httpx

# Assumes notebook_id and token are already defined.
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance",
    headers={"Authorization": f"Bearer {token}"},
    json={"file_ids": ["f1a2b3c4"]},
)
print(response.json()["data"])
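
Because both file_ids and chunk_ids are optional, a small helper can make the file-level versus chunk-level choice explicit before sending the request. This is a sketch: build_enhance_body is a hypothetical name, and rejecting a request with neither field set is a client-side assumption; how the server treats that case is not documented here.

```python
from typing import Optional


def build_enhance_body(file_ids: Optional[list] = None,
                       chunk_ids: Optional[list] = None) -> dict:
    """Build the POST /enhance body for file-level or chunk-level enhancement."""
    # Assumption: a request that names neither files nor chunks is a caller mistake.
    if not file_ids and not chunk_ids:
        raise ValueError("provide file_ids or chunk_ids")
    # Empty lists are sent as null, matching the documented defaults.
    return {"file_ids": file_ids or None, "chunk_ids": chunk_ids or None}


print(build_enhance_body(file_ids=["f1a2b3c4"]))
```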

GET /api/notebooks/{notebook_id}/enhance/status

Poll enhancement progress for a single file. Used by the frontend for progress tracking.

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>

Query Parameters:

Parameter  Type    Required  Description
file_id    string  Yes       File to check status for

Status: 200 OK

{
  "success": true,
  "data": {
    "total": 45,
    "pending": 0,
    "processing": 5,
    "success": 38,
    "failed": 2,
    "embedded": 0,
    "progress_pct": 84.4,
    "all_terminated": false
  }
}
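
Errors:

Code  Cause
401   Invalid or missing token
403   Non-admin user

Example (curl):

curl "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/status?file_id=$FILE_ID" \
  -H "Authorization: Bearer $TOKEN"

Example (Python):

import time

import httpx

# Assumes notebook_id, file_id, and token are already defined.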
# Poll until all chunks are done
while True:
    response = httpx.get(
        f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/status",
        headers={"Authorization": f"Bearer {token}"},
        params={"file_id": file_id},
    )
    status = response.json()["data"]
    print(f"Progress: {status['progress_pct']:.1f}%")

    if status["all_terminated"]:
        break
    time.sleep(4)
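
The loop above runs until all_terminated becomes true, so a stalled pipeline would keep it polling forever. A bounded variant is sketched below. wait_for_enhancement and fetch_status are hypothetical names, the 600-second default is an arbitrary choice, and fetch_status stands in for a function that performs the GET request and returns the "data" object.

```python
import time
from typing import Callable


def wait_for_enhancement(fetch_status: Callable[[], dict],
                         timeout: float = 600.0,
                         interval: float = 4.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until all_terminated is true, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status["all_terminated"]:
            return status
        if clock() >= deadline:
            raise TimeoutError("enhancement did not finish before the timeout")
        sleep(interval)
```

The returned status still needs inspecting: all_terminated only says that polling can stop, and the failed count tells whether a reset is needed before publishing.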

POST /api/notebooks/{notebook_id}/enhance/publish

Publish enhanced chunks to the vector store (file-level). Deletes old vectors, embeds enhanced chunks, and inserts into the documents table. Aborts if any chunk is not in "success" status.

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>
Content-Type   application/json

Body:

{
  "file_id": "f1a2b3c4-...",
  "job_id": "j1a2b3c4-...",
  "file_name": "handbook.pdf",
  "notebook_title": "Customer Support KB"
}

Field           Type    Required  Default  Description
file_id         string  Yes       --       File to publish
job_id          string  No        null     Associated job ID
file_name       string  Yes       --       File name for metadata
notebook_title  string  Yes       --       Notebook title for metadata

Status: 200 OK

{
  "success": true,
  "data": {
    "success": true,
    "message": "Published 45 enhanced chunks",
    "published_count": 45
  }
}

Errors:

Code  Cause
400   Not all chunks in "success" status
401   Invalid or missing token
403   Non-admin user

Example (curl):

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/publish \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file_id": "f1a2b3c4",
    "file_name": "handbook.pdf",
    "notebook_title": "Customer Support KB"
  }'

Example (Python):

import httpx

# Assumes notebook_id and token are already defined.
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/publish",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "file_id": "f1a2b3c4",
        "file_name": "handbook.pdf",
        "notebook_title": "Customer Support KB",
    },
)
result = response.json()["data"]
print(f"Published: {result['published_count']} chunks")
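
The publish body has three required fields and one optional one, which maps naturally onto a small typed structure. The dataclass below is a sketch of one way to model it on the client side; PublishRequest and to_json are hypothetical names, and the non-empty check is a local convenience, not documented server validation.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PublishRequest:
    file_id: str
    file_name: str
    notebook_title: str
    job_id: Optional[str] = None  # optional, defaults to null

    def __post_init__(self) -> None:
        # Client-side guard for the fields the table marks as required.
        for name in ("file_id", "file_name", "notebook_title"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required")

    def to_json(self) -> dict:
        """Return a dict suitable for the json= argument of an HTTP client."""
        return asdict(self)


body = PublishRequest("f1a2b3c4", "handbook.pdf", "Customer Support KB").to_json()
print(body)
```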

POST /api/notebooks/{notebook_id}/enhance/publish-chunks

Publish specific enhanced chunks to the vector store (chunk-level).

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>
Content-Type   application/json

Body:

{
  "chunk_ids": ["chunk-1234-...", "chunk-5678-..."],
  "file_id": "f1a2b3c4-...",
  "file_name": "handbook.pdf",
  "notebook_title": "Customer Support KB"
}

Field           Type    Required  Default  Description
chunk_ids       array   Yes       --       Chunk IDs to publish
file_id         string  Yes       --       Parent file ID
file_name       string  Yes       --       File name for metadata
notebook_title  string  Yes       --       Notebook title for metadata

Status: 200 OK

{
  "success": true,
  "data": {
    "success": true,
    "message": "Published 2 enhanced chunks",
    "published_count": 2
  }
}

Errors:

Code  Cause
400   Not all specified chunks in "success" status
401   Invalid or missing token
403   Non-admin user

Example (curl):

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/publish-chunks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "chunk_ids": ["chunk-1234", "chunk-5678"],
    "file_id": "f1a2b3c4",
    "file_name": "handbook.pdf",
    "notebook_title": "Customer Support KB"
  }'

Example (Python):

import httpx

# Assumes notebook_id and token are already defined.
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/publish-chunks",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "chunk_ids": ["chunk-1234", "chunk-5678"],
        "file_id": "f1a2b3c4",
        "file_name": "handbook.pdf",
        "notebook_title": "Customer Support KB",
    },
)
print(response.json()["data"])
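
Since the request is rejected with 400 when any listed chunk is not in "success" status, it can help to filter the chunk list first. The sketch below assumes chunk objects shaped like those returned by the chunk listing endpoint; partition_by_status is a hypothetical helper.

```python
def partition_by_status(chunks: list[dict]) -> tuple[list[str], list[str]]:
    """Split chunk IDs into (ready to publish, not ready)."""
    ready = [c["chunk_id"] for c in chunks if c["status"] == "success"]
    blocked = [c["chunk_id"] for c in chunks if c["status"] != "success"]
    return ready, blocked


chunks = [
    {"chunk_id": "chunk-1234", "status": "success"},
    {"chunk_id": "chunk-5678", "status": "failed"},
    {"chunk_id": "chunk-9012", "status": "success"},
]
ready, blocked = partition_by_status(chunks)
print(ready, blocked)
```

Only the ready list would then go into chunk_ids; the blocked list shows which chunks still need enhancement or a reset.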

POST /api/notebooks/{notebook_id}/enhance/reset

Reset failed chunks back to "pending" status. Optionally re-triggers the enhancement pipeline.

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>
Content-Type   application/json

Body:

{
  "file_id": "f1a2b3c4-...",
  "chunk_ids": null,
  "trigger_enhancement": true
}

Field                Type     Required  Default  Description
file_id              string   Yes       --       File containing failed chunks
chunk_ids            array    No        null     Specific chunks to reset (all failed chunks if null)
trigger_enhancement  boolean  No        false    Re-trigger enhancement after reset

Status: 200 OK

{
  "success": true,
  "data": {
    "reset_count": 3,
    "enhancement_triggered": true
  }
}

Errors:

Code  Cause
401   Invalid or missing token
403   Non-admin user

Example (curl):

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/reset \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"file_id": "f1a2b3c4", "trigger_enhancement": true}'

Example (Python):

import httpx

# Assumes notebook_id and token are already defined.
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/reset",
    headers={"Authorization": f"Bearer {token}"},
    json={"file_id": "f1a2b3c4", "trigger_enhancement": True},
)
result = response.json()["data"]
print(f"Reset {result['reset_count']} chunks")
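
A reset is only useful when a file actually has failed chunks. The sketch below decides this from a status object shaped like the GET /enhance/status response and builds the reset body when needed. plan_reset is a hypothetical helper, and waiting for all_terminated before resetting is a design choice assumed here, not a documented requirement.

```python
from typing import Optional


def plan_reset(file_id: str, status: dict, retrigger: bool = True) -> Optional[dict]:
    """Return a reset request body if the file has failed chunks, else None."""
    # Assumption: wait until nothing is pending or processing before resetting.
    if not status["all_terminated"] or status["failed"] == 0:
        return None
    return {"file_id": file_id, "chunk_ids": None, "trigger_enhancement": retrigger}


finished = {"all_terminated": True, "failed": 3}
print(plan_reset("f1a2b3c4", finished))
```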

POST /api/notebooks/{notebook_id}/enhance/populate

Backfill the contextual_retrieval_table from already-ingested documents. Use this when documents were ingested without context augmentation and you want to make them available for AI enhancement.

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>

Status: 200 OK

{
  "success": true,
  "data": {
    "populated": 45,
    "skipped": 0
  }
}

Errors:

Code  Cause
401   Invalid or missing token
403   Non-admin user

Example (curl):

curl -X POST http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/populate \
  -H "Authorization: Bearer $TOKEN"

Example (Python):

import httpx

# Assumes notebook_id and token are already defined.
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/populate",
    headers={"Authorization": f"Bearer {token}"},
)
print(response.json()["data"])

POST /api/notebooks/{notebook_id}/enhance/repair-metadata

Recompute original_metadata for chunks where it is empty or missing. Uses file_content and original_chunk to compute real line positions.

Auth: Admin

Headers:

Header         Value
Authorization  Bearer <token>

Query Parameters:

Parameter  Type    Required  Default  Description
file_id    string  No        null     Repair specific file, or all files if omitted

Status: 200 OK

{
  "success": true,
  "data": {
    "repaired": 12,
    "total_checked": 45
  }
}

Errors:

Code  Cause
401   Invalid or missing token
403   Non-admin user

Example (curl):

curl -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/repair-metadata?file_id=$FILE_ID" \
  -H "Authorization: Bearer $TOKEN"

Example (Python):

import httpx

# Assumes notebook_id, file_id, and token are already defined.
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/repair-metadata",
    headers={"Authorization": f"Bearer {token}"},
    params={"file_id": file_id},
)
result = response.json()["data"]
print(f"Repaired {result['repaired']} of {result['total_checked']} chunks")
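
The description of this endpoint says line positions are computed from file_content and original_chunk. How the service does this is not documented; the sketch below shows one plausible approach, an exact substring search followed by newline counting. locate_chunk_lines is a hypothetical helper, and the 1-based (start_line, end_line) result is an assumed convention, not the actual original_metadata format.

```python
from typing import Optional


def locate_chunk_lines(file_content: str, original_chunk: str) -> Optional[tuple[int, int]]:
    """Return 1-based (start_line, end_line) of the chunk, or None if not found."""
    if not original_chunk:
        return None
    start = file_content.find(original_chunk)  # assumption: exact match
    if start == -1:
        return None
    # Lines before the chunk = newlines before its first character.
    start_line = file_content.count("\n", 0, start) + 1
    # Ignore trailing newlines so they do not extend the range.
    end_line = start_line + original_chunk.rstrip("\n").count("\n")
    return start_line, end_line


content = "line one\nline two\nline three\nline four\n"
print(locate_chunk_lines(content, "line two\nline three"))
```

An exact match fails if chunking normalized whitespace, so a real implementation may need a more tolerant search.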