AI Enhancement
The enhancement pipeline enriches document chunks with AI-generated contextual descriptions. Enhanced chunks follow a lifecycle: pending -> processing -> success -> embedded (via publish). Failed chunks can be reset to pending.
Base path: /api/notebooks/{notebook_id}/enhance
Enhancement Lifecycle
Publish Safety
The publish endpoint aborts if ANY chunk for the file is not in "success" status. All chunks must complete enhancement before publishing.
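The lifecycle and the publish rule above can be modeled as a small transition table. This is an illustrative sketch, not the service's implementation: the `failed` status name is an assumption inferred from "failed chunks", treating `embedded` as terminal is also an assumption, and `ALLOWED_TRANSITIONS`, `can_transition`, and `can_publish` are hypothetical names.

```python
# Illustrative model of the chunk lifecycle described above.
# Assumption: "failed" is the status name for chunks whose enhancement failed.
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"success", "failed"},
    "success": {"embedded"},  # reached via publish
    "failed": {"pending"},    # reached via reset
    "embedded": set(),        # treated as terminal in this sketch
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def can_publish(chunk_statuses: list[str]) -> bool:
    """Mirror the publish rule: every chunk of the file must be in "success"."""
    return all(status == "success" for status in chunk_statuses)
```

A client could run a check like `can_publish` against the statuses it has already fetched before calling the publish endpoint, so that a 400 response is avoided.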
GET /api/notebooks/{notebook_id}/enhance/files
List all files with enhancement status. Auto-populates new files from the documents table on each call.
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
Status: 200 OK
| Code | Cause |
|---|---|
| 401 | Invalid or missing token |
| 403 | Non-admin user |
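import httpx
notebook_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
token = "<token>"  # admin bearer token
# List every file in the notebook with its enhancement counts
response = httpx.get(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/files",
    headers={"Authorization": f"Bearer {token}"},
)
response.raise_for_status()  # raises on 401/403 instead of failing on the lookup below
files = response.json()["data"]
for f in files:
    print(f"{f['file_name']}: {f['success']}/{f['total_chunks']} enhanced")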
GET /api/notebooks/{notebook_id}/enhance/files/{file_id}/chunks
List individual chunks for a file (paginated, truncated for display).
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| limit | integer | No | 200 | Results per page (1-1000) |
| offset | integer | No | 0 | Pagination offset |
Status: 200 OK
| Code | Cause |
|---|---|
| 401 | Invalid or missing token |
| 403 | Non-admin user |
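Because the endpoint is paginated with `limit` and `offset`, callers typically loop until a short page comes back. A minimal sketch, assuming each page is returned as a list (under `data`, as in the other examples on this page). `iter_chunks` and `fetch_page` are hypothetical names, and `fetch_page` stands in for whatever performs the HTTP call.

```python
from typing import Callable, Iterator


def iter_chunks(
    fetch_page: Callable[[int, int], list],
    limit: int = 200,
) -> Iterator[dict]:
    """Yield every chunk by walking pages with limit/offset.

    fetch_page(limit, offset) must return one page of chunks as a list.
    """
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page
        if len(page) < limit:  # a short (or empty) page means no more results
            break
        offset += limit
```

With httpx, `fetch_page(limit, offset)` could call `httpx.get` on this endpoint with `params={"limit": limit, "offset": offset}` and return `response.json()["data"]`.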
GET /api/notebooks/{notebook_id}/enhance/chunks/{chunk_id}
Get full chunk detail for the preview panel (no truncation).
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
Status: 200 OK
| Code | Cause |
|---|---|
| 401 | Invalid or missing token |
| 403 | Non-admin user |
| 404 | Chunk not found |
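Since this endpoint can return 404, a caller may want to treat a missing chunk separately from authentication errors. A hedged sketch: `get_chunk_detail` is a hypothetical helper, `client` is expected to be an `httpx.Client` (or anything with a compatible `get` method) already configured with the base URL and `Authorization` header, and the `data` envelope is assumed to match the other examples on this page.

```python
def get_chunk_detail(client, notebook_id: str, chunk_id: str):
    """Fetch one untruncated chunk; return None when the API reports 404."""
    response = client.get(
        f"/api/notebooks/{notebook_id}/enhance/chunks/{chunk_id}"
    )
    if response.status_code == 404:
        return None  # chunk not found
    response.raise_for_status()  # 401/403 and other errors propagate
    return response.json()["data"]  # assumed "data" envelope
```

A suitable client could be created with `httpx.Client(base_url="http://localhost:8000", headers={"Authorization": f"Bearer {token}"})`.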
GET /api/notebooks/{notebook_id}/enhance/count
Aggregate chunk counts (notebook-wide or per-file).
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_id | string | No | null | Filter to a specific file |
Status: 200 OK
| Code | Cause |
|---|---|
| 401 | Invalid or missing token |
| 403 | Non-admin user |
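The same call serves both scopes: omitting `file_id` gives notebook-wide counts, and passing it narrows the result to one file. A small sketch under the same assumptions as the other examples (`data` envelope, preconfigured `httpx.Client`-compatible `client`); `get_chunk_counts` is a hypothetical name, and the fields inside the returned object are not specified here.

```python
from typing import Optional


def get_chunk_counts(client, notebook_id: str, file_id: Optional[str] = None):
    """Return aggregate counts for the notebook, or for one file if file_id is given."""
    # Send file_id only when filtering; an empty params dict means notebook-wide.
    params = {} if file_id is None else {"file_id": file_id}
    response = client.get(
        f"/api/notebooks/{notebook_id}/enhance/count", params=params
    )
    response.raise_for_status()
    return response.json()["data"]  # assumed "data" envelope
```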
POST /api/notebooks/{notebook_id}/enhance
Start the enhancement pipeline for files or specific chunks. Validates that enhanceable chunks exist, then kicks off background processing.
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
| Content-Type | application/json |
Body:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file_ids | array | No | null | File IDs to enhance (file-level) |
| chunk_ids | array | No | null | Specific chunk IDs to enhance (chunk-level) |
File vs Chunk Level
Provide file_ids to enhance all pending chunks in those files, or chunk_ids to enhance specific chunks.
Status: 200 OK
| Code | Cause |
|---|---|
| 401 | Invalid or missing token |
| 403 | Non-admin user |
| 404 | No enhanceable chunks found |
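Building the request body for the two modes might look like the sketch below. It is an illustration, not the server's validation logic: `build_enhance_body` is a hypothetical helper, and rejecting an empty request on the client side is an assumption (this page does not say how the API treats a body with neither field, or with both).

```python
from typing import Optional, Sequence


def build_enhance_body(
    file_ids: Optional[Sequence[str]] = None,
    chunk_ids: Optional[Sequence[str]] = None,
) -> dict:
    """Return a JSON body containing only the fields that were provided."""
    if not file_ids and not chunk_ids:
        raise ValueError("provide file_ids or chunk_ids")
    body = {}
    if file_ids:
        body["file_ids"] = list(file_ids)    # file-level: all pending chunks in these files
    if chunk_ids:
        body["chunk_ids"] = list(chunk_ids)  # chunk-level: only these chunks
    return body
```

The result can be passed as `json=` to `httpx.post` on this endpoint; a 404 response indicates that no enhanceable chunks were found.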
GET /api/notebooks/{notebook_id}/enhance/status
Poll enhancement progress for a single file. Used by the frontend for progress tracking.
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_id | string | Yes | File to check status for |
Status: 200 OK
| Code | Cause |
|---|---|
| 401 | Invalid or missing token |
| 403 | Non-admin user |
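import time
import httpx
# notebook_id and token must already be defined (token is an admin bearer token)
file_id = "f1a2b3c4"  # file whose enhancement is being tracked
# Poll until all chunks are done
while True:
    response = httpx.get(
        f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/status",
        headers={"Authorization": f"Bearer {token}"},
        params={"file_id": file_id},
    )
    response.raise_for_status()
    status = response.json()["data"]
    print(f"Progress: {status['progress_pct']:.1f}%")
    if status["all_terminated"]:
        break
    time.sleep(4)  # wait between polls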
POST /api/notebooks/{notebook_id}/enhance/publish
Publish enhanced chunks to the vector store (file-level). Deletes old vectors, embeds enhanced chunks, and inserts into the documents table. Aborts if any chunk is not in "success" status.
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
| Content-Type | application/json |
Body:
{
  "file_id": "f1a2b3c4-...",
  "job_id": "j1a2b3c4-...",
  "file_name": "handbook.pdf",
  "notebook_title": "Customer Support KB"
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file_id | string | Yes | -- | File to publish |
| job_id | string | No | null | Associated job ID |
| file_name | string | Yes | -- | File name for metadata |
| notebook_title | string | Yes | -- | Notebook title for metadata |
Status: 200 OK
| Code | Cause |
|---|---|
| 400 | Not all chunks in "success" status |
| 401 | Invalid or missing token |
| 403 | Non-admin user |
import httpx
# notebook_id and token must already be defined (token is an admin bearer token)
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/publish",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "file_id": "f1a2b3c4",
        "file_name": "handbook.pdf",
        "notebook_title": "Customer Support KB",
    },
)
response.raise_for_status()  # raises on 400 (chunks not all in "success"), 401, or 403
result = response.json()["data"]
print(f"Published: {result['published_count']} chunks")
POST /api/notebooks/{notebook_id}/enhance/publish-chunks
Publish specific enhanced chunks to the vector store (chunk-level).
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
| Content-Type | application/json |
Body:
{
  "chunk_ids": ["chunk-1234-...", "chunk-5678-..."],
  "file_id": "f1a2b3c4-...",
  "file_name": "handbook.pdf",
  "notebook_title": "Customer Support KB"
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| chunk_ids | array | Yes | -- | Chunk IDs to publish |
| file_id | string | Yes | -- | Parent file ID |
| file_name | string | Yes | -- | File name for metadata |
| notebook_title | string | Yes | -- | Notebook title for metadata |
Status: 200 OK
| Code | Cause |
|---|---|
| 400 | Not all specified chunks in "success" status |
| 401 | Invalid or missing token |
| 403 | Non-admin user |
curl -X POST "http://localhost:8000/api/notebooks/$NOTEBOOK_ID/enhance/publish-chunks" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "chunk_ids": ["chunk-1234", "chunk-5678"],
    "file_id": "f1a2b3c4",
    "file_name": "handbook.pdf",
    "notebook_title": "Customer Support KB"
  }'
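import httpx
# notebook_id and token must already be defined (token is an admin bearer token)
response = httpx.post(
    f"http://localhost:8000/api/notebooks/{notebook_id}/enhance/publish-chunks",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "chunk_ids": ["chunk-1234", "chunk-5678"],
        "file_id": "f1a2b3c4",
        "file_name": "handbook.pdf",
        "notebook_title": "Customer Support KB",
    },
)
response.raise_for_status()  # raises on 400 (a listed chunk is not in "success"), 401, or 403
print(response.json()["data"])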
POST /api/notebooks/{notebook_id}/enhance/reset
Reset failed chunks back to "pending" status. Optionally re-triggers the enhancement pipeline.
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
| Content-Type | application/json |
Body:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file_id | string | Yes | -- | File containing failed chunks |
| chunk_ids | array | No | null | Specific chunks to reset (all failed if null) |
| trigger_enhancement | boolean | No | false | Re-trigger enhancement after reset |
| Code | Cause |
|---|---|
| 401 | Invalid or missing token |
| 403 | Non-admin user |
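A reset request built from the body fields above might look like this sketch. `reset_failed_chunks` is a hypothetical helper, `client` is assumed to be an `httpx.Client` (or compatible object) already configured with the base URL and `Authorization` header, and because the response shape is not documented here, the decoded JSON is returned unchanged.

```python
from typing import Optional, Sequence


def reset_failed_chunks(
    client,
    notebook_id: str,
    file_id: str,
    chunk_ids: Optional[Sequence[str]] = None,
    trigger_enhancement: bool = False,
):
    """Reset failed chunks of a file to "pending", optionally re-running enhancement."""
    body = {"file_id": file_id, "trigger_enhancement": trigger_enhancement}
    if chunk_ids is not None:
        # Omit chunk_ids to reset every failed chunk in the file.
        body["chunk_ids"] = list(chunk_ids)
    response = client.post(
        f"/api/notebooks/{notebook_id}/enhance/reset", json=body
    )
    response.raise_for_status()
    return response.json()  # response shape not documented; returned as-is
```

Passing `trigger_enhancement=True` resets and restarts in one call; progress can then be followed through the status endpoint.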
POST /api/notebooks/{notebook_id}/enhance/populate
Backfill the contextual_retrieval_table from already-ingested documents. Use this when documents were ingested without context augmentation and you want to make them available for AI enhancement.
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
| Code | Cause |
|---|---|
| 401 | Invalid or missing token |
| 403 | Non-admin user |
POST /api/notebooks/{notebook_id}/enhance/repair-metadata
Recompute original_metadata for chunks where it is empty or missing. Uses file_content and original_chunk to compute real line positions.
Auth: Admin
Headers:
| Header | Value |
|---|---|
| Authorization | Bearer <token> |
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_id | string | No | null | Repair specific file, or all files if omitted |
| Code | Cause |
|---|---|
| 401 | Invalid or missing token |
| 403 | Non-admin user |
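To illustrate what computing line positions from `file_content` and `original_chunk` can involve, here is one plausible approach based on an exact substring search. This is a sketch only: the service's actual algorithm and the fields it stores in `original_metadata` are not described on this page, and `compute_line_span` is a hypothetical name.

```python
from typing import Optional, Tuple


def compute_line_span(file_content: str, original_chunk: str) -> Optional[Tuple[int, int]]:
    """Return 1-based (start_line, end_line) of the chunk within the file.

    Returns None if the chunk is empty or does not occur verbatim in the file.
    """
    if not original_chunk:
        return None
    start = file_content.find(original_chunk)
    if start == -1:
        return None
    # Lines before the chunk = newlines before its first character.
    start_line = file_content.count("\n", 0, start) + 1
    # Ignore trailing newlines so the span ends on the chunk's last text line.
    end_line = start_line + original_chunk.rstrip("\n").count("\n")
    return start_line, end_line
```

An exact match only locates the first occurrence, so repeated passages or chunks whose whitespace was normalized during ingestion would need a more tolerant search.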