Version: v0.1.0

Knowledge Builder API

Knowledge Builder is an independent service that provides document indexing, chunk management, and search capabilities.

Separate Service

The Knowledge Builder API operates as a separate service from the Manager API. Please note the different Base URL.

Base URL: /api/v1/knowledger
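Because every path on this page is relative to the Base URL, a small helper can keep the prefix in one place. The sketch below is illustrative only: `endpoint` is a hypothetical helper, and the host is taken from the cURL examples at the end of this page and may differ in your deployment.

```python
from urllib.parse import quote

# Host as used in this page's cURL examples; adjust for your deployment.
HOST = "https://api.dhub.io"
# Base URL of the Knowledge Builder API (differs from the Manager API).
BASE_PATH = "/api/v1/knowledger"


def endpoint(*segments: str) -> str:
    """Join path segments onto the Knowledge Builder base URL (hypothetical helper)."""
    # Percent-encode each segment so an ID containing "/" stays a single segment.
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    return f"{HOST}{BASE_PATH}/{path}"


print(endpoint("knowledges", "knowledge-abc123", "search"))
```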

Endpoint Summary

| Group | Endpoint Count | Description |
| --- | --- | --- |
| Knowledges | 3 | Knowledge deletion, search, indexing preview |
| Documents | 2 | Document deletion, search |
| Indexing | 7 | Web/document indexing start, status, delete; file download |
| Chunks | 6 | Chunk CRUD, list retrieval, bulk delete |
| Settings | 4 | Default settings, embedding model management |
| System | 1 | Health check |

Knowledges

Performs Knowledge-level search, indexing preview, and storage deletion.

| Method | Path | Description |
| --- | --- | --- |
| DELETE | /knowledges/{kid} | Delete all chunks related to a Knowledge |
| POST | /knowledges/{kid}/search | Search within a Knowledge |
| POST | /knowledges/{kid}/indexing/preview | Indexing preview (chunking simulation without saving) |

DELETE /knowledges/{kid}

Deletes all vector embeddings and text index data belonging to the Knowledge.

Response

204 No Content


POST /knowledges/{kid}/search

Searches for chunks across the entire Knowledge scope.

Request Body

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | - | Search query (1–1000 characters) |
| search_mode | string | VECTOR | Search mode: VECTOR, TEXT, HYBRID |
| top_k | integer | 10 | Number of results to return (1–100) |
| score_threshold | float | 0.0 | Minimum score threshold (0.0–1.0) |

Response

200 OK

{
  "query": "How to use D.Hub",
  "search_mode": "VECTOR",
  "total_results": 3,
  "results": [
    {
      "chunk_id": "chunk-001",
      "document_id": "knowledge.document-xyz789",
      "content": "D.Hub is a data hub platform that...",
      "score": 0.92,
      "highlights": [],
      "metadata": {}
    }
  ]
}
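The request fields and their documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service: `build_search_body` and `hits_above` are hypothetical helper names, and how the server itself reacts to out-of-range values is not described on this page.

```python
import json

# Documented values for search_mode.
SEARCH_MODES = ("VECTOR", "TEXT", "HYBRID")


def build_search_body(query, search_mode="VECTOR", top_k=10, score_threshold=0.0):
    """Return the JSON request body, rejecting values outside the documented ranges."""
    if not 1 <= len(query) <= 1000:
        raise ValueError("query must be 1-1000 characters")
    if search_mode not in SEARCH_MODES:
        raise ValueError(f"search_mode must be one of {SEARCH_MODES}")
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be in 1-100")
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be in 0.0-1.0")
    return json.dumps({
        "query": query,
        "search_mode": search_mode,
        "top_k": top_k,
        "score_threshold": score_threshold,
    })


def hits_above(response, min_score):
    """Client-side filter over a parsed response: (chunk_id, score) pairs."""
    return [
        (result["chunk_id"], result["score"])
        for result in response["results"]
        if result["score"] >= min_score
    ]
```

The string returned by `build_search_body` can be passed as the request body, for example as the `-d` argument of a cURL call.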

POST /knowledges/{kid}/indexing/preview

Uploads a file and returns a preview of the chunking results. Nothing is saved.

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_tokens | integer | 512 | Maximum tokens per chunk (64–4096) |
| max_preview | integer | 10 | Number of preview chunks (1–50) |
| strategy | string | - | Chunking strategy (fixed, markdown, hierarchical, hybrid, parent_child) |
| overlap_length | integer | - | Overlap length (0–500) |

Request Body

Upload the file as multipart/form-data. Supported formats: PDF, DOCX, PPTX, XLSX, HTML, TXT, MD.

Response

200 OK

{
  "total_chunks": 42,
  "preview_chunks": [
    {
      "sequence": 1,
      "content": "First chunk content...",
      "token_count": 128,
      "char_count": 512
    }
  ],
  "truncated": true,
  "file_name": "guide.pdf"
}
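Since the preview options travel as query parameters rather than in the body, it can help to assemble the query string separately from the file upload. This is a sketch under the documented ranges; `preview_query` is a hypothetical helper, and omitting `strategy` and `overlap_length` when they are unset is an assumption based on their having no listed default.

```python
from urllib.parse import urlencode

# Documented chunking strategies.
STRATEGIES = ("fixed", "markdown", "hierarchical", "hybrid", "parent_child")


def preview_query(max_tokens=512, max_preview=10, strategy=None, overlap_length=None):
    """Build the query string for POST /knowledges/{kid}/indexing/preview."""
    if not 64 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be in 64-4096")
    if not 1 <= max_preview <= 50:
        raise ValueError("max_preview must be in 1-50")
    params = {"max_tokens": max_tokens, "max_preview": max_preview}
    # Optional parameters are only sent when the caller sets them (assumption).
    if strategy is not None:
        if strategy not in STRATEGIES:
            raise ValueError(f"strategy must be one of {STRATEGIES}")
        params["strategy"] = strategy
    if overlap_length is not None:
        if not 0 <= overlap_length <= 500:
            raise ValueError("overlap_length must be in 0-500")
        params["overlap_length"] = overlap_length
    return urlencode(params)


print(preview_query(strategy="markdown", overlap_length=50))
```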

Documents

Performs Document-level chunk deletion and search.

| Method | Path | Description |
| --- | --- | --- |
| DELETE | /knowledges/{kid}/documents/{did} | Delete chunks related to a Document |
| POST | /knowledges/{kid}/documents/{did}/search | Search within a Document |

DELETE /knowledges/{kid}/documents/{did}

Deletes all chunks belonging to the Document from vector and text stores.

Response

204 No Content


POST /knowledges/{kid}/documents/{did}/search

Searches for chunks within a specific Document scope. The Request/Response format is the same as Knowledge search.
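Because the two search endpoints share one request and response format, the search scope is decided by the URL alone. A minimal sketch of that choice, assuming the host from this page's cURL examples; `search_url` is a hypothetical helper.

```python
# Host taken from the cURL examples on this page; adjust for your deployment.
BASE_URL = "https://api.dhub.io/api/v1/knowledger"


def search_url(kid, did=None):
    """Return the Knowledge-scope search URL, or the Document-scope one if did is given."""
    url = f"{BASE_URL}/knowledges/{kid}"
    if did is not None:
        url += f"/documents/{did}"
    return url + "/search"


print(search_url("knowledge-abc123"))
print(search_url("knowledge-abc123", "doc-xyz"))
```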


Indexing

Manages indexing (chunking → embedding → storing) for web URLs and document files.

All indexing endpoints are under /knowledges/{kid}/documents/{did}/indexing.

Web Indexing

| Method | Path | Description |
| --- | --- | --- |
| POST | .../indexing/web | Start web indexing |
| GET | .../indexing/web | Check web indexing status |
| DELETE | .../indexing/web | Delete web indexing |

POST .../indexing/web

Starts crawling and indexing the URL set in the Document's options.url.

Response: 202 Accepted

{
  "message": "Web indexing started"
}

GET .../indexing/web

Retrieves the current web indexing status.

Response: 200 OK

{
  "status": "READY"
}
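Since indexing is started with 202 Accepted and checked through a separate status endpoint, clients typically poll. The sketch below shows one way to do that with an injected `fetch_status` callable, so it makes no assumptions about the HTTP client. "READY" is the only status value shown on this page; treating it as the value to wait for, and the "INDEXING" value in the demo, are assumptions for illustration.

```python
import time


def wait_for_status(fetch_status, done=("READY",), interval=2.0, max_attempts=30,
                    sleep=time.sleep):
    """Poll fetch_status() until its "status" is in `done`, or give up.

    fetch_status is any callable returning the parsed JSON body of
    GET .../indexing/web (or .../indexing/doc).
    """
    for attempt in range(max_attempts):
        status = fetch_status()["status"]
        if status in done:
            return status
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"status not in {done} after {max_attempts} attempts")


# Demo with a fake fetcher; "INDEXING" is a made-up intermediate status.
fake_statuses = iter(["INDEXING", "INDEXING", "READY"])
result = wait_for_status(lambda: {"status": next(fake_statuses)}, sleep=lambda s: None)
print(result)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.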

Document File Indexing

| Method | Path | Description |
| --- | --- | --- |
| POST | .../indexing/doc | Start document file indexing |
| GET | .../indexing/doc | Check document indexing status |
| DELETE | .../indexing/doc | Delete document indexing |
| GET | .../indexing/doc/file | Download original file |

POST .../indexing/doc

Uploads a document file and starts indexing. If the file is omitted, the previously stored file is reused.

Request Body: multipart/form-data (file field, optional)

Response: 202 Accepted

{
  "message": "Document indexing started"
}

GET .../indexing/doc/file

Downloads the original file used for indexing.

Response: File stream (Content-Disposition: attachment)
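When saving the downloaded stream, a client may want to take the file name from the Content-Disposition header. The sketch below uses the standard library's header parser; whether the service actually includes a `filename` parameter is an assumption, so a fallback name is used when it is absent. `filename_from_disposition` is a hypothetical helper.

```python
import os
from email.message import Message


def filename_from_disposition(header_value, fallback="download.bin"):
    """Extract a safe file name from a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()
    if not name:
        return fallback
    # Keep only the final path component so the name cannot point elsewhere.
    return os.path.basename(name) or fallback


print(filename_from_disposition('attachment; filename="guide.pdf"'))
```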


Chunks

Directly manages chunks under a Document. Chunks can be created, updated, and deleted for MANUAL-type documents.

All endpoints are under /knowledges/{kid}/documents/{did}/chunks.

| Method | Path | Description |
| --- | --- | --- |
| POST | /chunks | Create a chunk |
| GET | /chunks | List chunks (pagination) |
| GET | /chunks/{id} | Get chunk details |
| PATCH | /chunks/{id} | Update a chunk |
| DELETE | /chunks/{id} | Delete a chunk |
| DELETE | /chunks | Bulk delete chunks |

POST /chunks

Creates a new chunk. If the VECTOR target is enabled, embeddings are automatically generated.

Request Body

{
  "content": "Chunk text content",
  "metadata": {"source": "manual", "page": 1}
}

Response

201 Created

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "content": "Chunk text content",
  "metadata": {"source": "manual", "page": 1},
  "created_at": "2024-06-15T10:30:00Z"
}

GET /chunks

Retrieves a paginated list of chunks.

Query Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| page | integer | 1 | Page number (starting from 1) |
| page_size | integer | 20 | Items per page (1–100) |

Response

200 OK

{
  "data": [
    {
      "id": "550e8400-...",
      "type": "TEXT",
      "content": "Chunk content...",
      "metadata": {},
      "storages": ["TEXT", "VECTOR"],
      "created_at": "2024-06-15T10:30:00Z",
      "updated_at": null
    }
  ],
  "meta": {
    "total": 42,
    "page": 1,
    "page_size": 20,
    "total_pages": 3
  }
}
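The `meta.total_pages` field makes it possible to walk every page without guessing where the list ends. A minimal sketch follows, assuming a caller-supplied `fetch_page(page, page_size)` that returns the parsed JSON body of GET /chunks; both that callable and `iter_chunks` are hypothetical names.

```python
def iter_chunks(fetch_page, page_size=20):
    """Yield every chunk across all pages, using meta.total_pages to stop."""
    page = 1  # Pages are documented as starting from 1.
    while True:
        body = fetch_page(page, page_size)
        yield from body["data"]
        if page >= body["meta"]["total_pages"]:
            break
        page += 1


# Demo with a fake three-page listing instead of a real HTTP call.
def fake_fetch(page, page_size):
    items = [{"id": f"chunk-{i}"} for i in range(5)]
    start = (page - 1) * page_size
    return {
        "data": items[start:start + page_size],
        "meta": {"total": 5, "page": page, "page_size": page_size, "total_pages": 3},
    }


ids = [chunk["id"] for chunk in iter_chunks(fake_fetch, page_size=2)]
print(ids)
```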

PATCH /chunks/{id}

Updates the content or metadata of an existing chunk. If the content is changed, embeddings are regenerated.

Request Body

{
  "content": "Updated chunk content",
  "metadata": {"reviewed": true}
}
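An update request can be prepared with the standard library as follows. This sketch only builds the request object and does not send it. It assumes that a body containing just one of `content` or `metadata` is accepted (the description above says "content or metadata"), and that authentication uses the Bearer header shown in this page's cURL examples; `build_chunk_patch` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_chunk_patch(url, token, content=None, metadata=None):
    """Build (but do not send) a PATCH request for .../chunks/{id}."""
    body = {}
    if content is not None:
        body["content"] = content
    if metadata is not None:
        body["metadata"] = metadata
    if not body:
        raise ValueError("provide content, metadata, or both")
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_chunk_patch(
    "https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123"
    "/documents/doc-xyz/chunks/550e8400-e29b-41d4-a716-446655440000",
    "<access_token>",
    metadata={"reviewed": True},
)
print(req.get_method(), req.full_url)
```

The resulting object can be passed to `urllib.request.urlopen` to perform the call.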

Settings

Manages global settings for the Knowledge Builder service.

| Method | Path | Description |
| --- | --- | --- |
| GET | /settings/defaults | Get default processing options |
| PUT | /settings/defaults | Update default processing options |
| GET | /settings/embedding-model | Current embedding model info |
| GET | /settings/embedding-models | Available embedding model list |

GET /settings/embedding-model

Retrieves detailed information about the currently configured embedding model.

Response

200 OK

{
  "model_name": "BAAI/bge-m3",
  "model_version": null,
  "dimensions": 1024,
  "max_tokens": 8192
}

GET /settings/embedding-models

Retrieves all available embedding models.

Response

200 OK

{
  "models": [
    {
      "id": "BAAI/bge-m3",
      "name": "BAAI/bge-m3",
      "provider": "local",
      "dimensions": 1024,
      "max_tokens": 8192,
      "supports_image": false,
      "is_default": true
    }
  ]
}
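A client that needs the default model's limits (for example, to size chunks below `max_tokens`) can pick it out of this response by its `is_default` flag. A small sketch, assuming the response has already been parsed into a dict; `default_model` is a hypothetical helper.

```python
def default_model(response):
    """Return the entry flagged is_default, or None if no entry is flagged."""
    for model in response["models"]:
        if model.get("is_default"):
            return model
    return None


# Sample taken from the response shown above.
sample = {
    "models": [
        {
            "id": "BAAI/bge-m3",
            "name": "BAAI/bge-m3",
            "provider": "local",
            "dimensions": 1024,
            "max_tokens": 8192,
            "supports_image": False,
            "is_default": True,
        }
    ]
}
model = default_model(sample)
print(model["id"], model["dimensions"], model["max_tokens"])
```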

System

GET /system/health

Checks the health of the Knowledge Builder service.

Response

200 OK

{
  "status": "healthy",
  "timestamp": "2024-06-15T10:30:00Z",
  "components": {}
}

Usage Examples

cURL

# Search within a Knowledge
curl -X POST https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/search \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How to configure data pipelines",
    "search_mode": "HYBRID",
    "top_k": 5
  }'

# Start document file indexing
curl -X POST https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/documents/doc-xyz/indexing/doc \
  -H "Authorization: Bearer <access_token>" \
  -F "file=@guide.pdf"

# List chunks
curl "https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/documents/doc-xyz/chunks?page=1&page_size=20" \
  -H "Authorization: Bearer <access_token>"

# Create a manual chunk
curl -X POST https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/documents/doc-xyz/chunks \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "This is manually added chunk content.",
    "metadata": {"source": "manual"}
  }'

# List embedding models
curl https://api.dhub.io/api/v1/knowledger/settings/embedding-models \
  -H "Authorization: Bearer <access_token>"

Relationship with Manager API

Knowledge and Document metadata (name, tags, etc.) is managed through the Manager's Knowledge API. Knowledge Builder handles only the AI/RAG-related features: indexing, chunks, and search.