Knowledge Builder API
Knowledge Builder is an independent service that provides document indexing, chunk management, and search.
Its API runs separately from the Manager API and uses a different Base URL.
Base URL: /api/v1/knowledger
Endpoint Summary
| Group | Endpoint Count | Description |
|---|---|---|
| Knowledges | 3 | Knowledge deletion, search, indexing preview |
| Documents | 2 | Document deletion, search |
| Indexing | 7 | Start, check status of, and delete web/document indexing; file download |
| Chunks | 6 | Chunk CRUD, list retrieval, bulk delete |
| Settings | 4 | Default settings, embedding model management |
| System | 1 | Health check |
Knowledges
Performs Knowledge-level search, indexing preview, and storage deletion.
| Method | Path | Description |
|---|---|---|
| DELETE | /knowledges/{kid} | Delete all chunks related to a Knowledge |
| POST | /knowledges/{kid}/search | Search within a Knowledge |
| POST | /knowledges/{kid}/indexing/preview | Indexing preview (chunking simulation without saving) |
DELETE /knowledges/{kid}
Deletes all vector embeddings and text index data belonging to the Knowledge.
Response
204 No Content
POST /knowledges/{kid}/search
Searches for chunks across the entire Knowledge scope.
Request Body
| Field | Type | Default | Description |
|---|---|---|---|
| query | string | - | Search query (1–1000 characters) |
| search_mode | string | VECTOR | Search mode: VECTOR, TEXT, HYBRID |
| top_k | integer | 10 | Number of results to return (1–100) |
| score_threshold | float | 0.0 | Minimum score threshold (0.0–1.0) |
Response
200 OK
{
"query": "How to use D.Hub",
"search_mode": "VECTOR",
"total_results": 3,
"results": [
{
"chunk_id": "chunk-001",
"document_id": "knowledge.document-xyz789",
"content": "D.Hub is a data hub platform that...",
"score": 0.92,
"highlights": [],
"metadata": {}
}
]
}
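The request constraints above can be checked on the client before anything is sent. The following is a minimal sketch using only the Python standard library. The helper name `build_search_request` is hypothetical, and the host and Bearer authorization header are taken from the cURL examples in this document; adjust both for your deployment.

```python
import json
import urllib.request

# Host taken from this document's cURL examples; adjust for your deployment.
BASE_URL = "https://api.dhub.io/api/v1/knowledger"
SEARCH_MODES = {"VECTOR", "TEXT", "HYBRID"}


def build_search_request(kid, query, token, search_mode="VECTOR",
                         top_k=10, score_threshold=0.0):
    """Build (but do not send) a POST /knowledges/{kid}/search request.

    Hypothetical helper for illustration; not part of the API itself.
    """
    # Mirror the documented ranges so bad input fails before any network call.
    if not 1 <= len(query) <= 1000:
        raise ValueError("query must be 1-1000 characters")
    if search_mode not in SEARCH_MODES:
        raise ValueError(f"search_mode must be one of {sorted(SEARCH_MODES)}")
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be between 1 and 100")
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0.0 and 1.0")

    body = {
        "query": query,
        "search_mode": search_mode,
        "top_k": top_k,
        "score_threshold": score_threshold,
    }
    return urllib.request.Request(
        f"{BASE_URL}/knowledges/{kid}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; building it separately keeps validation testable without network access.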
POST /knowledges/{kid}/indexing/preview
Uploads a file and returns a preview of the chunking results. Nothing is saved.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_tokens | integer | 512 | Maximum tokens per chunk (64–4096) |
| max_preview | integer | 10 | Number of preview chunks (1–50) |
| strategy | string | - | Chunking strategy (fixed, markdown, hierarchical, hybrid, parent_child) |
| overlap_length | integer | - | Overlap length (0–500) |
Request Body
Upload the file as multipart/form-data. Supported formats: PDF, DOCX, PPTX, XLSX, HTML, TXT, MD.
Response
200 OK
{
"total_chunks": 42,
"preview_chunks": [
{
"sequence": 1,
"content": "First chunk content...",
"token_count": 128,
"char_count": 512
}
],
"truncated": true,
"file_name": "guide.pdf"
}
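Because the preview options travel as query parameters rather than in the body, it can help to assemble and validate the query string separately from the file upload. This is a sketch only: `build_preview_path` is a hypothetical helper, and it assumes that optional parameters without a documented default (strategy, overlap_length) may simply be left out.

```python
from urllib.parse import urlencode

STRATEGIES = {"fixed", "markdown", "hierarchical", "hybrid", "parent_child"}


def build_preview_path(kid, max_tokens=512, max_preview=10,
                       strategy=None, overlap_length=None):
    """Return the preview path with a validated query string.

    Hypothetical helper; the ranges mirror the Query Parameters table.
    """
    if not 64 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be between 64 and 4096")
    if not 1 <= max_preview <= 50:
        raise ValueError("max_preview must be between 1 and 50")

    params = {"max_tokens": max_tokens, "max_preview": max_preview}
    # Assumption: parameters with no documented default can be omitted.
    if strategy is not None:
        if strategy not in STRATEGIES:
            raise ValueError(f"strategy must be one of {sorted(STRATEGIES)}")
        params["strategy"] = strategy
    if overlap_length is not None:
        if not 0 <= overlap_length <= 500:
            raise ValueError("overlap_length must be between 0 and 500")
        params["overlap_length"] = overlap_length

    return f"/knowledges/{kid}/indexing/preview?{urlencode(params)}"
```

The returned path is relative to the Base URL; the file itself still goes in the multipart body.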
Documents
Performs Document-level chunk deletion and search.
| Method | Path | Description |
|---|---|---|
| DELETE | /knowledges/{kid}/documents/{did} | Delete chunks related to a Document |
| POST | /knowledges/{kid}/documents/{did}/search | Search within a Document |
DELETE /knowledges/{kid}/documents/{did}
Deletes all chunks belonging to the Document from vector and text stores.
Response
204 No Content
POST /knowledges/{kid}/documents/{did}/search
Searches for chunks within a specific Document scope. The Request/Response format is the same as Knowledge search.
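Since both search endpoints share one request and response format, client code can treat the scope as a path choice and reuse the same response handling. A minimal sketch, with hypothetical helper names `search_path` and `top_hits`:

```python
def search_path(kid, did=None):
    """Return the Knowledge-wide search path, or the Document-scoped one when did is given."""
    if did is None:
        return f"/knowledges/{kid}/search"
    return f"/knowledges/{kid}/documents/{did}/search"


def top_hits(response, min_score=0.0):
    """Return (chunk_id, score) pairs from a decoded search response, best score first."""
    hits = [
        (result["chunk_id"], result["score"])
        for result in response.get("results", [])
        if result["score"] >= min_score
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

The `min_score` filter here runs on the client and is independent of the server-side score_threshold field.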
Indexing
Manages indexing (chunking → embedding → storing) for web URLs and document files.
All indexing endpoints are under /knowledges/{kid}/documents/{did}/indexing. In the paths below, ... stands for /knowledges/{kid}/documents/{did}.
Web Indexing
| Method | Path | Description |
|---|---|---|
| POST | .../indexing/web | Start web indexing |
| GET | .../indexing/web | Check web indexing status |
| DELETE | .../indexing/web | Delete web indexing |
POST .../indexing/web
Starts crawling and indexing the URL set in the Document's options.url.
Response: 202 Accepted
{
"message": "Web indexing started"
}
GET .../indexing/web
Retrieves the current web indexing status.
Response: 200 OK
{
"status": "READY"
}
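Indexing starts with a 202 Accepted response, so a client typically polls the status endpoint afterwards. The sketch below assumes that "READY" marks the point at which polling can stop; this document shows only that one status value and does not list the others, so treat `done_status` as something to confirm against your deployment. `wait_until_ready` and `fetch_status` are hypothetical names.

```python
import time


def wait_until_ready(fetch_status, done_status="READY", interval=2.0,
                     timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it reports done_status or the timeout elapses.

    fetch_status is a caller-supplied callable that performs
    GET .../indexing/web and returns the decoded JSON body,
    for example {"status": "READY"}.
    Assumption: done_status ("READY" by default) means indexing has finished.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status().get("status")
        if status == done_status:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"status not {done_status!r} after {timeout}s (last: {status!r})"
            )
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. The same loop would apply to GET .../indexing/doc if it returns the same shape, which this document does not state.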
Document File Indexing
| Method | Path | Description |
|---|---|---|
| POST | .../indexing/doc | Start document file indexing |
| GET | .../indexing/doc | Check document indexing status |
| DELETE | .../indexing/doc | Delete document indexing |
| GET | .../indexing/doc/file | Download original file |
POST .../indexing/doc
Uploads a document file and starts indexing. If the file is omitted, the previously stored file is reused.
Request Body: multipart/form-data (file field, optional)
Response: 202 Accepted
{
"message": "Document indexing started"
}
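For clients without a multipart library, the upload body can be assembled by hand. The sketch below builds the request with the standard library only. `build_doc_indexing_request` is a hypothetical helper, the host comes from the cURL examples in this document, and the part's `application/octet-stream` content type is an assumption. It covers only the case where a file is supplied; how to shape a request that omits the file is not specified here.

```python
import urllib.request
import uuid

# Host taken from this document's cURL examples; adjust for your deployment.
BASE_URL = "https://api.dhub.io/api/v1/knowledger"


def build_doc_indexing_request(kid, did, token, file_name, file_bytes):
    """Build (but do not send) a multipart POST .../indexing/doc request."""
    boundary = uuid.uuid4().hex  # random separator between multipart sections
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        # "file" is the form field name documented for this endpoint.
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        .encode("utf-8"),
        # Assumption: a generic binary content type is accepted for the part.
        b"Content-Type: application/octet-stream\r\n\r\n",
        file_bytes,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/knowledges/{kid}/documents/{did}/indexing/doc",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

A file name containing double quotes or line breaks would need escaping before being placed in the Content-Disposition header; the sketch leaves that out.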
GET .../indexing/doc/file
Downloads the original file used for indexing.
Response: File stream (Content-Disposition: attachment)
Chunks
Manages chunks under a Document directly. Chunks can be created, updated, and deleted for MANUAL-type documents.
All endpoints are under /knowledges/{kid}/documents/{did}/chunks. The paths below are shown relative to /knowledges/{kid}/documents/{did}.
| Method | Path | Description |
|---|---|---|
| POST | /chunks | Create a chunk |
| GET | /chunks | List chunks (pagination) |
| GET | /chunks/{id} | Get chunk details |
| PATCH | /chunks/{id} | Update a chunk |
| DELETE | /chunks/{id} | Delete a chunk |
| DELETE | /chunks | Bulk delete chunks |
POST /chunks
Creates a new chunk. If the VECTOR target is enabled, embeddings are automatically generated.
Request Body
{
"content": "Chunk text content",
"metadata": {"source": "manual", "page": 1}
}
Response
201 Created
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"content": "Chunk text content",
"metadata": {"source": "manual", "page": 1},
"created_at": "2024-06-15T10:30:00Z"
}
GET /chunks
Retrieves a paginated list of chunks.
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| page | integer | 1 | Page number (starting from 1) |
| page_size | integer | 20 | Items per page (1–100) |
Response
200 OK
{
"data": [
{
"id": "550e8400-...",
"type": "TEXT",
"content": "Chunk content...",
"metadata": {},
"storages": ["TEXT", "VECTOR"],
"created_at": "2024-06-15T10:30:00Z",
"updated_at": null
}
],
"meta": {
"total": 42,
"page": 1,
"page_size": 20,
"total_pages": 3
}
}
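The meta block makes it straightforward to walk every page. Here is a minimal sketch that stops once meta.total_pages is reached; `iter_chunks` and `fetch_page` are hypothetical names, and `fetch_page` stands in for whatever function performs the GET /chunks call and decodes the JSON.

```python
def iter_chunks(fetch_page, page_size=20):
    """Yield every chunk of a Document by following the pagination metadata.

    fetch_page(page, page_size) is a caller-supplied callable that returns
    the decoded JSON body of GET /chunks for that page.
    """
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")

    page = 1  # pages are numbered from 1
    while True:
        body = fetch_page(page, page_size)
        yield from body["data"]
        # Stop after the last page; also covers an empty list (total_pages == 0).
        if page >= body["meta"]["total_pages"]:
            break
        page += 1
```

In the sample response, 42 chunks at 20 per page give 3 pages, so this loop would call `fetch_page` three times.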
PATCH /chunks/{id}
Updates the content or metadata of an existing chunk. If the content is changed, embeddings are regenerated.
Request Body
{
"content": "Updated chunk content",
"metadata": {"reviewed": true}
}
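Because a content change regenerates embeddings, a client may prefer to send only the fields it means to change. The sketch below builds such a request; it assumes the usual PATCH convention that omitted fields are left untouched, which this document does not confirm. `build_chunk_update_request` is a hypothetical helper, and the host comes from the cURL examples in this document.

```python
import json
import urllib.request

# Host taken from this document's cURL examples; adjust for your deployment.
BASE_URL = "https://api.dhub.io/api/v1/knowledger"


def build_chunk_update_request(kid, did, chunk_id, token,
                               content=None, metadata=None):
    """Build (but do not send) a PATCH /chunks/{id} request."""
    body = {}
    # Assumption: fields left out of the body are not modified by the server.
    if content is not None:
        body["content"] = content
    if metadata is not None:
        body["metadata"] = metadata
    if not body:
        raise ValueError("provide content, metadata, or both")

    return urllib.request.Request(
        f"{BASE_URL}/knowledges/{kid}/documents/{did}/chunks/{chunk_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```

Sending only `metadata` would, under that assumption, avoid triggering an embedding regeneration.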
Settings
Manages global settings for the Knowledge Builder service.
| Method | Path | Description |
|---|---|---|
| GET | /settings/defaults | Get default processing options |
| PUT | /settings/defaults | Update default processing options |
| GET | /settings/embedding-model | Current embedding model info |
| GET | /settings/embedding-models | Available embedding model list |
GET /settings/embedding-model
Retrieves detailed information about the currently configured embedding model.
Response
200 OK
{
"model_name": "BAAI/bge-m3",
"model_version": null,
"dimensions": 1024,
"max_tokens": 8192
}
GET /settings/embedding-models
Retrieves all available embedding models.
Response
200 OK
{
"models": [
{
"id": "BAAI/bge-m3",
"name": "BAAI/bge-m3",
"provider": "local",
"dimensions": 1024,
"max_tokens": 8192,
"supports_image": false,
"is_default": true
}
]
}
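A client that needs the default model's vector size, for example to size its own storage, can read it from this response. A small sketch with a hypothetical helper name, `default_model`:

```python
def default_model(response):
    """Return the model entry flagged is_default, or None if no entry is flagged."""
    for model in response.get("models", []):
        if model.get("is_default"):
            return model
    return None


# Sample body copied from the response above (JSON booleans shown as Python values).
sample = {
    "models": [
        {
            "id": "BAAI/bge-m3",
            "name": "BAAI/bge-m3",
            "provider": "local",
            "dimensions": 1024,
            "max_tokens": 8192,
            "supports_image": False,
            "is_default": True,
        }
    ]
}

model = default_model(sample)
print(model["id"], model["dimensions"])
```

Whether more than one entry can be flagged as default is not stated here; the helper simply returns the first match.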
System
GET /system/health
Checks the health of the Knowledge Builder service.
Response
200 OK
{
"status": "healthy",
"timestamp": "2024-06-15T10:30:00Z",
"components": {}
}
Usage Examples
cURL
# Search within a Knowledge
curl -X POST https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/search \
-H "Authorization: Bearer <access_token>" \
-H "Content-Type: application/json" \
-d '{
"query": "How to configure data pipelines",
"search_mode": "HYBRID",
"top_k": 5
}'
# Start document file indexing
curl -X POST https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/documents/doc-xyz/indexing/doc \
-H "Authorization: Bearer <access_token>" \
-F "file=@guide.pdf"
# List chunks
curl "https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/documents/doc-xyz/chunks?page=1&page_size=20" \
-H "Authorization: Bearer <access_token>"
# Create a manual chunk
curl -X POST https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/documents/doc-xyz/chunks \
-H "Authorization: Bearer <access_token>" \
-H "Content-Type: application/json" \
-d '{
"content": "This is manually added chunk content.",
"metadata": {"source": "manual"}
}'
# List embedding models
curl https://api.dhub.io/api/v1/knowledger/settings/embedding-models \
-H "Authorization: Bearer <access_token>"
Knowledge and Document metadata (name, tags, etc.) is managed by the Manager service through its Knowledge API. Knowledge Builder handles only the AI/RAG-related features: indexing, chunks, and search.