Version: v0.1.0

Knowledge Builder API

Knowledge Builder is an independent service that provides document indexing, chunk management, and search capabilities.

Separate Service

The Knowledge Builder API runs as a service separate from the Manager API and uses its own Base URL.

Base URL: /api/v1/knowledger

Endpoint Summary

Group	Endpoint Count	Description
Knowledges	3	Knowledge deletion, search, indexing preview
Documents	2	Document deletion, search
Indexing	7	Web/document indexing start, status, delete; file download
Chunks	6	Chunk CRUD, list retrieval, bulk delete
Settings	4	Default settings, embedding model management
System	1	Health check

Knowledges

Performs Knowledge-level search, indexing preview, and storage deletion.

Method	Path	Description
DELETE	`/knowledges/{kid}`	Delete all chunks related to a Knowledge
POST	`/knowledges/{kid}/search`	Search within a Knowledge
POST	`/knowledges/{kid}/indexing/preview`	Indexing preview (chunking simulation without saving)

DELETE /knowledges/{kid}

Deletes all vector embeddings and text index data belonging to the Knowledge.

Response

204 No Content

POST /knowledges/{kid}/search

Searches for chunks across the entire Knowledge scope.

Request Body

Field	Type	Default	Description
`query`	string	-	Search query (1–1000 characters)
`search_mode`	string	`VECTOR`	Search mode: `VECTOR`, `TEXT`, `HYBRID`
`top_k`	integer	10	Number of results to return (1–100)
`score_threshold`	float	0.0	Minimum score threshold (0.0–1.0)

Response

200 OK

{
  "query": "How to use D.Hub",
  "search_mode": "VECTOR",
  "total_results": 3,
  "results": [
    {
      "chunk_id": "chunk-001",
      "document_id": "knowledge.document-xyz789",
      "content": "D.Hub is a data hub platform that...",
      "score": 0.92,
      "highlights": [],
      "metadata": {}
    }
  ]
}
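The field limits in the Request Body table can be checked on the client before the request is sent. The following is a minimal sketch, not part of the service: `build_search_body` is a hypothetical helper name, and the ranges and defaults are copied from the table above.

```python
import json

# Search modes listed in the Request Body table.
SEARCH_MODES = {"VECTOR", "TEXT", "HYBRID"}


def build_search_body(query, search_mode="VECTOR", top_k=10, score_threshold=0.0):
    """Return the JSON body for POST /knowledges/{kid}/search.

    Hypothetical helper: it only enforces the documented ranges locally.
    """
    if not 1 <= len(query) <= 1000:
        raise ValueError("query must be 1-1000 characters")
    if search_mode not in SEARCH_MODES:
        raise ValueError(f"search_mode must be one of {sorted(SEARCH_MODES)}")
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be 1-100")
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be 0.0-1.0")
    return json.dumps(
        {
            "query": query,
            "search_mode": search_mode,
            "top_k": top_k,
            "score_threshold": score_threshold,
        }
    )
```

The returned string can be passed as the request body with `Content-Type: application/json`.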

POST /knowledges/{kid}/indexing/preview

Uploads a file to preview the chunking results. No actual saving is performed.

Query Parameters

Parameter	Type	Default	Description
`max_tokens`	integer	512	Maximum tokens per chunk (64–4096)
`max_preview`	integer	10	Number of preview chunks (1–50)
`strategy`	string	-	Chunking strategy (`fixed`, `markdown`, `hierarchical`, `hybrid`, `parent_child`)
`overlap_length`	integer	-	Overlap length (0–500)

Request Body

Upload the file as multipart/form-data. Supported formats: PDF, DOCX, PPTX, XLSX, HTML, TXT, MD.

Response

200 OK

{
  "total_chunks": 42,
  "preview_chunks": [
    {
      "sequence": 1,
      "content": "First chunk content...",
      "token_count": 128,
      "char_count": 512
    }
  ],
  "truncated": true,
  "file_name": "guide.pdf"
}
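Because the preview options travel as query parameters rather than in the body, the URL has to be assembled separately from the file upload. A minimal sketch of that assembly follows; `preview_url` is a hypothetical helper, the host is the one used in the Usage Examples section, and the ranges come from the Query Parameters table.

```python
from urllib.parse import urlencode

# Host as used in the cURL usage examples; adjust for your deployment.
BASE_URL = "https://api.dhub.io/api/v1/knowledger"
STRATEGIES = {"fixed", "markdown", "hierarchical", "hybrid", "parent_child"}


def preview_url(kid, max_tokens=512, max_preview=10, strategy=None, overlap_length=None):
    """Build the URL for POST /knowledges/{kid}/indexing/preview."""
    if not 64 <= max_tokens <= 4096:
        raise ValueError("max_tokens must be 64-4096")
    if not 1 <= max_preview <= 50:
        raise ValueError("max_preview must be 1-50")
    params = {"max_tokens": max_tokens, "max_preview": max_preview}
    # strategy and overlap_length have no documented default, so they are
    # only sent when the caller sets them.
    if strategy is not None:
        if strategy not in STRATEGIES:
            raise ValueError(f"strategy must be one of {sorted(STRATEGIES)}")
        params["strategy"] = strategy
    if overlap_length is not None:
        if not 0 <= overlap_length <= 500:
            raise ValueError("overlap_length must be 0-500")
        params["overlap_length"] = overlap_length
    return f"{BASE_URL}/knowledges/{kid}/indexing/preview?{urlencode(params)}"
```

The file itself would still be sent as multipart/form-data to the resulting URL.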

Documents

Performs Document-level chunk deletion and search.

Method	Path	Description
DELETE	`/knowledges/{kid}/documents/{did}`	Delete chunks related to a Document
POST	`/knowledges/{kid}/documents/{did}/search`	Search within a Document

DELETE /knowledges/{kid}/documents/{did}

Deletes all chunks belonging to the Document from vector and text stores.

Response

204 No Content

POST /knowledges/{kid}/documents/{did}/search

Searches for chunks within a specific Document scope. The Request/Response format is the same as Knowledge search.
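Since both scopes accept the same body, a client can share one code path and vary only the URL path. A small sketch, with `search_path` as a hypothetical helper name:

```python
from urllib.parse import quote


def search_path(kid, did=None):
    """Return the search path for a Knowledge, or for one Document in it."""
    kid = quote(kid, safe="")
    if did is None:
        return f"/knowledges/{kid}/search"
    return f"/knowledges/{kid}/documents/{quote(did, safe='')}/search"
```

The path is appended to the Base URL, and the request body is the same in both cases.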

Indexing

Manages indexing (chunking → embedding → storing) for web URLs and document files.

All indexing endpoints are under /knowledges/{kid}/documents/{did}/indexing. In the tables below, the leading ... stands for /knowledges/{kid}/documents/{did}.

Web Indexing

Method	Path	Description
POST	`.../indexing/web`	Start web indexing
GET	`.../indexing/web`	Check web indexing status
DELETE	`.../indexing/web`	Delete web indexing

POST .../indexing/web

Starts crawling and indexing the URL set in the Document's options.url.

Response: 202 Accepted

{
  "message": "Web indexing started"
}

GET .../indexing/web

Retrieves the current web indexing status.

Response: 200 OK

{
  "status": "READY"
}
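Because indexing starts asynchronously (202 Accepted), a client typically polls this status endpoint. The sketch below assumes that `READY` marks a finished run; this document shows only that one value, so the target status is a parameter. `wait_until_ready` and `fetch_status` are hypothetical names, and `fetch_status` is any callable that returns the parsed JSON body of GET .../indexing/web.

```python
import time


def wait_until_ready(fetch_status, done_status="READY", interval=2.0,
                     timeout=300.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it reports done_status or the timeout passes.

    Assumption: done_status ("READY" by default) means indexing has finished.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()["status"]
        if status == done_status:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.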

Document File Indexing

Method	Path	Description
POST	`.../indexing/doc`	Start document file indexing
GET	`.../indexing/doc`	Check document indexing status
DELETE	`.../indexing/doc`	Delete document indexing
GET	`.../indexing/doc/file`	Download original file

POST .../indexing/doc

Uploads a document file and starts indexing. If the file is omitted, the previously stored file is reused.

Request Body: multipart/form-data (file field, optional)

Response: 202 Accepted

{
  "message": "Document indexing started"
}

GET .../indexing/doc/file

Downloads the original file used for indexing.

Response: File stream (Content-Disposition: attachment)
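When saving the download, a client may want the file name from the response header. The sketch below assumes the `Content-Disposition` header carries a `filename` parameter, which this document does not confirm; `filename_from_disposition` is a hypothetical helper built on the standard library's header parser.

```python
from email.message import Message


def filename_from_disposition(header_value):
    """Extract the filename parameter from a Content-Disposition value.

    Returns None when the header has no filename parameter.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()
```

For example, `filename_from_disposition('attachment; filename="guide.pdf"')` yields `guide.pdf`.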

Chunks

Directly manages the chunks under a Document. Chunks can be created, updated, and deleted for MANUAL-type Documents.

All endpoints are under /knowledges/{kid}/documents/{did}/chunks. The paths below are shown relative to /knowledges/{kid}/documents/{did}.

Method	Path	Description
POST	`/chunks`	Create a chunk
GET	`/chunks`	List chunks (paginated)
GET	`/chunks/{id}`	Get chunk details
PATCH	`/chunks/{id}`	Update a chunk
DELETE	`/chunks/{id}`	Delete a chunk
DELETE	`/chunks`	Bulk delete chunks

POST /chunks

Creates a new chunk. If the VECTOR target is enabled, embeddings are automatically generated.

Request Body

{
  "content": "Chunk text content",
  "metadata": {"source": "manual", "page": 1}
}

Response

201 Created

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "content": "Chunk text content",
  "metadata": {"source": "manual", "page": 1},
  "created_at": "2024-06-15T10:30:00Z"
}

GET /chunks

Retrieves a paginated list of chunks.

Query Parameters

Parameter	Type	Default	Description
`page`	integer	1	Page number (starting from 1)
`page_size`	integer	20	Items per page (1–100)

Response

200 OK

{
  "data": [
    {
      "id": "550e8400-...",
      "type": "TEXT",
      "content": "Chunk content...",
      "metadata": {},
      "storages": ["TEXT", "VECTOR"],
      "created_at": "2024-06-15T10:30:00Z",
      "updated_at": null
    }
  ],
  "meta": {
    "total": 42,
    "page": 1,
    "page_size": 20,
    "total_pages": 3
  }
}
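The `meta` block is enough to walk every page. A minimal sketch of that loop follows; `iter_chunks` and `fetch_page` are hypothetical names, and `fetch_page(page, page_size)` is any callable that returns the parsed JSON body of GET /chunks for that page.

```python
def iter_chunks(fetch_page, page_size=20):
    """Yield every chunk by requesting pages until meta.total_pages is reached."""
    page = 1  # pages are numbered from 1
    while True:
        body = fetch_page(page, page_size)
        yield from body["data"]
        if page >= body["meta"]["total_pages"]:
            break
        page += 1
```

Usage would look like `for chunk in iter_chunks(fetch_page): ...`, with `fetch_page` wrapping whichever HTTP client is in use.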

PATCH /chunks/{id}

Updates the content or metadata of an existing chunk. If the content is changed, embeddings are regenerated.

Request Body

{
  "content": "Updated chunk content",
  "metadata": {"reviewed": true}
}
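As an illustration, the request can be prepared with Python's standard library. This sketch assumes that fields left out of the body stay unchanged, which the description implies but does not state. `build_chunk_patch` is a hypothetical helper, and the host and bearer-token header follow the Usage Examples section.

```python
import json
from urllib.request import Request

# Host as used in the cURL usage examples; adjust for your deployment.
BASE_URL = "https://api.dhub.io/api/v1/knowledger"


def build_chunk_patch(kid, did, chunk_id, token, content=None, metadata=None):
    """Prepare (but do not send) a PATCH request for one chunk."""
    payload = {}
    if content is not None:
        payload["content"] = content
    if metadata is not None:
        payload["metadata"] = metadata
    if not payload:
        raise ValueError("provide content, metadata, or both")
    return Request(
        f"{BASE_URL}/knowledges/{kid}/documents/{did}/chunks/{chunk_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it.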

Settings

Manages global settings for the Knowledge Builder service.

Method	Path	Description
GET	`/settings/defaults`	Get default processing options
PUT	`/settings/defaults`	Update default processing options
GET	`/settings/embedding-model`	Current embedding model info
GET	`/settings/embedding-models`	Available embedding model list

GET /settings/embedding-model

Retrieves detailed information about the currently configured embedding model.

Response

200 OK

{
  "model_name": "BAAI/bge-m3",
  "model_version": null,
  "dimensions": 1024,
  "max_tokens": 8192
}

GET /settings/embedding-models

Retrieves all available embedding models.

Response

200 OK

{
  "models": [
    {
      "id": "BAAI/bge-m3",
      "name": "BAAI/bge-m3",
      "provider": "local",
      "dimensions": 1024,
      "max_tokens": 8192,
      "supports_image": false,
      "is_default": true
    }
  ]
}
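A client that needs the default model can read it from the `is_default` flag in this response. A small sketch, with `default_model` as a hypothetical helper operating on the parsed JSON body:

```python
def default_model(response):
    """Return the first model entry flagged is_default, or None if there is none."""
    for model in response["models"]:
        if model.get("is_default"):
            return model
    return None
```

With the sample response above, the returned entry has `id` equal to `BAAI/bge-m3`.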

System

GET /system/health

Checks the health of the Knowledge Builder service.

Response

200 OK

{
  "status": "healthy",
  "timestamp": "2024-06-15T10:30:00Z",
  "components": {}
}

Usage Examples

cURL

# Search within a Knowledge
curl -X POST https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/search \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How to configure data pipelines",
    "search_mode": "HYBRID",
    "top_k": 5
  }'

# Start document file indexing
curl -X POST https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/documents/doc-xyz/indexing/doc \
  -H "Authorization: Bearer <access_token>" \
  -F "file=@guide.pdf"

# List chunks
curl "https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/documents/doc-xyz/chunks?page=1&page_size=20" \
  -H "Authorization: Bearer <access_token>"

# Create a manual chunk
curl -X POST https://api.dhub.io/api/v1/knowledger/knowledges/knowledge-abc123/documents/doc-xyz/chunks \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "This is manually added chunk content.",
    "metadata": {"source": "manual"}
  }'

# List embedding models
curl https://api.dhub.io/api/v1/knowledger/settings/embedding-models \
  -H "Authorization: Bearer <access_token>"

Relationship with Manager API

Knowledge and Document metadata (name, tags, etc.) is managed by the Manager service through its Knowledge API. Knowledge Builder handles only the AI/RAG-related features: indexing, chunks, and search.
