Datasets API
A Dataset is the unit of data storage in D.Hub. Datasets are stored in the Delta Lake format, and each dataset carries its own schema and partitioning options.
1. List Datasets
Retrieves a list of all datasets.
Request
GET /datasets/
Response
```json
{
  "items": { ... },
  "token": "..."
}
```
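A minimal Python sketch of calling this endpoint with the standard library. The base URL `https://dhub.example.com/api` and the helper names are hypothetical, authentication is omitted, and the parser assumes only the response shape shown above (an `items` field and a `token` field). How `token` should be used is not specified in this section, so the sketch simply returns it.

```python
import json
import urllib.request

# Hypothetical base URL; substitute your actual D.Hub endpoint.
BASE_URL = "https://dhub.example.com/api"


def build_list_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET /datasets/ request."""
    return urllib.request.Request(f"{base_url}/datasets/", method="GET")


def parse_list_response(body: bytes) -> tuple:
    """Split a List Datasets response body into (items, token)."""
    payload = json.loads(body)
    return payload.get("items"), payload.get("token")


# Usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(build_list_request()) as resp:
#     items, token = parse_list_response(resp.read())
```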
2. Create Dataset
Creates a new dataset.
Request
POST /datasets/
Body Schema (Dataset)
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Dataset name |
| alias | string | No | Dataset alias |
| type | string | No | Dataset type (e.g., table, stream) |
| schema | object | Yes | Schema definition in the struct-type JSON format used by Delta Lake (see Example) |
| options | object | No | Delta Lake configuration options (see Options Object) |
| tags | array[string] | No | List of tags |
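The Body Schema table can be turned into a client-side check before sending a request. This is a minimal sketch: the function name is hypothetical, it only checks the presence and JSON types listed in the table, and the server may apply further validation not described here.

```python
# Field specifications transcribed from the Body Schema (Dataset) table:
# field name -> (expected Python type after JSON decoding, required?)
DATASET_FIELDS = {
    "name": (str, True),
    "alias": (str, False),
    "type": (str, False),
    "schema": (dict, True),
    "options": (dict, False),
    "tags": (list, False),
}


def validate_dataset_body(body: dict) -> list:
    """Return a list of problems found in a Create Dataset body (empty if none)."""
    problems = []
    for field, (expected, required) in DATASET_FIELDS.items():
        if field not in body:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(body[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    # tags is array[string], so every element must also be a string.
    tags = body.get("tags")
    if isinstance(tags, list) and not all(isinstance(t, str) for t in tags):
        problems.append("tags should contain only strings")
    return problems
```

For example, `validate_dataset_body({"name": "x"})` reports the missing `schema` field.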
Options Object
| Field | Description |
|---|---|
| partitions | Partition columns, as a single comma-separated string |
| log_retention_duration | Log retention period (e.g., `interval 30 days`) |
| deleted_file_retention_duration | Retention period for deleted data files |
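A small helper can assemble the Options object from Python values. This is a sketch under assumptions: the helper is hypothetical, partition columns are joined without spaces, and `deleted_file_retention_duration` is assumed to accept the same `interval N days` syntax shown for `log_retention_duration`.

```python
from typing import Optional, Sequence


def build_options(
    partitions: Optional[Sequence[str]] = None,
    log_retention_days: Optional[int] = None,
    deleted_file_retention_days: Optional[int] = None,
) -> dict:
    """Assemble an Options object; settings left as None are omitted."""
    options = {}
    if partitions:
        # The API takes partition columns as one comma-separated string.
        options["partitions"] = ",".join(partitions)
    if log_retention_days is not None:
        options["log_retention_duration"] = f"interval {log_retention_days} days"
    if deleted_file_retention_days is not None:
        # Assumption: same interval syntax as log_retention_duration.
        options["deleted_file_retention_duration"] = f"interval {deleted_file_retention_days} days"
    return options
```

Calling `build_options(["trans_date"], 7)` yields the `options` object used in the example that follows.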
Example
```json
{
  "name": "customer_transactions",
  "type": "table",
  "schema": {
    "type": "struct",
    "fields": [
      {"name": "id", "type": "string", "nullable": false, "metadata": {}},
      {"name": "amount", "type": "double", "nullable": true, "metadata": {}},
      {"name": "trans_date", "type": "string", "nullable": true, "metadata": {}}
    ]
  },
  "options": {
    "partitions": "trans_date",
    "log_retention_duration": "interval 7 days"
  }
}
```
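The sketch below sends the example body to `POST /datasets/` and first checks that every partition column is actually defined in `schema.fields`. The base URL and helper names are hypothetical, the partition check is a local sanity check rather than documented server behavior, and the `Content-Type: application/json` header is an assumption.

```python
import json
import urllib.request

BASE_URL = "https://dhub.example.com/api"  # hypothetical


def missing_partition_columns(body: dict) -> list:
    """Return partition columns named in options that are not in schema.fields."""
    defined = {f["name"] for f in body.get("schema", {}).get("fields", [])}
    raw = body.get("options", {}).get("partitions", "")
    columns = [c.strip() for c in raw.split(",") if c.strip()]
    return [c for c in columns if c not in defined]


def build_create_request(body: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /datasets/ request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/datasets/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The example body from above, as a Python dict.
example = {
    "name": "customer_transactions",
    "type": "table",
    "schema": {"type": "struct", "fields": [
        {"name": "id", "type": "string", "nullable": False, "metadata": {}},
        {"name": "amount", "type": "double", "nullable": True, "metadata": {}},
        {"name": "trans_date", "type": "string", "nullable": True, "metadata": {}},
    ]},
    "options": {"partitions": "trans_date", "log_retention_duration": "interval 7 days"},
}

problems = missing_partition_columns(example)  # empty: trans_date is defined
req = build_create_request(example)
# urllib.request.urlopen(req) would send it.
```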
3. Update Dataset
Updates the metadata of an existing dataset. (Note: schema changes may require a compatibility review.)
Request
PUT /datasets/{dataset_id}
Body Schema (DatasetUpdate)
| Field | Type | Description |
|---|---|---|
| schema | object | Updated schema definition |
| options | object | Updated options |
| tags | array[string] | Updated tags |
| ... | ... | (Other basic fields) |
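A minimal sketch of building the update request. The base URL, helper name, and dataset ID are hypothetical. It sends only the fields passed in `changes`; whether the server treats omitted fields as unchanged or as cleared is not stated in this section, so check that before relying on partial updates.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://dhub.example.com/api"  # hypothetical


def build_update_request(dataset_id: str, changes: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT /datasets/{dataset_id} request."""
    # Percent-encode the ID so characters like "/" cannot alter the path.
    path_id = urllib.parse.quote(dataset_id, safe="")
    return urllib.request.Request(
        f"{base_url}/datasets/{path_id}",
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Usage: replace the tags of a (hypothetical) dataset "ds-123".
req = build_update_request("ds-123", {"tags": ["finance", "daily"]})
```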
4. Delete Dataset
Deletes a dataset. (Whether the underlying data files are also deleted depends on the configured deletion policy.)
Request
DELETE /datasets/{dataset_id}
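A minimal sketch of the delete request, with a hypothetical base URL and helper name. The request carries no body, so it has no way to choose whether data files are removed; that remains governed by the deletion policy.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://dhub.example.com/api"  # hypothetical


def build_delete_request(dataset_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a DELETE /datasets/{dataset_id} request."""
    # Percent-encode the ID so characters like "/" cannot alter the path.
    path_id = urllib.parse.quote(dataset_id, safe="")
    return urllib.request.Request(f"{base_url}/datasets/{path_id}", method="DELETE")
```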