Datasets API

A Dataset is a storage unit for data managed in D.Hub. It is based on the Delta Lake format and manages schema and partition options.

1. List Datasets

Retrieves a list of all datasets.

Request

GET /datasets/

Response

{
  "items": { ... },
  "token": "..."
}
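The `token` field suggests that the list is paginated. The sketch below shows one way a client might walk through every page. It assumes that the token from one response is passed back with the next request and that a missing or empty token marks the last page; this page confirms neither. `fetch_page` is a hypothetical callable that stands in for the actual `GET /datasets/` call.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_dataset_pages(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield the `items` value of each page until no token is returned.

    `fetch_page(token)` is a hypothetical function that performs
    GET /datasets/ and returns the decoded JSON response.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield page.get("items")
        # Assumption: an empty or absent token means there are no more pages.
        token = page.get("token")
        if not token:
            break
```

Injecting `fetch_page` keeps the paging logic independent of how the token is actually transmitted, so only that one function needs to change once the real parameter name is known.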

2. Create Dataset

Creates a new dataset.

Request

POST /datasets/

Body Schema (Dataset)

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Dataset name |
| alias | string | No | Alias |
| type | string | No | Dataset type (e.g., table, stream) |
| schema | object | Yes | Data schema definition (JSON Schema format) |
| options | object | No | Delta Lake configuration options |
| tags | array[string] | No | List of tags |

Options Object

| Field | Description |
| --- | --- |
| partitions | List of partition columns (comma-separated string) |
| log_retention_duration | Log retention period (e.g., interval 30 days) |
| deleted_file_retention_duration | Deleted file retention period |
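Because `partitions` is a single comma-separated string rather than an array, a small helper can reduce formatting mistakes. The following is a minimal sketch, not part of the API: `build_options` is a hypothetical helper, the `region` column is made up for illustration, and joining with a bare comma (no spaces) is an assumption.

```python
from typing import Dict, Optional, Sequence


def build_options(
    partitions: Sequence[str],
    log_retention_days: Optional[int] = None,
) -> Dict[str, str]:
    """Build an `options` object for a dataset request body (hypothetical helper)."""
    options: Dict[str, str] = {}
    if partitions:
        # The table above describes one comma-separated string, not a JSON array.
        # Assumption: columns are joined with a bare comma.
        options["partitions"] = ",".join(partitions)
    if log_retention_days is not None:
        # Mirrors the documented example value "interval 30 days".
        options["log_retention_duration"] = f"interval {log_retention_days} days"
    return options


# "region" is an illustrative column name, not taken from the documentation.
print(build_options(["trans_date", "region"], log_retention_days=7))
```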

Example

{
  "name": "customer_transactions",
  "type": "table",
  "schema": {
    "type": "struct",
    "fields": [
      {"name": "id", "type": "string", "nullable": false, "metadata": {}},
      {"name": "amount", "type": "double", "nullable": true, "metadata": {}},
      {"name": "trans_date", "type": "string", "nullable": true, "metadata": {}}
    ]
  },
  "options": {
    "partitions": "trans_date",
    "log_retention_duration": "interval 7 days"
  }
}
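The same request can be prepared from Python. This is a sketch using only the standard library. `BASE_URL` is a placeholder, and authentication is not covered on this page, so any required headers would need to be added. The helper `build_create_request` is hypothetical. It checks the two fields marked as required in the Body Schema table before building the request.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical base URL; replace with the address of your D.Hub deployment.
BASE_URL = "https://dhub.example.com/api"

# Fields marked "Yes" in the Required column of the Body Schema table.
REQUIRED_FIELDS = ("name", "schema")


def build_create_request(dataset: Dict[str, Any]) -> urllib.request.Request:
    """Validate the body locally and prepare a POST /datasets/ request."""
    missing = [field for field in REQUIRED_FIELDS if field not in dataset]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return urllib.request.Request(
        url=f"{BASE_URL}/datasets/",
        data=json.dumps(dataset).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


dataset = {
    "name": "customer_transactions",
    "type": "table",
    "schema": {
        "type": "struct",
        "fields": [
            {"name": "id", "type": "string", "nullable": False, "metadata": {}},
            {"name": "amount", "type": "double", "nullable": True, "metadata": {}},
            {"name": "trans_date", "type": "string", "nullable": True, "metadata": {}},
        ],
    },
    "options": {"partitions": "trans_date", "log_retention_duration": "interval 7 days"},
}

request = build_create_request(dataset)
# The request is only prepared here; pass it to urllib.request.urlopen() to send it.
```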

3. Update Dataset

Updates dataset metadata. (Note: Schema changes may require compatibility review.)

Request

PUT /datasets/{dataset_id}

Body Schema (DatasetUpdate)

| Field | Type | Description |
| --- | --- | --- |
| schema | object | Schema change |
| options | object | Options change |
| tags | array[string] | Tags change |
| ... | ... | (Other basic fields) |
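An update request might be prepared as follows. The sketch assumes that the body may contain only the fields being changed, which this page does not state explicitly. `BASE_URL`, the dataset ID `ds-123`, and the tag values are placeholders, and `build_update_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

# Hypothetical base URL; replace with the address of your D.Hub deployment.
BASE_URL = "https://dhub.example.com/api"


def build_update_request(dataset_id: str, changes: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a PUT /datasets/{dataset_id} request carrying the changed fields."""
    # Percent-encode the ID so it cannot alter the path structure.
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/datasets/{safe_id}",
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder ID and values, for illustration only.
request = build_update_request(
    "ds-123",
    {
        "tags": ["finance", "daily"],
        "options": {"log_retention_duration": "interval 30 days"},
    },
)
```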

4. Delete Dataset

Deletes a dataset. (Whether the underlying data files are also deleted depends on the policy.)

Request

DELETE /datasets/{dataset_id}
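Since deletion may be irreversible, a client may want to guard against malformed IDs before sending the request. The sketch below is one possible approach: `build_delete_request` is a hypothetical helper and `BASE_URL` is a placeholder. It rejects an empty ID, which would otherwise produce a request against the collection path `/datasets/`, and percent-encodes the ID.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL; replace with the address of your D.Hub deployment.
BASE_URL = "https://dhub.example.com/api"


def build_delete_request(dataset_id: str) -> urllib.request.Request:
    """Prepare a DELETE /datasets/{dataset_id} request."""
    if not dataset_id:
        # An empty ID would target /datasets/ instead of a single dataset.
        raise ValueError("dataset_id must not be empty")
    safe_id = urllib.parse.quote(dataset_id, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/datasets/{safe_id}",
        method="DELETE",
    )
```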