# Tables API

API for data table retrieval and upload.
## Overview

The Tables API lets you retrieve data stored in D.Hub and upload new data. Data is stored in Delta Lake format, and fast analytical queries are served through ClickHouse.
## Endpoints
### POST /api/v1/tables/schema

Automatically infers a schema from the file to be uploaded.

#### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | parquet | File format (csv, parquet) |

#### Request Body

Upload the file as `multipart/form-data`.

| Field | Type | Description |
|---|---|---|
| files | file[] | Files to infer schema from |
#### Response

200 OK

```json
{
  "fields": [
    {"name": "id", "type": "int64", "nullable": false},
    {"name": "name", "type": "string", "nullable": true},
    {"name": "price", "type": "double", "nullable": true},
    {"name": "created_at", "type": "timestamp", "nullable": true}
  ]
}
```
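As a sketch (assuming the `requests` library and the base URL shown in the usage examples below), schema inference can be called and its response flattened into a name-to-type mapping; `infer_schema` and `field_types` are illustrative helper names, not part of the API:

```python
import io
import requests

BASE_URL = "https://api.dhub.io/api/v1"  # base URL taken from the usage examples below

def infer_schema(csv_bytes: bytes, token: str, filename: str = "data.csv") -> dict:
    """POST a CSV file to the schema-inference endpoint and return its JSON body."""
    resp = requests.post(
        f"{BASE_URL}/tables/schema",
        params={"format": "csv"},
        headers={"Authorization": f"Bearer {token}"},
        files={"files": (filename, io.BytesIO(csv_bytes), "text/csv")},
    )
    resp.raise_for_status()
    return resp.json()

def field_types(schema: dict) -> dict:
    """Flatten the response's field list into a {name: type} mapping."""
    return {f["name"]: f["type"] for f in schema["fields"]}
```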
### GET /api/v1/tables/{table_id}/versions

Retrieves the version history of a table (Delta Lake versions).

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |
#### Response

200 OK

```json
[
  {
    "version": 3,
    "timestamp": "2024-01-20T10:30:00Z",
    "operation": "WRITE",
    "operationParameters": {"mode": "Append"},
    "readVersion": 2
  },
  {
    "version": 2,
    "timestamp": "2024-01-19T15:00:00Z",
    "operation": "WRITE",
    "operationParameters": {"mode": "Append"},
    "readVersion": 1
  }
]
```
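For instance (a sketch assuming `requests`; `get_versions` and `latest_version` are hypothetical helpers that only rely on the JSON shape shown above), you can fetch the history and pick out the latest Delta version:

```python
import requests

BASE_URL = "https://api.dhub.io/api/v1"  # base URL taken from the usage examples below

def get_versions(table_id: str, token: str) -> list:
    """GET the Delta Lake version history for a table."""
    resp = requests.get(
        f"{BASE_URL}/tables/{table_id}/versions",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json()

def latest_version(history: list) -> int:
    """Return the highest version number in the history list."""
    return max(entry["version"] for entry in history)
```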
### GET /api/v1/tables/{table_id}

Retrieves all data from a table.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | csv | Response format (csv, json, parquet, arrow) |
#### Response

Returns data in the specified format.

CSV (default)

```csv
id,name,price
1,Product A,10000
2,Product B,20000
```

JSON

```json
[
  {"id": 1, "name": "Product A", "price": 10000},
  {"id": 2, "name": "Product B", "price": 20000}
]
```
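A CSV response body can be parsed with nothing but the standard library; a minimal sketch using the sample rows above:

```python
import csv
import io

def parse_csv_response(text: str) -> list[dict]:
    """Turn the CSV body returned by the endpoint into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

rows = parse_csv_response("id,name,price\n1,Product A,10000\n2,Product B,20000\n")
# Note: DictReader yields every value as a string; cast numeric columns as needed.
```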
### POST /api/v1/tables/{table_id}/query

Executes an SQL query to retrieve data.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | json | Response format (csv, json, parquet, arrow) |

#### Request Body

```json
{
  "query": "SELECT category, SUM(price) as total FROM sales GROUP BY category",
  "limit": 1000
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | SQL query (ClickHouse syntax) |
| limit | integer | No | Result row limit |
#### Response

Returns query results in the specified format.
### PUT /api/v1/tables/{table_id}/upload

Uploads data to a table.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | parquet | File format (csv, json, parquet) |
| mode | string | append | Write mode (append, overwrite) |

#### Request Body

Upload the file as `multipart/form-data`.

| Field | Type | Description |
|---|---|---|
| files | file[] | Data files to upload |
#### Response

200 OK

```json
{
  "message": "Data inserted successfully"
}
```

#### Write Modes

| Mode | Description |
|---|---|
| append | Append to existing data |
| overwrite | Overwrite existing data |
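The two modes differ only in the `mode` query parameter. A hedged sketch (assuming `requests`; `upload_params` and `upload` are illustrative helpers, not part of the API) of an upload that validates the mode before sending:

```python
import requests

BASE_URL = "https://api.dhub.io/api/v1"  # base URL taken from the usage examples below

def upload_params(file_format: str = "parquet", mode: str = "append") -> dict:
    """Build the query parameters for the upload endpoint, validating the mode."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unsupported write mode: {mode}")
    return {"format": file_format, "mode": mode}

def upload(table_id: str, path: str, token: str, mode: str = "append") -> dict:
    """PUT a local Parquet file to the table; mode='overwrite' replaces existing data."""
    with open(path, "rb") as f:
        resp = requests.put(
            f"{BASE_URL}/tables/{table_id}/upload",
            headers={"Authorization": f"Bearer {token}"},
            params=upload_params("parquet", mode),
            files={"files": f},
        )
    resp.raise_for_status()
    return resp.json()
```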
### POST /api/v1/tables/{table_id}/sink

Starts a data sink (streaming ingestion).

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Response

200 OK

```json
{
  "message": "Sink created successfully"
}
```
### DELETE /api/v1/tables/{table_id}/sink

Stops and deletes the data sink.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Response

200 OK

```json
{
  "message": "Sink deleted successfully"
}
```
### GET /api/v1/tables/{table_id}/sink

Retrieves the status of the data sink.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Response

200 OK

```json
{
  "states": {
    "replica-0": "Running",
    "replica-1": "Running"
  }
}
```
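A sketch (assuming `requests`; `start_sink`, `sink_status`, and `all_running` are hypothetical helper names) that starts a sink and then checks that every replica reports `Running`:

```python
import requests

BASE_URL = "https://api.dhub.io/api/v1"  # base URL taken from the usage examples below

def start_sink(table_id: str, token: str) -> dict:
    """POST to the sink endpoint to start streaming ingestion."""
    resp = requests.post(f"{BASE_URL}/tables/{table_id}/sink",
                         headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    return resp.json()

def sink_status(table_id: str, token: str) -> dict:
    """GET the per-replica sink states."""
    resp = requests.get(f"{BASE_URL}/tables/{table_id}/sink",
                        headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    return resp.json()

def all_running(status: dict) -> bool:
    """True when every replica in the status response reports 'Running'."""
    return all(state == "Running" for state in status["states"].values())
```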
## Usage Examples

### cURL

```bash
# Schema inference
curl -X POST "https://api.dhub.io/api/v1/tables/schema?format=csv" \
  -H "Authorization: Bearer <access_token>" \
  -F "files=@data.csv"

# Table retrieval (JSON)
curl "https://api.dhub.io/api/v1/tables/dataset-abc123?format=json" \
  -H "Authorization: Bearer <access_token>"

# SQL query execution
curl -X POST https://api.dhub.io/api/v1/tables/dataset-abc123/query \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM sales WHERE amount > 1000", "limit": 100}'

# Data upload (CSV, append)
curl -X PUT "https://api.dhub.io/api/v1/tables/dataset-abc123/upload?format=csv&mode=append" \
  -H "Authorization: Bearer <access_token>" \
  -F "files=@new_data.csv"
```
### Python

```python
import requests
import pandas as pd

BASE_URL = "https://api.dhub.io/api/v1"
headers = {"Authorization": f"Bearer {access_token}"}

# Data retrieval
response = requests.get(
    f"{BASE_URL}/tables/dataset-abc123",
    headers=headers,
    params={"format": "json"},
)
df = pd.DataFrame(response.json())

# SQL query execution
response = requests.post(
    f"{BASE_URL}/tables/dataset-abc123/query",
    headers=headers,
    json={
        "query": "SELECT category, SUM(amount) as total FROM sales GROUP BY category",
        "limit": 1000,
    },
)
result = pd.DataFrame(response.json())

# Data upload
with open("data.parquet", "rb") as f:
    response = requests.put(
        f"{BASE_URL}/tables/dataset-abc123/upload",
        headers=headers,
        params={"format": "parquet", "mode": "append"},
        files={"files": f},
    )
```
## Supported File Formats

| Format | Extension | Read | Upload | Description |
|---|---|---|---|---|
| CSV | .csv | Yes | Yes | Text-based, highly versatile |
| JSON | .json | Yes | Yes | Supports nested structures |
| Parquet | .parquet | Yes | Yes | Column-oriented, efficient compression |
| Arrow | .arrow | Yes | No | Memory-efficient transfer |