# Tables API

API for querying and uploading data tables.

## Overview

The Tables API lets you query data stored in D.Hub and upload new data. Data is stored in a version-controlled (Delta Lake) table format, and queries are served by the analytics engine.

## Endpoints
### POST /api/v1/tables/schema

Infers a schema from the uploaded file(s).
#### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | `parquet` | File format (`csv`, `parquet`) |
#### Request Body

Upload the file(s) as `multipart/form-data`.

| Field | Type | Description |
|---|---|---|
| files | file[] | Files to infer the schema from |
#### Response

**200 OK**

```json
{
  "fields": [
    {"name": "id", "type": "int64", "nullable": false},
    {"name": "name", "type": "string", "nullable": true},
    {"name": "price", "type": "double", "nullable": true},
    {"name": "created_at", "type": "timestamp", "nullable": true}
  ]
}
```
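The inferred field list maps naturally onto client-side types. As a sketch (the helper name and the pandas dtype choices are illustrative assumptions, not part of the API), the response above can be turned into a `{column: dtype}` mapping:

```python
# Map D.Hub field types (as seen in the /tables/schema response above)
# to pandas dtype strings. The type list is assumed from the example
# response; unknown types fall back to "object".
_DTYPE_MAP = {
    "int64": "Int64",  # pandas nullable integer, since fields may be nullable
    "double": "float64",
    "string": "string",
    "timestamp": "datetime64[ns]",
}

def schema_to_dtypes(schema: dict) -> dict:
    """Turn a schema-inference response into a {column: dtype} dict."""
    return {f["name"]: _DTYPE_MAP.get(f["type"], "object") for f in schema["fields"]}
```

The resulting dict can be passed to `pandas.read_csv(..., dtype=...)` when loading the same file locally.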
### GET /api/v1/tables/{table_id}/versions

Retrieves the version history of a table (Delta Lake versions).

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |
#### Response

**200 OK**

```json
[
  {
    "version": 3,
    "timestamp": "2024-01-20T10:30:00Z",
    "operation": "WRITE",
    "operationParameters": {"mode": "Append"},
    "readVersion": 2
  },
  {
    "version": 2,
    "timestamp": "2024-01-19T15:00:00Z",
    "operation": "WRITE",
    "operationParameters": {"mode": "Append"},
    "readVersion": 1
  }
]
```
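The example above lists entries newest-first, but the ordering is not documented, so a defensive way to find the current version is to take the maximum. A minimal sketch (`latest_version` is an illustrative helper, not part of the API):

```python
def latest_version(history: list) -> int:
    """Return the highest Delta Lake version number in a /versions response."""
    return max(entry["version"] for entry in history)
```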
### GET /api/v1/tables/{table_id}

Retrieves all data from a table.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | `csv` | Response format (`csv`, `json`, `parquet`, `arrow`) |
#### Response

Returns the data in the requested format.

**CSV (default)**

```csv
id,name,price
1,Product A,10000
2,Product B,20000
```

**JSON**

```json
[
  {"id": 1, "name": "Product A", "price": 10000},
  {"id": 2, "name": "Product B", "price": 20000}
]
```
### POST /api/v1/tables/{table_id}/query

Executes a SQL query and returns the result.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | `json` | Response format (`csv`, `json`, `parquet`, `arrow`) |
#### Request Body

```json
{
  "query": "SELECT category, SUM(price) as total FROM sales GROUP BY category",
  "limit": 1000
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | SQL query (D.Hub SQL syntax) |
| limit | integer | No | Maximum number of result rows |
#### Response

Returns the query results in the requested format.
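The two text formats can be parsed without extra dependencies; the binary formats (`parquet`, `arrow`) would need a library such as pyarrow. A sketch of a format-aware parser (the function is illustrative, not part of any official client):

```python
import csv
import io
import json

def parse_result(text: str, fmt: str) -> list:
    """Parse a text-format query response body into a list of row dicts.

    Only handles the text formats; note that CSV yields string values,
    while JSON preserves numbers.
    """
    if fmt == "json":
        return json.loads(text)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported text format: {fmt}")
```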
### PUT /api/v1/tables/{table_id}/upload

Uploads data to a table.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Query Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | `parquet` | File format (`csv`, `json`, `parquet`) |
| mode | string | `append` | Write mode (`append`, `overwrite`) |
#### Request Body

Upload the file(s) as `multipart/form-data`.

| Field | Type | Description |
|---|---|---|
| files | file[] | Data files to upload |

#### Response

**200 OK**

```json
{
  "message": "Data inserted successfully"
}
```
#### Write Modes

| Mode | Description |
|---|---|
| append | Appends to the existing data |
| overwrite | Replaces the existing data |
### POST /api/v1/tables/{table_id}/sink

Starts a data sink (streaming ingestion) for the table.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Response

**200 OK**

```json
{
  "message": "Sink created successfully"
}
```
### DELETE /api/v1/tables/{table_id}/sink

Stops and deletes a data sink.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Response

**200 OK**

```json
{
  "message": "Sink deleted successfully"
}
```
### GET /api/v1/tables/{table_id}/sink

Retrieves the status of a data sink.

#### Path Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_id | string | Yes | Table (dataset) ID |

#### Response

**200 OK**

```json
{
  "states": {
    "replica-0": "Running",
    "replica-1": "Running"
  }
}
```
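Since the status response reports one state per replica, a sink is healthy only when every replica is `Running`. A minimal check (illustrative helper, not part of the API):

```python
def sink_running(status: dict) -> bool:
    """True if a sink-status response shows every replica as 'Running'.

    An empty or missing "states" map counts as not running.
    """
    states = status.get("states", {})
    return bool(states) and all(s == "Running" for s in states.values())
```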
## Usage Examples

### cURL

```bash
# Schema inference (URL quoted so the shell does not interpret "?")
curl -X POST "https://api.dhub.io/api/v1/tables/schema?format=csv" \
  -H "Authorization: Bearer <access_token>" \
  -F "files=@data.csv"

# Query table (JSON)
curl "https://api.dhub.io/api/v1/tables/dataset-abc123?format=json" \
  -H "Authorization: Bearer <access_token>"

# Execute SQL query
curl -X POST https://api.dhub.io/api/v1/tables/dataset-abc123/query \
  -H "Authorization: Bearer <access_token>" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM sales WHERE amount > 1000", "limit": 100}'

# Upload data (CSV, append)
curl -X PUT "https://api.dhub.io/api/v1/tables/dataset-abc123/upload?format=csv&mode=append" \
  -H "Authorization: Bearer <access_token>" \
  -F "files=@new_data.csv"
```
### Python

```python
import requests
import pandas as pd

BASE_URL = "https://api.dhub.io/api/v1"
headers = {"Authorization": f"Bearer {access_token}"}  # access_token obtained beforehand

# Query data
response = requests.get(
    f"{BASE_URL}/tables/dataset-abc123",
    headers=headers,
    params={"format": "json"},
)
df = pd.DataFrame(response.json())

# Execute SQL query
response = requests.post(
    f"{BASE_URL}/tables/dataset-abc123/query",
    headers=headers,
    json={
        "query": "SELECT category, SUM(amount) as total FROM sales GROUP BY category",
        "limit": 1000,
    },
)
result = pd.DataFrame(response.json())

# Upload data
with open("data.parquet", "rb") as f:
    response = requests.put(
        f"{BASE_URL}/tables/dataset-abc123/upload",
        headers=headers,
        params={"format": "parquet", "mode": "append"},
        files={"files": f},
    )
```
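Building on the sink endpoints, a polling loop can wait until all replicas report `Running` before relying on the sink. This is a sketch: the timeout, interval, and the injected `fetch_status` callable are conventions chosen here, not part of the API.

```python
import time

def wait_for_sink(fetch_status, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll a status-fetching callable until every replica is 'Running'.

    `fetch_status` takes no arguments and returns the body of
    GET /tables/{table_id}/sink as a dict. Returns True once all
    replicas are running, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        states = fetch_status().get("states", {})
        if states and all(s == "Running" for s in states.values()):
            return True
        time.sleep(interval)
    return False
```

In practice `fetch_status` would wrap a request, e.g. `lambda: requests.get(f"{BASE_URL}/tables/dataset-abc123/sink", headers=headers).json()`.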
## Supported File Formats

| Format | Extension | Read | Upload | Description |
|---|---|---|---|---|
| CSV | .csv | Yes | Yes | Text-based, highly portable |
| JSON | .json | Yes | Yes | Supports nested structures |
| Parquet | .parquet | Yes | Yes | Column-oriented, efficient compression |
| Arrow | .arrow | Yes | No | Memory-efficient transfer |