
Tables API

API for data table retrieval and upload.

Overview

Through the Tables API, you can retrieve data stored in D.Hub and upload new data. Data is stored in Delta Lake format, and fast analytical queries can be performed through ClickHouse.

Endpoints

POST /api/v1/tables/schema

Automatically infers the schema from the file(s) to be uploaded.

Query Parameters

Parameter | Type | Default | Description
format | string | parquet | File format (csv, parquet)

Request Body

Upload the file(s) as multipart/form-data.

Field | Type | Description
files | file[] | Files to infer schema from

Response

200 OK

{
  "fields": [
    {"name": "id", "type": "int64", "nullable": false},
    {"name": "name", "type": "string", "nullable": true},
    {"name": "price", "type": "double", "nullable": true},
    {"name": "created_at", "type": "timestamp", "nullable": true}
  ]
}
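
For example, the following minimal Python sketch (using the requests library, with the placeholder base URL and token from the usage examples below) infers the schema of a local CSV file:

import requests

BASE_URL = "https://api.dhub.io/api/v1"
headers = {"Authorization": "Bearer <access_token>"}

# Infer the schema of a local CSV file before uploading it to a table.
with open("data.csv", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/tables/schema",
        headers=headers,
        params={"format": "csv"},
        files={"files": f},
    )

# Print the inferred fields from the response shown above.
for field in response.json()["fields"]:
    print(field["name"], field["type"], field["nullable"])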

GET /api/v1/tables/{table_id}/versions

Retrieves the version history of a table (Delta Lake versions).

Path Parameters

Parameter | Type | Required | Description
table_id | string | Yes | Table (dataset) ID

Response

200 OK

[
  {
    "version": 3,
    "timestamp": "2024-01-20T10:30:00Z",
    "operation": "WRITE",
    "operationParameters": {"mode": "Append"},
    "readVersion": 2
  },
  {
    "version": 2,
    "timestamp": "2024-01-19T15:00:00Z",
    "operation": "WRITE",
    "operationParameters": {"mode": "Append"},
    "readVersion": 1
  }
]
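
As a sketch, the version history can be listed with a single GET request (Python, using the same placeholder base URL and table ID as the usage examples below):

import requests

BASE_URL = "https://api.dhub.io/api/v1"
headers = {"Authorization": "Bearer <access_token>"}

# List the Delta Lake versions of a table (newest first, per the example above).
response = requests.get(f"{BASE_URL}/tables/dataset-abc123/versions", headers=headers)
for entry in response.json():
    print(entry["version"], entry["timestamp"], entry["operation"])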

GET /api/v1/tables/{table_id}

Retrieves all data from a table.

Path Parameters

Parameter | Type | Required | Description
table_id | string | Yes | Table (dataset) ID

Query Parameters

Parameter | Type | Default | Description
format | string | csv | Response format (csv, json, parquet, arrow)

Response

Returns data in the specified format.

CSV (default)

id,name,price
1,Product A,10000
2,Product B,20000

JSON

[
  {"id": 1, "name": "Product A", "price": 10000},
  {"id": 2, "name": "Product B", "price": 20000}
]
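
For larger tables, a binary format is usually more efficient than CSV or JSON. The sketch below requests Parquet and loads it into pandas; it assumes the response body is the raw Parquet bytes, which is the natural reading of the format parameter but is not stated explicitly above.

import io

import pandas as pd
import requests

BASE_URL = "https://api.dhub.io/api/v1"
headers = {"Authorization": "Bearer <access_token>"}

# Fetch the full table as Parquet and load it into a DataFrame.
response = requests.get(
    f"{BASE_URL}/tables/dataset-abc123",
    headers=headers,
    params={"format": "parquet"},
)
df = pd.read_parquet(io.BytesIO(response.content))  # requires pyarrow or fastparquet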

POST /api/v1/tables/{table_id}/query

Executes an SQL query to retrieve data.

Path Parameters

Parameter | Type | Required | Description
table_id | string | Yes | Table (dataset) ID

Query Parameters

Parameter | Type | Default | Description
format | string | json | Response format (csv, json, parquet, arrow)

Request Body

{
  "query": "SELECT category, SUM(price) as total FROM sales GROUP BY category",
  "limit": 1000
}

Field | Type | Required | Description
query | string | Yes | SQL query (ClickHouse syntax)
limit | integer | No | Result row limit

Response

Returns query results in the specified format.
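
For example, the aggregation from the request body above can be run and returned as CSV by combining the format query parameter with the JSON body (a Python sketch; the sales table name is taken from the example above):

import io

import pandas as pd
import requests

BASE_URL = "https://api.dhub.io/api/v1"
headers = {"Authorization": "Bearer <access_token>"}

# Run the aggregation and request the result as CSV instead of the default JSON.
response = requests.post(
    f"{BASE_URL}/tables/dataset-abc123/query",
    headers=headers,
    params={"format": "csv"},
    json={"query": "SELECT category, SUM(price) AS total FROM sales GROUP BY category"},
)
df = pd.read_csv(io.StringIO(response.text))  # assumes the CSV includes a header row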


PUT /api/v1/tables/{table_id}/upload

Uploads data to a table.

Path Parameters

Parameter | Type | Required | Description
table_id | string | Yes | Table (dataset) ID

Query Parameters

Parameter | Type | Default | Description
format | string | parquet | File format (csv, json, parquet)
mode | string | append | Write mode (append, overwrite)

Request Body

Upload the file(s) as multipart/form-data.

Field | Type | Description
files | file[] | Data files to upload

Response

200 OK

{
  "message": "Data inserted successfully"
}

Write Modes

Mode | Description
append | Append to existing data
overwrite | Overwrite existing data
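
The sketch below replaces the entire table contents using overwrite mode; the local file name is hypothetical, and the endpoint, field name, and parameters follow the tables above.

import requests

BASE_URL = "https://api.dhub.io/api/v1"
headers = {"Authorization": "Bearer <access_token>"}

# Replace the table contents with a single Parquet file (overwrite mode).
with open("full_snapshot.parquet", "rb") as f:  # hypothetical local file
    response = requests.put(
        f"{BASE_URL}/tables/dataset-abc123/upload",
        headers=headers,
        params={"format": "parquet", "mode": "overwrite"},
        files={"files": f},
    )
print(response.json()["message"])  # "Data inserted successfully"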

POST /api/v1/tables/{table_id}/sink

Starts a data sink (streaming ingestion).

Path Parameters

Parameter | Type | Required | Description
table_id | string | Yes | Table (dataset) ID

Response

200 OK

{
  "message": "Sink created successfully"
}

DELETE /api/v1/tables/{table_id}/sink

Stops and deletes the data sink.

Path Parameters

Parameter | Type | Required | Description
table_id | string | Yes | Table (dataset) ID

Response

200 OK

{
  "message": "Sink deleted successfully"
}

GET /api/v1/tables/{table_id}/sink

Retrieves the status of the data sink.

Path Parameters

Parameter | Type | Required | Description
table_id | string | Yes | Table (dataset) ID

Response

200 OK

{
  "states": {
    "replica-0": "Running",
    "replica-1": "Running"
  }
}
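
The three sink endpoints share one path, so the whole lifecycle can be driven with a few calls. A minimal Python sketch (start, check status, stop), using the same placeholder table ID as elsewhere on this page:

import requests

BASE_URL = "https://api.dhub.io/api/v1"
headers = {"Authorization": "Bearer <access_token>"}
sink_url = f"{BASE_URL}/tables/dataset-abc123/sink"

# Start the streaming sink.
requests.post(sink_url, headers=headers)

# Check the per-replica status, e.g. {"replica-0": "Running", "replica-1": "Running"}.
states = requests.get(sink_url, headers=headers).json()["states"]
print(states)

# Stop and delete the sink.
requests.delete(sink_url, headers=headers)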

Usage Examples

cURL

# Schema inference
curl -X POST "https://api.dhub.io/api/v1/tables/schema?format=csv" \
-H "Authorization: Bearer <access_token>" \
-F "files=@data.csv"

# Table retrieval (JSON)
curl "https://api.dhub.io/api/v1/tables/dataset-abc123?format=json" \
-H "Authorization: Bearer <access_token>"

# SQL query execution
curl -X POST https://api.dhub.io/api/v1/tables/dataset-abc123/query \
-H "Authorization: Bearer <access_token>" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT * FROM sales WHERE amount > 1000", "limit": 100}'

# Data upload (CSV, append)
curl -X PUT "https://api.dhub.io/api/v1/tables/dataset-abc123/upload?format=csv&mode=append" \
-H "Authorization: Bearer <access_token>" \
-F "files=@new_data.csv"

Python

import requests
import pandas as pd

BASE_URL = "https://api.dhub.io/api/v1"
headers = {"Authorization": f"Bearer {access_token}"}

# Data retrieval
response = requests.get(
    f"{BASE_URL}/tables/dataset-abc123",
    headers=headers,
    params={"format": "json"}
)
df = pd.DataFrame(response.json())

# SQL query execution
response = requests.post(
    f"{BASE_URL}/tables/dataset-abc123/query",
    headers=headers,
    json={
        "query": "SELECT category, SUM(amount) as total FROM sales GROUP BY category",
        "limit": 1000
    }
)
result = pd.DataFrame(response.json())

# Data upload
with open("data.parquet", "rb") as f:
    response = requests.put(
        f"{BASE_URL}/tables/dataset-abc123/upload",
        headers=headers,
        params={"format": "parquet", "mode": "append"},
        files={"files": f}
    )

Supported File Formats

Format | Extension | Read | Upload | Description
CSV | .csv | Yes | Yes | Text-based, highly versatile
JSON | .json | Yes | Yes | Supports nested structures
Parquet | .parquet | Yes | Yes | Column-oriented, efficient compression
Arrow | .arrow | Yes | No | Memory-efficient transfer
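
Arrow responses can be read directly into pyarrow. The sketch below assumes the server returns an Arrow IPC stream; if it returns the IPC file format instead, pa.ipc.open_file would be needed.

import pyarrow as pa
import requests

BASE_URL = "https://api.dhub.io/api/v1"
headers = {"Authorization": "Bearer <access_token>"}

# Fetch the table in Arrow format and convert it to a pandas DataFrame.
response = requests.get(
    f"{BASE_URL}/tables/dataset-abc123",
    headers=headers,
    params={"format": "arrow"},
)
table = pa.ipc.open_stream(response.content).read_all()  # assumes IPC stream framing
df = table.to_pandas()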