SQL Query Guide
In D.Hub, you can use standard SQL for dataset exploration and in the SQL transformation nodes of pipelines.
SQL Transformation Node
Used to filter or aggregate data within a pipeline.
Structure
- Table name: Connected input datasets are referenced by the default table name
input, or by a user-specified alias. - Result: The query execution result (the result of the
SELECTstatement) is passed to the next node.
Example
-- Filter only items where 'status' is 'active' from the input data
SELECT
id,
name,
created_at
FROM
input
WHERE
status = 'active'
Dataset Exploration (SQL Scratch Pad)
You can run ad-hoc queries to inspect data from the 'Data' tab on the dataset detail page.
- Constraints: Only
SELECTqueries are allowed. - Table: You must use the current dataset's table name.
SELECT * FROM my_dataset_table LIMIT 100
Useful SQL Functions
These are the key functions available in D.Hub's analytics engine. They can be used in dashboard query mode, SQL nodes, and dataset exploration.
Date/Time Functions
| Function | Description | Example |
|---|---|---|
today() | Today's date | WHERE date = today() |
now() | Current time | WHERE created_at > now() - INTERVAL 1 HOUR |
toStartOfMonth(date) | First day of the month | GROUP BY toStartOfMonth(date) |
toStartOfWeek(date) | First day of the week | GROUP BY toStartOfWeek(date) |
toStartOfHour(datetime) | Start of the hour | GROUP BY toStartOfHour(ts) |
toYYYYMM(date) | Convert to YYYYMM integer | SELECT toYYYYMM(date) |
dateDiff('day', d1, d2) | Date difference | dateDiff('day', start, end) |
formatDateTime(dt, fmt) | Date formatting | formatDateTime(dt, '%Y-%m-%d') |
Aggregate Functions
| Function | Description | Example |
|---|---|---|
count() | Row count | COUNT(*) |
sum(col) | Sum | SUM(amount) |
avg(col) | Average | AVG(price) |
min(col) / max(col) | Min/Max | MIN(temperature) |
uniq(col) | Approximate distinct count | uniq(user_id) |
uniqExact(col) | Exact distinct count | uniqExact(session_id) |
quantile(0.95)(col) | Quantile | quantile(0.95)(latency) |
groupArray(col) | Collect array per group | groupArray(tag) |
argMax(col, val) | col when val is maximum | argMax(name, score) |
String Functions
| Function | Description | Example |
|---|---|---|
lower(s) / upper(s) | Lowercase/uppercase conversion | lower(name) |
trim(s) | Trim whitespace from both ends | trim(input_str) |
substring(s, offset, len) | Substring | substring(code, 1, 3) |
concat(s1, s2) | String concatenation | concat(first, ' ', last) |
like(s, pattern) | Pattern matching | WHERE name LIKE '%Seoul%' |
match(s, regexp) | Regex matching | WHERE match(url, '^/api/') |
splitByChar(sep, s) | Split by character | splitByChar(',', tags) |
replaceAll(s, from, to) | String replacement | replaceAll(text, '\n', ' ') |
Array Functions
| Function | Description | Example |
|---|---|---|
length(arr) | Array length | length(tags) |
arrayJoin(arr) | Expand array → rows | SELECT arrayJoin(items) |
has(arr, elem) | Contains check | WHERE has(tags, 'urgent') |
arrayMap(f, arr) | Array map | arrayMap(x -> x * 2, values) |
arrayFilter(f, arr) | Array filter | arrayFilter(x -> x > 0, values) |
JSON Functions
| Function | Description | Example |
|---|---|---|
JSONExtractString(json, key) | Extract string | JSONExtractString(data, 'name') |
JSONExtractInt(json, key) | Extract integer | JSONExtractInt(data, 'count') |
JSONExtractFloat(json, key) | Extract float | JSONExtractFloat(data, 'score') |
JSONExtractBool(json, key) | Extract boolean | JSONExtractBool(data, 'active') |
JSONExtractArrayRaw(json, key) | Extract array | JSONExtractArrayRaw(data, 'items') |
Performance Optimization Tips
D.Hub's analytics engine is a column-oriented OLAP database optimized for aggregate queries on large datasets.
- SELECT only needed columns: Specifying required columns instead of
SELECT *significantly improves query performance. - Use LIMIT: Always add
LIMITfor exploratory queries. - Use WHERE clauses: Place filter conditions as early as possible to reduce unnecessary data scans.
- Choose appropriate aggregate functions: Use
uniqinstead ofuniqExactwhen exact distinct values are not needed.