Skip to main content
Version: v0.1.0

SQL Query Guide

In D.Hub, you can use standard SQL for dataset exploration and in the SQL transformation nodes of pipelines.

SQL Transformation Node

Used to filter or aggregate data within a pipeline.

Structure

  • Table name: Connected input datasets are referenced by the default table name input, or by a user-specified alias.
  • Result: The query execution result (the result of the SELECT statement) is passed to the next node.

Example

-- Filter only items where 'status' is 'active' from the input data
SELECT
id,
name,
created_at
FROM
input
WHERE
status = 'active'

Dataset Exploration (SQL Scratch Pad)

You can run ad-hoc queries to inspect data from the 'Data' tab on the dataset detail page.

  • Constraints: Only SELECT queries are allowed.
  • Table: You must use the current dataset's table name.
SELECT * FROM my_dataset_table LIMIT 100

Useful SQL Functions

These are the key functions available in D.Hub's analytics engine. They can be used in dashboard query mode, SQL nodes, and dataset exploration.

Date/Time Functions

FunctionDescriptionExample
today()Today's dateWHERE date = today()
now()Current timeWHERE created_at > now() - INTERVAL 1 HOUR
toStartOfMonth(date)First day of the monthGROUP BY toStartOfMonth(date)
toStartOfWeek(date)First day of the weekGROUP BY toStartOfWeek(date)
toStartOfHour(datetime)Start of the hourGROUP BY toStartOfHour(ts)
toYYYYMM(date)Convert to YYYYMM integerSELECT toYYYYMM(date)
dateDiff('day', d1, d2)Date differencedateDiff('day', start, end)
formatDateTime(dt, fmt)Date formattingformatDateTime(dt, '%Y-%m-%d')

Aggregate Functions

FunctionDescriptionExample
count()Row countCOUNT(*)
sum(col)SumSUM(amount)
avg(col)AverageAVG(price)
min(col) / max(col)Min/MaxMIN(temperature)
uniq(col)Approximate distinct countuniq(user_id)
uniqExact(col)Exact distinct countuniqExact(session_id)
quantile(0.95)(col)Quantilequantile(0.95)(latency)
groupArray(col)Collect array per groupgroupArray(tag)
argMax(col, val)col when val is maximumargMax(name, score)

String Functions

FunctionDescriptionExample
lower(s) / upper(s)Lowercase/uppercase conversionlower(name)
trim(s)Trim whitespace from both endstrim(input_str)
substring(s, offset, len)Substringsubstring(code, 1, 3)
concat(s1, s2)String concatenationconcat(first, ' ', last)
like(s, pattern)Pattern matchingWHERE name LIKE '%Seoul%'
match(s, regexp)Regex matchingWHERE match(url, '^/api/')
splitByChar(sep, s)Split by charactersplitByChar(',', tags)
replaceAll(s, from, to)String replacementreplaceAll(text, '\n', ' ')

Array Functions

FunctionDescriptionExample
length(arr)Array lengthlength(tags)
arrayJoin(arr)Expand array → rowsSELECT arrayJoin(items)
has(arr, elem)Contains checkWHERE has(tags, 'urgent')
arrayMap(f, arr)Array maparrayMap(x -> x * 2, values)
arrayFilter(f, arr)Array filterarrayFilter(x -> x > 0, values)

JSON Functions

FunctionDescriptionExample
JSONExtractString(json, key)Extract stringJSONExtractString(data, 'name')
JSONExtractInt(json, key)Extract integerJSONExtractInt(data, 'count')
JSONExtractFloat(json, key)Extract floatJSONExtractFloat(data, 'score')
JSONExtractBool(json, key)Extract booleanJSONExtractBool(data, 'active')
JSONExtractArrayRaw(json, key)Extract arrayJSONExtractArrayRaw(data, 'items')

Performance Optimization Tips

D.Hub's analytics engine is a column-oriented OLAP database optimized for aggregate queries on large datasets.

  • SELECT only needed columns: Specifying required columns instead of SELECT * significantly improves query performance.
  • Use LIMIT: Always add LIMIT for exploratory queries.
  • Use WHERE clauses: Place filter conditions as early as possible to reduce unnecessary data scans.
  • Choose appropriate aggregate functions: Use uniq instead of uniqExact when exact distinct values are not needed.