Version: v0.1.0

API Tutorial

This tutorial walks through the full D.Hub API workflow step by step, from data ingestion to pipeline execution. It is built around cURL commands, with equivalent Python code provided for each step.

Prerequisites

URL of your D.Hub instance ({host})
A user account (username, password)
cURL or a Python 3.x environment (the Python examples use the requests library)
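The prerequisites above can be gathered once at the top of a script. The following is a minimal sketch, assuming the values are supplied through environment variables. The names `DHUB_HOST`, `DHUB_USERNAME`, and `DHUB_PASSWORD` and the `load_config` helper are hypothetical and are not defined by D.Hub.

```python
import os


def load_config(env=None):
    """Read connection settings from environment variables.

    DHUB_HOST, DHUB_USERNAME, and DHUB_PASSWORD are hypothetical names
    chosen for this sketch; D.Hub itself does not define them.
    """
    env = os.environ if env is None else env
    required = ("DHUB_HOST", "DHUB_USERNAME", "DHUB_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")

    host = env["DHUB_HOST"].rstrip("/")
    # Accept either a bare host name or a full URL.
    if not host.startswith(("http://", "https://")):
        host = f"https://{host}"
    return {
        "host": host,
        "username": env["DHUB_USERNAME"],
        "password": env["DHUB_PASSWORD"],
    }
```

Keeping credentials out of the source file also makes it easier to reuse the same script against different instances.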

Step 1: Log in and obtain a token

First, obtain the JWT token required for API authentication.

cURL

# Log in
curl -X POST https://{host}/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "username": "your-username",
    "password": "your-password"
  }'

Response

{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "refresh_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer"
}

Store the access_token value in an environment variable so that every subsequent request can use it.

export TOKEN="eyJhbGciOiJIUzI1NiIs..."

Python

import requests

HOST = "https://{host}"

login_response = requests.post(
    f"{HOST}/api/v1/auth/login",
    json={"username": "your-username", "password": "your-password"},
)
tokens = login_response.json()
headers = {"Authorization": f"Bearer {tokens['access_token']}"}
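Because the login response returns JWTs, a script can read the standard `exp` claim to see when the access token lapses. The following is a minimal sketch, assuming D.Hub's tokens carry an `exp` claim, which this tutorial does not state. `jwt_expiry` and `auth_headers` are hypothetical helper names.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_expiry(token):
    """Return the expiry time from a JWT's `exp` claim, or None if absent.

    The payload is decoded without verifying the signature, so use this
    only to decide when to refresh, never to trust the token's contents.
    """
    payload_b64 = token.split(".")[1]
    # base64url payloads omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    exp = claims.get("exp")
    if exp is None:
        return None
    return datetime.fromtimestamp(exp, tz=timezone.utc)


def auth_headers(tokens):
    """Build the Authorization header from a login response body."""
    return {"Authorization": f"Bearer {tokens['access_token']}"}
```

Calling `jwt_expiry(tokens["access_token"])` right after login gives a timestamp to compare against the current time before long-running jobs.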

Step 2: Create a Collection

Create a collection to manage your datasets and pipelines.

cURL

curl -X POST https://{host}/api/v1/collections \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "서울시 교통 분석",
    "description": "서울시 교통 데이터 수집 및 분석 프로젝트"
  }'

Response

{
  "id": "col-abc123",
  "name": "서울시 교통 분석",
  "description": "서울시 교통 데이터 수집 및 분석 프로젝트",
  "created_at": "2026-03-12T09:00:00Z"
}

# Save the returned id for later steps
export COLLECTION_ID="col-abc123"

Python

collection = requests.post(
    f"{HOST}/api/v1/collections",
    headers=headers,
    json={
        "name": "서울시 교통 분석",
        "description": "서울시 교통 데이터 수집 및 분석 프로젝트",
    },
).json()
collection_id = collection["id"]
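The Python snippets in this tutorial read `id` straight from the response, which fails with an unhelpful `KeyError` if the request was rejected. The following is a minimal sketch of a create helper that checks the HTTP status first. `create_resource` is a hypothetical name, and `session` is assumed to be the `requests` module or a `requests.Session`.

```python
def create_resource(session, host, path, payload, headers):
    """POST a JSON payload and return the new resource's `id`.

    `session` is anything with a requests-style `post` method, such as
    the `requests` module itself or a `requests.Session` instance.
    """
    response = session.post(f"{host}{path}", headers=headers, json=payload)
    # Raise on 4xx/5xx instead of failing later with a confusing KeyError.
    response.raise_for_status()
    body = response.json()
    if "id" not in body:
        raise ValueError(f"No 'id' in response from {path}: {body}")
    return body["id"]


# Usage, with the names defined earlier in this tutorial:
# collection_id = create_resource(
#     requests, HOST, "/api/v1/collections",
#     {"name": "서울시 교통 분석"}, headers,
# )
```

The same helper would fit the dataset and pipeline creation calls in later steps, since the Python examples read an `id` field from both.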

Step 3: Create a Dataset and upload a CSV

Create a dataset inside the collection and upload a CSV file to it.

3-1. Create the dataset

curl -X POST https://{host}/api/v1/datasets \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "교통량 데이터",
    "description": "서울시 주요 도로 교통량 측정 데이터",
    "collection_id": "'${COLLECTION_ID}'"
  }'

# Save the id field of the response for later steps
export DATASET_ID="ds-xyz789"

3-2. Upload the CSV file

curl -X POST https://{host}/api/v1/tables/${DATASET_ID}/upload \
  -H "Authorization: Bearer ${TOKEN}" \
  -F "file=@traffic_data.csv"

Response

{
  "message": "Upload successful",
  "rows": 1500,
  "columns": 8
}

Python

dataset = requests.post(
    f"{HOST}/api/v1/datasets",
    headers=headers,
    json={
        "name": "교통량 데이터",
        "description": "서울시 주요 도로 교통량 측정 데이터",
        "collection_id": collection_id,
    },
).json()
dataset_id = dataset["id"]

with open("traffic_data.csv", "rb") as f:
    upload = requests.post(
        f"{HOST}/api/v1/tables/{dataset_id}/upload",
        headers=headers,
        files={"file": ("traffic_data.csv", f, "text/csv")},
    )
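The upload response reports a `rows` count, which can be compared against the local file as a quick sanity check. The following is a minimal sketch, assuming `rows` counts data rows without the header line and that the file is UTF-8 encoded. Neither is confirmed by this tutorial, and both helper names are hypothetical.

```python
import csv


def count_csv_rows(path, has_header=True):
    """Count data rows in a CSV file, optionally skipping one header row."""
    with open(path, newline="", encoding="utf-8") as f:
        total = sum(1 for _ in csv.reader(f))
    return max(total - 1, 0) if has_header else total


def check_upload(upload_body, path):
    """Compare the reported `rows` with the local file's row count.

    Assumes the server's `rows` value excludes the header line.
    """
    expected = count_csv_rows(path)
    reported = upload_body["rows"]
    if reported != expected:
        raise ValueError(f"Uploaded {reported} rows, but {path} has {expected}")
    return reported


# Usage, with the `upload` response from the snippet above:
# check_upload(upload.json(), "traffic_data.csv")
```

Using `csv.reader` rather than counting newlines keeps the count correct when quoted fields contain line breaks.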

Step 4: Query the data

Query the uploaded data through the tables API.

cURL

curl -X GET "https://{host}/api/v1/tables/${DATASET_ID}?page=1&page_size=10" \
  -H "Authorization: Bearer ${TOKEN}"

Response

{
  "columns": ["date", "road_name", "direction", "traffic_count", "avg_speed"],
  "data": [
    ["2026-03-01", "강남대로", "northbound", 12500, 35.2],
    ["2026-03-01", "테헤란로", "eastbound", 9800, 28.7]
  ],
  "total_rows": 1500
}

Python

table_data = requests.get(
    f"{HOST}/api/v1/tables/{dataset_id}",
    headers=headers,
    params={"page": 1, "page_size": 10},
).json()
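The response separates `columns` from `data`, and a single request returns only one page. The following is a minimal sketch that pairs each row with its column names and walks through the pages. It assumes pages are numbered from 1, as in the request above, and that `total_rows` is the size of the whole table. `rows_to_dicts` and `iter_all_rows` are hypothetical helpers, and the default `page_size` of 100 is an arbitrary choice, not a documented limit.

```python
def rows_to_dicts(table_data):
    """Pair each row in `data` with the names in `columns`."""
    columns = table_data["columns"]
    return [dict(zip(columns, row)) for row in table_data["data"]]


def iter_all_rows(fetch_page, page_size=100):
    """Yield every row as a dict, requesting pages until `total_rows` is reached.

    `fetch_page(page, page_size)` must return one parsed response body.
    """
    page, seen = 1, 0
    while True:
        body = fetch_page(page, page_size)
        rows = rows_to_dicts(body)
        yield from rows
        seen += len(rows)
        # Stop on an empty page too, so a wrong total cannot loop forever.
        if not rows or seen >= body["total_rows"]:
            break
        page += 1


# Usage, with the names defined earlier in this tutorial:
# def fetch_page(page, page_size):
#     return requests.get(
#         f"{HOST}/api/v1/tables/{dataset_id}",
#         headers=headers,
#         params={"page": page, "page_size": page_size},
#     ).json()
#
# for row in iter_all_rows(fetch_page):
#     print(row["road_name"], row["traffic_count"])
```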

Step 5: Create and run a Pipeline

Create a pipeline that processes the data, then run it.

5-1. Create the pipeline

curl -X POST https://{host}/api/v1/pipelines \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "교통량 일별 집계",
    "description": "도로별 일일 교통량을 집계하는 파이프라인",
    "collection_id": "'${COLLECTION_ID}'"
  }'

# Save the id field of the response for later steps
export PIPELINE_ID="pl-def456"

5-2. Run the pipeline

curl -X POST https://{host}/api/v1/pipelines/${PIPELINE_ID}/run \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{}'

Response

{
  "batch_id": "batch-ghi012",
  "status": "running",
  "started_at": "2026-03-12T09:15:00Z"
}

# Save the returned batch_id to check the run in Step 6
export BATCH_ID="batch-ghi012"

Python

pipeline = requests.post(
    f"{HOST}/api/v1/pipelines",
    headers=headers,
    json={
        "name": "교통량 일별 집계",
        "description": "도로별 일일 교통량을 집계하는 파이프라인",
        "collection_id": collection_id,
    },
).json()
pipeline_id = pipeline["id"]

run = requests.post(
    f"{HOST}/api/v1/pipelines/{pipeline_id}/run",
    headers=headers,
    json={},
).json()
batch_id = run["batch_id"]

Configuring pipeline steps

We recommend configuring a pipeline's individual steps (nodes) visually in the pipeline editor of the web UI. Use the API to automate pipeline creation and execution.
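As one illustration of that kind of automation, the run endpoint can be called for several pipelines whose steps were already built in the web UI. The following is a minimal sketch. `run_pipelines` is a hypothetical helper, and `session` is assumed to be the `requests` module or a `requests.Session`.

```python
def run_pipelines(session, host, headers, pipeline_ids):
    """Start each pipeline and return a {pipeline_id: batch_id} mapping."""
    batches = {}
    for pipeline_id in pipeline_ids:
        response = session.post(
            f"{host}/api/v1/pipelines/{pipeline_id}/run",
            headers=headers,
            json={},
        )
        # Stop at the first rejected request rather than continuing silently.
        response.raise_for_status()
        batches[pipeline_id] = response.json()["batch_id"]
    return batches


# Usage, with the names defined earlier in this tutorial:
# batches = run_pipelines(requests, HOST, headers, [pipeline_id])
```

The returned mapping keeps each `batch_id` next to its pipeline, which is what the status and trace calls in Step 6 need.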

Step 6: Check the run results

6-1. Check the batch status

curl -X GET https://{host}/api/v1/batches/${BATCH_ID} \
  -H "Authorization: Bearer ${TOKEN}"

Response

{
  "id": "batch-ghi012",
  "pipeline_id": "pl-def456",
  "status": "completed",
  "started_at": "2026-03-12T09:15:00Z",
  "finished_at": "2026-03-12T09:15:32Z"
}

6-2. Retrieve the execution trace

The traces endpoint returns the detailed logs of a pipeline run and the processing result of each node.

curl -X GET "https://{host}/api/v1/traces?pipeline_id=${PIPELINE_ID}&batch_id=${BATCH_ID}" \
  -H "Authorization: Bearer ${TOKEN}"

Python

import time

while True:
    batch = requests.get(
        f"{HOST}/api/v1/batches/{batch_id}",
        headers=headers,
    ).json()

    print(f"Status: {batch['status']}")
    if batch["status"] in ("completed", "failed"):
        break
    time.sleep(3)

traces = requests.get(
    f"{HOST}/api/v1/traces",
    headers=headers,
    params={"pipeline_id": pipeline_id, "batch_id": batch_id},
).json()
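The polling loop above runs until the batch finishes, so a stuck batch would block the script indefinitely. The following is a minimal sketch of the same loop with a deadline. `wait_for_batch` is a hypothetical helper, and the 300-second default is an arbitrary choice. Only the statuses "running", "completed", and "failed" appear in this tutorial, so any other value is treated as still in progress.

```python
import time


def wait_for_batch(get_status, timeout=300.0, interval=3.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the batch reaches a terminal status or `timeout` seconds pass.

    `get_status()` returns the current status string. `sleep` and `clock`
    are injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Batch still '{status}' after {timeout} seconds")
        sleep(interval)


# Usage, with the names defined earlier in this tutorial:
# def get_status():
#     return requests.get(
#         f"{HOST}/api/v1/batches/{batch_id}", headers=headers
#     ).json()["status"]
#
# final_status = wait_for_batch(get_status)
```

`time.monotonic` is used for the deadline because it is unaffected by system clock adjustments.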

End-to-end summary

Step	API endpoint	HTTP method
Log in	`/api/v1/auth/login`	POST
Create Collection	`/api/v1/collections`	POST
Create Dataset	`/api/v1/datasets`	POST
Upload CSV	`/api/v1/tables/{id}/upload`	POST
Query data	`/api/v1/tables/{id}`	GET
Create Pipeline	`/api/v1/pipelines`	POST
Run Pipeline	`/api/v1/pipelines/{id}/run`	POST
Check batch status	`/api/v1/batches/{id}`	GET
Retrieve traces	`/api/v1/traces`	GET
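The summary table maps directly onto a small lookup structure, which keeps paths out of the calling code. The following is a minimal sketch. The key names and the `endpoint` helper are hypothetical, while the methods and paths are copied from the table.

```python
# Methods and paths from the summary table; the key names are illustrative.
ENDPOINTS = {
    "login": ("POST", "/api/v1/auth/login"),
    "create_collection": ("POST", "/api/v1/collections"),
    "create_dataset": ("POST", "/api/v1/datasets"),
    "upload_csv": ("POST", "/api/v1/tables/{id}/upload"),
    "read_table": ("GET", "/api/v1/tables/{id}"),
    "create_pipeline": ("POST", "/api/v1/pipelines"),
    "run_pipeline": ("POST", "/api/v1/pipelines/{id}/run"),
    "batch_status": ("GET", "/api/v1/batches/{id}"),
    "list_traces": ("GET", "/api/v1/traces"),
}


def endpoint(host, name, resource_id=None):
    """Return (method, url) for a named step; `resource_id` fills `{id}`."""
    method, path = ENDPOINTS[name]
    if "{id}" in path:
        if resource_id is None:
            raise ValueError(f"'{name}' needs a resource_id")
        path = path.replace("{id}", resource_id)
    return method, f"{host}{path}"
```

For example, `endpoint(HOST, "run_pipeline", pipeline_id)` yields the method and URL used in Step 5-2.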

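Next steps

Writing pipeline code nodes → Python guide
SQL-based data transformation → SQL guide
Advanced authentication (service tokens, token refresh) → API authentication
Responding to errors → Error handling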
