Version: Next

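API Tutorial

This tutorial walks through the full D.Hub API workflow step by step, from data ingestion to pipeline execution. The steps are built around cURL, and each step also includes the equivalent Python code.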

Prerequisites

D.Hub instance URL ({host})
User account (email, password)
cURL or a Python 3.x environment (jq is recommended for JSON parsing on the cURL track)

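About placeholders

The cURL examples use placeholders such as {host}, you@example.com, and col-abc123. Collection, dataset, and pipeline IDs are determined by each step's response, so carry them forward in shell variables as in the pattern below. Pasting the placeholders verbatim will not work.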

RESP=$(curl -s -X POST https://{host}/api/v1/collections ...)
COLLECTION_ID=$(echo "$RESP" | jq -r .id)
echo "COLLECTION_ID=$COLLECTION_ID"

Once you substitute {host} and your account credentials for your environment, the remaining IDs are extracted automatically with jq.
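The same hand-off can be sketched in Python. One detail worth guarding against: `jq -r .id` prints the literal string `null` when the key is absent, so a failed request can quietly leave a variable set to `null`. The helper below raises instead. The name `extract_id` is illustrative only and is not part of any D.Hub client.

```python
import json


def extract_id(response_text: str, key: str = "id") -> str:
    """Return payload[key] from a JSON response body, or raise if it is missing."""
    payload = json.loads(response_text)
    value = payload.get(key) if isinstance(payload, dict) else None
    if value is None:
        # Unlike `jq -r`, fail loudly instead of returning the string "null".
        raise KeyError(f"response has no {key!r} field: {response_text[:80]}")
    return str(value)


if __name__ == "__main__":
    # Sample body shaped like the collection-creation response in Step 2.
    sample = '{"id": "col-abc123", "name": "demo"}'
    print(extract_id(sample))
```

Passing `key="batch_id"` covers the batch response in Step 5, which returns its identifier under a different field name.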

Step 1: Log in and obtain a token

First, obtain the JWT needed for API authentication. You log in with an email address and password.

cURL

RESP=$(curl -s -X POST https://{host}/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "you@example.com", "password": "your-password"}')
TOKEN=$(echo "$RESP" | jq -r .access_token)
echo "TOKEN=${TOKEN:0:24}..."

Response

{
  "access_token": "eyJhbGciOiJIUzI1NiIs...",
  "refresh_token": "eyJhbGciOiJIUzI1NiIs...",
  "token_type": "bearer"
}

Python

import requests

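# Replace {host} with your D.Hub instance host.
HOST = "https://{host}"

login_response = requests.post(
    f"{HOST}/api/v1/auth/login",
    json={"email": "you@example.com", "password": "your-password"},
)
login_response.raise_for_status()  # stop early if the login failed
tokens = login_response.json()
# Reuse this header on every authenticated request in the later steps.
headers = {"Authorization": f"Bearer {tokens['access_token']}"}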

Step 2: Create a Collection

Create a collection to manage your datasets and pipelines.

cURL

RESP=$(curl -s -X POST https://{host}/api/v1/collections \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"name": "서울시 교통 분석", "description": "서울시 교통 데이터 수집 및 분석 프로젝트"}')
COLLECTION_ID=$(echo "$RESP" | jq -r .id)
echo "COLLECTION_ID=$COLLECTION_ID"

Response

{
  "id": "col-abc123",
  "name": "서울시 교통 분석",
  "description": "서울시 교통 데이터 수집 및 분석 프로젝트",
  "created_at": "2026-03-12T09:00:00Z"
}

Python

collection = requests.post(
    f"{HOST}/api/v1/collections",
    headers=headers,
    json={
        "name": "서울시 교통 분석",
        "description": "서울시 교통 데이터 수집 및 분석 프로젝트",
    },
).json()
collection_id = collection["id"]

Step 3: Create a Dataset and upload a CSV

Create a dataset inside the collection and upload a CSV file to it.

3-1. Create the dataset

RESP=$(curl -s -X POST https://{host}/api/v1/datasets \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{\"name\": \"교통량 데이터\", \"description\": \"서울시 주요 도로 교통량\", \"collection_id\": \"${COLLECTION_ID}\"}")
DATASET_ID=$(echo "$RESP" | jq -r .id)
echo "DATASET_ID=$DATASET_ID"

3-2. Upload the CSV file

Send the upload to the dataset's sub-endpoint as multipart/form-data. The file field name is files.

curl -X POST https://{host}/api/v1/datasets/${DATASET_ID}/upload \
  -H "Authorization: Bearer ${TOKEN}" \
  -F "files=@traffic_data.csv"

Python

dataset = requests.post(
    f"{HOST}/api/v1/datasets",
    headers=headers,
    json={
        "name": "교통량 데이터",
        "description": "서울시 주요 도로 교통량",
        "collection_id": collection_id,
    },
).json()
dataset_id = dataset["id"]

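# Open in binary mode; the multipart field name must be "files".
with open("traffic_data.csv", "rb") as f:
    upload_response = requests.post(
        f"{HOST}/api/v1/datasets/{dataset_id}/upload",
        headers=headers,
        files={"files": ("traffic_data.csv", f, "text/csv")},
    )
upload_response.raise_for_status()  # fail loudly if the upload was rejected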
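The upload step assumes a traffic_data.csv file already exists in the working directory. If you need something to test with, the sketch below writes a small file. The column names and values are invented for illustration and do not reflect any schema D.Hub requires.

```python
import csv


def write_sample_csv(path: str) -> int:
    """Write a tiny traffic CSV and return the number of data rows written."""
    # Hypothetical columns and made-up values, for illustration only.
    rows = [
        {"date": "2026-03-01", "road": "road-A", "vehicle_count": 100},
        {"date": "2026-03-01", "road": "road-B", "vehicle_count": 200},
        {"date": "2026-03-02", "road": "road-A", "vehicle_count": 150},
    ]
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["date", "road", "vehicle_count"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    count = write_sample_csv("traffic_data.csv")
    print(f"wrote {count} rows to traffic_data.csv")
```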

Step 4: Query the data

Query the table of the uploaded dataset. The endpoint uses cursor-based pagination; the limit parameter sets the page size.

cURL

curl -X GET "https://{host}/api/v1/datasets/${DATASET_ID}/table?limit=10" \
  -H "Authorization: Bearer ${TOKEN}"

Python

table_data = requests.get(
    f"{HOST}/api/v1/datasets/{dataset_id}/table",
    headers=headers,
    params={"limit": 10},
).json()

Querying with SQL

If you need column selection, filtering, or aggregation, you can send SQL to POST /api/v1/datasets/{id}/table/query in the form {"query": "SELECT ...", "limit": N}. See the SQL guide for the SQL syntax.
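As a minimal sketch of that request body, the helper below builds the {"query": ..., "limit": N} payload and applies two client-side sanity checks. The checks are a local convenience, not documented server rules. The table name `traffic` and its columns are hypothetical; check the SQL guide for how a dataset's table is actually referenced.

```python
import json


def build_table_query(query: str, limit: int) -> dict:
    """Build the JSON body for POST /api/v1/datasets/{id}/table/query."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return {"query": query, "limit": limit}


if __name__ == "__main__":
    # "traffic", "road" and "vehicle_count" are placeholder names.
    body = build_table_query(
        "SELECT road, SUM(vehicle_count) AS total FROM traffic GROUP BY road",
        limit=10,
    )
    print(json.dumps(body, indent=2))
    # To send it, reuse HOST, headers and dataset_id from the earlier steps:
    # requests.post(f"{HOST}/api/v1/datasets/{dataset_id}/table/query",
    #               headers=headers, json=body)
```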

Step 5: Create and run a Pipeline

Create a pipeline that processes the data, then run it.

5-1. Create the pipeline

RESP=$(curl -s -X POST https://{host}/api/v1/pipelines \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{\"name\": \"교통량 일별 집계\", \"collection_id\": \"${COLLECTION_ID}\"}")
PIPELINE_ID=$(echo "$RESP" | jq -r .id)
echo "PIPELINE_ID=$PIPELINE_ID"

5-2. Run the pipeline (batch)

Pipeline runs are started through the batch endpoint.

RESP=$(curl -s -X POST https://{host}/api/v1/pipelines/${PIPELINE_ID}/batch \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{}')
BATCH_ID=$(echo "$RESP" | jq -r .batch_id)
echo "BATCH_ID=$BATCH_ID"

Python

pipeline = requests.post(
    f"{HOST}/api/v1/pipelines",
    headers=headers,
    json={"name": "교통량 일별 집계", "collection_id": collection_id},
).json()
pipeline_id = pipeline["id"]

run = requests.post(
    f"{HOST}/api/v1/pipelines/{pipeline_id}/batch",
    headers=headers,
    json={},
).json()
batch_id = run["batch_id"]

Configuring pipeline steps

We recommend configuring the pipeline's detailed steps (nodes) visually in the pipeline editor of the web UI. Use the API to automate creation and execution.

Step 6: Check the run results

6-1. Check batch status

curl -X GET https://{host}/api/v1/pipelines/${PIPELINE_ID}/batch \
  -H "Authorization: Bearer ${TOKEN}"

6-2. Fetch the run trace

Check the step-by-step processing results (the trace) for a specific batch.

curl -X GET https://{host}/api/v1/trace/pipelines/${PIPELINE_ID}/batches/${BATCH_ID} \
  -H "Authorization: Bearer ${TOKEN}"

Python

import time

# Poll the batch status every 3 seconds until it reaches a terminal state.
while True:
    batch = requests.get(
        f"{HOST}/api/v1/pipelines/{pipeline_id}/batch",
        headers=headers,
    ).json()

    print(f"Status: {batch.get('status')}")
    if batch.get("status") in ("completed", "failed"):
        break
    time.sleep(3)

trace = requests.get(
    f"{HOST}/api/v1/trace/pipelines/{pipeline_id}/batches/{batch_id}",
    headers=headers,
).json()
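The polling loop above runs until the batch reaches completed or failed, so it never exits if the status stays in some other state. One possible refinement is to add a deadline. In this sketch the status lookup is passed in as a function so the logic can be exercised without a server. The name `wait_for_batch`, the default timeout, and the intermediate status `running` are assumptions made for the demo.

```python
import time
from typing import Callable

# The two end states checked by the polling loop in this step.
TERMINAL_STATUSES = ("completed", "failed")


def wait_for_batch(
    fetch_status: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until a terminal status, or raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    # Simulated status sequence; "running" is a made-up intermediate value.
    fake = iter(["running", "running", "completed"])
    print(wait_for_batch(lambda: next(fake), sleep=lambda _: None))
```

Against a live instance, `fetch_status` could wrap the GET request from the loop above and return `batch.get("status")`.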

End-to-end summary

Step	API endpoint	HTTP method
Login	`/api/v1/auth/login`	POST
Create collection	`/api/v1/collections`	POST
Create dataset	`/api/v1/datasets`	POST
Upload CSV	`/api/v1/datasets/{id}/upload`	POST
Query data	`/api/v1/datasets/{id}/table`	GET
Create pipeline	`/api/v1/pipelines`	POST
Run pipeline	`/api/v1/pipelines/{id}/batch`	POST
Check batch status	`/api/v1/pipelines/{id}/batch`	GET
Fetch trace	`/api/v1/trace/pipelines/{id}/batches/{batch_id}`	GET
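If you script this flow, it can help to keep the table in one place as data. The sketch below copies the method and path of each row and fills in the {id} and {batch_id} placeholders. The step keys, the `endpoint` helper, and the host `dhub.example.com` are hypothetical names chosen for the example.

```python
# Method and path for each step, copied from the summary table.
ENDPOINTS = {
    "login": ("POST", "/api/v1/auth/login"),
    "create_collection": ("POST", "/api/v1/collections"),
    "create_dataset": ("POST", "/api/v1/datasets"),
    "upload_csv": ("POST", "/api/v1/datasets/{id}/upload"),
    "query_table": ("GET", "/api/v1/datasets/{id}/table"),
    "create_pipeline": ("POST", "/api/v1/pipelines"),
    "run_pipeline": ("POST", "/api/v1/pipelines/{id}/batch"),
    "batch_status": ("GET", "/api/v1/pipelines/{id}/batch"),
    "trace": ("GET", "/api/v1/trace/pipelines/{id}/batches/{batch_id}"),
}


def endpoint(step: str, host: str, **ids: str) -> tuple[str, str]:
    """Return (method, url) for a step, filling path placeholders from ids."""
    method, template = ENDPOINTS[step]
    # str.format raises KeyError if a required placeholder is not supplied.
    return method, f"https://{host}" + template.format(**ids)


if __name__ == "__main__":
    # "dhub.example.com", "pl-1" and "b-1" are placeholder values.
    print(endpoint("login", "dhub.example.com"))
    print(endpoint("trace", "dhub.example.com", id="pl-1", batch_id="b-1"))
```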

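Next steps

Writing pipeline code nodes → Python guide
SQL-based data transformation → SQL guide
Advanced authentication (service tokens, token refresh) → API authentication
Handling errors → Error handling
The same flow with the UI + ontology → covid19 ontology tutorial
Per-tool client reference → API client tools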
