
Pipelines

D.Hub's pipelines provide a node-based visual editor for designing and executing data processing workflows. You can place code and datasets as nodes, define data flow with connection lines, and visually compose complex data processing workflows.

Key Features

The pipeline module provides the following features:

  • Visual Workflow Editing: Design data flows by dragging and dropping nodes and connecting them
  • Various Node Types: Combine Dataset nodes (Delta Lake, Kafka, etc.) with Code nodes (Python, SQL)
  • Automated Execution Management: Deploy pipelines through the workflow engine, and handle scheduling and monitoring
  • Execution History Tracking: Manage execution records by batch and review traces per step

Pipeline Types

D.Hub supports two types of pipelines:

| Type | Description | Execution Method |
| --- | --- | --- |
| Batch | Pipelines for batch data processing | Manual execution via Run/Stop buttons, automatic execution via Cron schedule |
| Event | Pipelines for event-driven real-time data processing | Register/Unregister event listeners |

> **Info:** In Event pipelines, schedule settings and the Run History Bar are not displayed. Processing occurs automatically when events are received.
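
For reference, the Cron schedule of a Batch pipeline is expressed as a standard Cron string. The sketch below only illustrates common five-field expressions; the constant names are made up, and D.Hub's schedule settings may accept an extended format (for example, one with a seconds field).

```python
# Illustrative five-field Cron expressions for a Batch pipeline schedule.
# Constant names are hypothetical; only the strings would be entered in the
# schedule settings. Field order: minute hour day-of-month month day-of-week.
DAILY_AT_2AM = "0 2 * * *"         # every day at 02:00
EVERY_15_MINUTES = "*/15 * * * *"  # every 15 minutes
WEEKDAY_MORNINGS = "0 6 * * 1-5"   # 06:00, Monday through Friday
```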

Pipeline List Screen

Manage created pipelines from the pipeline list page (/pipelines).

View Modes

| View | Description |
| --- | --- |
| Card View | Displays pipelines as cards with a visual overview of recent execution status |
| Table View | Displays pipelines in list form for easy sorting and viewing of detailed information |

Status Filters

You can filter pipelines by status to view only the items you need:

| Filter | Description |
| --- | --- |
| All | Show all pipelines |
| Running | Pipelines currently in execution |
| Ready | Pipelines that have completed execution or are in standby |
| Failed | Pipelines whose last execution failed |

> **Info:** When an Event pipeline is waiting to receive events, it is displayed with a Listening status. Pipelines with a registered schedule are indicated with a separate Schedule badge.

Architecture Overview

Pipelines operate through the following flow, from editing to viewing execution results: Editor → Workflow Engine → Batch → Traces.

Components

| Component | Role |
| --- | --- |
| Editor | Design workflows in the node-based visual editor |
| Workflow Engine | Converts saved pipeline definitions into workflows and executes them |
| Batch | An individual execution unit of a pipeline, each assigned a unique ID |
| Traces | Records execution logs, duration, and input/output information for each step |

Node Types

Pipelines are composed of two core node types:

Dataset Node (Input/Output)

Serves as the source or sink for data.

| Type | Description |
| --- | --- |
| Delta Lake | Batch data processing that reads from or writes to Delta Lake tables |
| Kafka | Real-time streaming data integration through Kafka topics |
| DDS | Real-time distributed system integration through the DDS interface |
| REST API | External HTTP endpoint calls |

Code Node (Processing Logic)

Performs logic to transform and process input data.

| Type | Use Case |
| --- | --- |
| Python | General-purpose data processing, ML model application, external API calls |
| SQL | Data transformation, aggregation, joins, filtering |
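
As a concrete illustration of a Python code node, the sketch below cleans and enriches tabular input. The function-style contract, the pandas DataFrame hand-off, and the column names are all assumptions made for this example rather than documented D.Hub interfaces; adapt it to the actual code-node conventions.

```python
import pandas as pd


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical body of a Python code node.

    Assumes the upstream Dataset node delivers a pandas DataFrame and the
    returned DataFrame is passed downstream; column names are illustrative.
    """
    df = df.dropna(subset=["device_id", "timestamp"])  # drop incomplete rows
    df["timestamp"] = pd.to_datetime(df["timestamp"])  # normalize timestamps
    df["reading_f"] = df["reading_c"] * 9 / 5 + 32     # add a derived column
    return df
```

A SQL code node would typically cover the same ground with SELECT expressions, filters, and computed columns.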

Basic Pipeline Pattern

The most basic pipeline configuration follows the Input → Processing → Output pattern: a Dataset node supplies the input, a Code node processes it, and another Dataset node receives the output.

Common Use Cases

| Pattern | Input | Processing | Output | Description |
| --- | --- | --- | --- | --- |
| ETL | Delta Lake | Python/SQL | Delta Lake | Extract, transform, and load data |
| Real-time Transformation | Kafka | Python | Kafka | Real-time streaming data processing |
| Aggregation Reporting | Delta Lake | SQL | Delta Lake | Daily/monthly statistics aggregation |
| ML Inference | Delta Lake | Python | Delta Lake | Run predictions with trained models |
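
To make the Aggregation Reporting pattern concrete, here is a minimal Python sketch of the processing step, assuming the input Delta Lake node yields `event_time` and `amount` columns (both names are illustrative); a SQL code node grouping by day would achieve the same result.

```python
import pandas as pd


def daily_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical aggregation step: daily count, total, and average.

    Input is assumed to come from a Delta Lake Dataset node with
    `event_time` and `amount` columns (illustrative names only).
    """
    df["event_time"] = pd.to_datetime(df["event_time"])
    return (
        df.set_index("event_time")
          .resample("D")["amount"]          # bucket rows by calendar day
          .agg(["count", "sum", "mean"])    # daily row count, total, average
          .reset_index()
    )
```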

> **Tip:** When designing pipelines, breaking them into small units, testing each one, and then combining them makes debugging and maintenance much easier.

Key Screens

| Screen | Path | Description |
| --- | --- | --- |
| Pipeline List | /pipelines | View, create, and delete all pipelines |
| Workflow Editor | /pipelines/edit | Node-based visual editing and execution |

Next Steps