Pipelines
D.Hub's pipelines provide a node-based visual editor for designing and executing data processing workflows. Place code and datasets as nodes, connect them with lines to define data flow, and compose complex workflows visually.
Key Features
The pipeline module provides the following features:
- Visual Workflow Editing: Design data flows by dragging and dropping nodes and connecting them
- Various Node Types: Combine Dataset nodes (Delta Lake, Kafka, etc.) with Code nodes (Python, SQL)
- Automated Execution Management: Deploy pipelines through the workflow engine, which handles scheduling and monitoring
- Execution History Tracking: Manage execution records by batch and review traces per step
Pipeline Types
D.Hub supports two types of pipelines:
| Type | Description | Execution Method |
|---|---|---|
| Batch | Pipelines for batch data processing | Manual execution via Run/Stop buttons, automatic execution via Cron schedule |
| Event | Pipelines for event-driven real-time data processing | Register/Unregister event listeners |
In Event pipelines, schedule settings and the Run History Bar are not displayed. Processing occurs automatically when events are received.
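The Cron schedule used by Batch pipelines follows standard five-field cron syntax. As a minimal illustration of what an expression such as `0 2 * * *` means, the sketch below expands it into concrete run times using the third-party croniter package; this only demonstrates cron semantics and is not a D.Hub API:

```python
# Illustration of five-field cron semantics using the third-party
# croniter package. This is not a D.Hub API; it only shows what a
# schedule expression like "0 2 * * *" (daily at 02:00) expands to.
from datetime import datetime
from croniter import croniter

schedule = "0 2 * * *"  # minute hour day-of-month month day-of-week
runs = croniter(schedule, datetime(2024, 1, 1))

for _ in range(3):
    print(runs.get_next(datetime))
# 2024-01-01 02:00:00
# 2024-01-02 02:00:00
# 2024-01-03 02:00:00
```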
Pipeline List Screen
Manage created pipelines from the pipeline list page (/pipelines).
View Modes
| View | Description |
|---|---|
| Card View | Displays pipelines as cards with a visual overview of recent execution status |
| Table View | Displays pipelines in list form for easy sorting and viewing of detailed information |
Status Filters
You can filter pipelines by status to view only the items you need:
| Filter | Description |
|---|---|
| All | Show all pipelines |
| Running | Pipelines currently in execution |
| Ready | Pipelines that have completed execution or are in standby |
| Failed | Pipelines whose last execution failed |
When an Event pipeline is waiting to receive events, it is displayed with a Listening status. Pipelines with a registered schedule are indicated with a separate Schedule badge.
Architecture Overview
Pipelines move from editing to execution results through four components: the Editor, the Workflow Engine, Batches, and Traces.
Components
| Component | Role |
|---|---|
| Editor | Design workflows in the node-based visual editor |
| Workflow Engine | Converts saved pipeline definitions into workflows and executes them |
| Batch | An individual execution unit of a pipeline; each batch is assigned a unique ID |
| Traces | Records execution logs, duration, and input/output information for each step |
Node Types
Pipelines are composed of two core node types:
Dataset Node (Input/Output)
Serves as the source or sink for data.
| Type | Description |
|---|---|
| Delta Lake | Batch reads from or writes to Delta Lake tables |
| Kafka | Real-time streaming data integration through Kafka topics |
| DDS | Real-time distributed system integration through the DDS interface |
| REST API | External HTTP endpoint calls |
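Dataset nodes are configured in the editor rather than hand-coded, but it helps to see what a node stands for. The sketch below shows the plain producer/consumer wiring a Kafka node conceptually represents, using the confluent_kafka package; the broker address, topic name, and consumer group are placeholder values, not D.Hub defaults:

```python
# Conceptual sketch of a Kafka dataset node as source (consumer) or
# sink (producer). Broker address, topic, and group id are placeholders.
from confluent_kafka import Consumer, Producer

# Sink: publish one event to the topic
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("sensor-events", value=b'{"temp_c": 21.5}')
producer.flush()

# Source: read one event back from the topic
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "pipeline-demo",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sensor-events"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.value())
consumer.close()
```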
Code Node (Processing Logic)
Performs logic to transform and process input data.
| Type | Use Case |
|---|---|
| Python | General-purpose data processing, ML model application, external API calls |
| SQL | Data transformation, aggregation, joins, filtering |
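The exact entry point a Code node expects is part of D.Hub's runtime contract (see Adding Nodes); the sketch below assumes a hypothetical `transform(df)` function that receives and returns a pandas DataFrame, purely to show the kind of logic a Python node typically carries:

```python
# Hypothetical Python code-node body. The entry-point name `transform`
# and the DataFrame-in/DataFrame-out contract are assumptions for
# illustration; check the Adding Nodes guide for the actual interface.
import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Clean and enrich the input before it flows to the output node."""
    df = df.dropna(subset=["sensor_id"])      # drop rows missing the key
    df["temp_f"] = df["temp_c"] * 9 / 5 + 32  # derive a new column
    return df[df["temp_f"] > 32]              # keep above-freezing rows
```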
Basic Pipeline Pattern
The most basic pipeline configuration follows the Input → Processing → Output pattern: a Dataset node supplies input, a Code node transforms it, and another Dataset node receives the result.
Common Use Cases
| Pattern | Input | Processing | Output | Description |
|---|---|---|---|---|
| ETL | Delta Lake | Python/SQL | Delta Lake | Extract, transform, and load data |
| Real-time Transformation | Kafka | Python | Kafka | Real-time streaming data processing |
| Aggregation Reporting | Delta Lake | SQL | Delta Lake | Daily/monthly statistics aggregation |
| ML Inference | Delta Lake | Python | Delta Lake | Run predictions with trained models |
When designing pipelines, breaking them into small units, testing each one, and then combining them makes debugging and maintenance much easier.
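To make the ETL row above concrete, here is a standalone sketch of the Input → Processing → Output pattern in plain Python using the deltalake package. The table URIs and column names are placeholders; in an actual pipeline the read and write steps would be Delta Lake dataset nodes and the aggregation a Code node:

```python
# Standalone ETL illustration (Delta Lake -> Python -> Delta Lake).
# Table URIs and column names are placeholders, not D.Hub node code.
from deltalake import DeltaTable, write_deltalake

# Input: read the source table into a pandas DataFrame
events = DeltaTable("s3://bucket/raw_events").to_pandas()

# Processing: daily count and mean (assumes event_time is a datetime column)
daily = (
    events.groupby(events["event_time"].dt.date)["value"]
    .agg(["count", "mean"])
    .reset_index()
)

# Output: write the aggregate to the target table
write_deltalake("s3://bucket/daily_stats", daily, mode="overwrite")
```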
Key Screens
| Screen | Path | Description |
|---|---|---|
| Pipeline List | /pipelines | View, create, and delete all pipelines |
| Workflow Editor | /pipelines/edit | Node-based visual editing and execution |
Next Steps
- Workflow Editor - Editor screen layout and usage
- Adding Nodes - How to add and configure each node type
- Connecting Nodes - Rules for connecting data flow between nodes
- Running Workflows - Manual execution and monitoring
- Scheduling - Setting up periodic execution schedules
- Debugging - Reviewing execution history and analyzing errors
- Pipeline Settings - Metadata and deployment configuration