Pipelines
D.Hub's pipelines provide a node-based visual editor for designing and executing data processing workflows. Place code and datasets as nodes, connect them with lines to define data flow, and compose complex workflows visually.
Key Features
The pipeline module provides the following features:
- Visual Workflow Editing: Design data flows by dragging and dropping nodes and connecting them
- Various Node Types: Combine Dataset nodes (Delta Lake, Kafka, etc.) with Code nodes (Python, SQL)
- Automated Execution Management: Deploy pipelines through the workflow engine, which handles scheduling and monitoring
- Execution History Tracking: Manage execution records by batch and review traces per step
Pipeline Types
D.Hub supports two types of pipelines:
| Type | Description | Execution Method |
|---|---|---|
| Batch | Pipelines for batch data processing | Manual execution via Run/Stop buttons, automatic execution via Cron schedule |
| Event | Pipelines for event-driven real-time data processing | Register/Unregister event listeners |
In Event pipelines, schedule settings and the Run History Bar are not displayed. Processing occurs automatically when events are received.
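The Cron schedule used by Batch pipelines follows standard five-field cron syntax. As a minimal illustration of what an expression such as `0 2 * * *` means, the sketch below expands it into concrete run times using the third-party croniter package; this only demonstrates cron semantics and is not a D.Hub API:

```python
# Illustration of five-field cron semantics using the third-party
# croniter package. This is not a D.Hub API; it only shows what a
# schedule expression like "0 2 * * *" (daily at 02:00) expands to.
from datetime import datetime
from croniter import croniter

schedule = "0 2 * * *"  # minute hour day-of-month month day-of-week
runs = croniter(schedule, datetime(2024, 1, 1))

for _ in range(3):
    print(runs.get_next(datetime))
# 2024-01-01 02:00:00
# 2024-01-02 02:00:00
# 2024-01-03 02:00:00
```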
Pipeline List Screen
Manage created pipelines from the pipeline list page (/pipelines).
View Modes
| View | Description |
|---|---|
| Card View | Displays pipelines as cards with a visual overview of recent execution status |
| Table View | Displays pipelines in list form for easy sorting and viewing of detailed information |
Status Filters
You can filter pipelines by status to view only the items you need:
| Filter | Description |
|---|---|
| All | Show all pipelines |
| Running | Pipelines currently in execution |
| Ready | Pipelines that have completed execution or are in standby |
| Failed | Pipelines whose last execution failed |
When an Event pipeline is waiting to receive events, it is displayed with a Listening status. Pipelines with a registered schedule are indicated with a separate Schedule badge.
Architecture Overview
Pipelines move from editing to execution results through four components: the Editor, the Workflow Engine, Batches, and Traces.
Components
| Component | Role |
|---|---|
| Editor | Design workflows in the node-based visual editor |
| Workflow Engine | Converts saved pipeline definitions into workflows and executes them |
| Batch | An individual execution unit of a pipeline; each batch is assigned a unique ID |
| Traces | Records execution logs, duration, and input/output information for each step |
Node Types
Pipelines are composed of two core node types:
Dataset Node (Input/Output)
Serves as the source or sink for data.
| Type | Description |
|---|---|
| Delta Lake | Batch reads from or writes to Delta Lake tables |
| Kafka | Real-time streaming data integration through Kafka topics |
| DDS | Real-time distributed system integration through the DDS interface |
| REST API | External HTTP endpoint calls |
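Dataset nodes are configured in the editor rather than hand-coded, but it helps to see what a node stands for. The sketch below shows the plain producer/consumer wiring a Kafka node conceptually represents, using the confluent_kafka package; the broker address, topic name, and consumer group are placeholder values, not D.Hub defaults:

```python
# Conceptual sketch of a Kafka dataset node as source (consumer) or
# sink (producer). Broker address, topic, and group id are placeholders.
from confluent_kafka import Consumer, Producer

# Sink: publish one event to the topic
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("sensor-events", value=b'{"temp_c": 21.5}')
producer.flush()

# Source: read one event back from the topic
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "pipeline-demo",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["sensor-events"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.value())
consumer.close()
```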
Code Node (Processing Logic)
Performs logic to transform and process input data.
| Type | Use Case |
|---|---|
| Python | General-purpose data processing, ML model application, external API calls |
| SQL | Data transformation, aggregation, joins, filtering |
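The exact entry point a Code node expects is part of D.Hub's runtime contract (see Adding Nodes); the sketch below assumes a hypothetical `transform(df)` function that receives and returns a pandas DataFrame, purely to show the kind of logic a Python node typically carries:

```python
# Hypothetical Python code-node body. The entry-point name `transform`
# and the DataFrame-in/DataFrame-out contract are assumptions for
# illustration; check the Adding Nodes guide for the actual interface.
import pandas as pd

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Clean and enrich the input before it flows to the output node."""
    df = df.dropna(subset=["sensor_id"])      # drop rows missing the key
    df["temp_f"] = df["temp_c"] * 9 / 5 + 32  # derive a new column
    return df[df["temp_f"] > 32]              # keep above-freezing rows
```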
Basic Pipeline Pattern
The most basic pipeline configuration follows the Input → Processing → Output pattern: a Dataset node supplies input, a Code node transforms it, and another Dataset node receives the result.
Common Use Cases
| Pattern | Input | Processing | Output | Description |
|---|---|---|---|---|
| ETL | Delta Lake | Python/SQL | Delta Lake | Extract, transform, and load data |
| Real-time Transformation | Kafka | Python | Kafka | Real-time streaming data processing |
| Aggregation Reporting | Delta Lake | SQL | Delta Lake | Daily/monthly statistics aggregation |
| ML Inference | Delta Lake | Python | Delta Lake | Run predictions with trained models |
When designing pipelines, breaking them into small units, testing each one, and then combining them makes debugging and maintenance much easier.
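To make the ETL row above concrete, here is a standalone sketch of the Input → Processing → Output pattern in plain Python using the deltalake package. The table URIs and column names are placeholders; in an actual pipeline the read and write steps would be Delta Lake dataset nodes and the aggregation a Code node:

```python
# Standalone ETL illustration (Delta Lake -> Python -> Delta Lake).
# Table URIs and column names are placeholders, not D.Hub node code.
from deltalake import DeltaTable, write_deltalake

# Input: read the source table into a pandas DataFrame
events = DeltaTable("s3://bucket/raw_events").to_pandas()

# Processing: daily count and mean (assumes event_time is a datetime column)
daily = (
    events.groupby(events["event_time"].dt.date)["value"]
    .agg(["count", "mean"])
    .reset_index()
)

# Output: write the aggregate to the target table
write_deltalake("s3://bucket/daily_stats", daily, mode="overwrite")
```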
Key Screens
| Screen | Path | Description |
|---|---|---|
| Pipeline List | /pipelines | View, create, and delete all pipelines |
| Workflow Editor | /pipelines/edit | Node-based visual editing and execution |
Next Steps
- Workflow Editor - Editor screen layout and usage
- Adding Nodes - How to add and configure each node type
- Connecting Nodes - Rules for connecting data flow between nodes
- Running Workflows - Manual execution and monitoring
- Scheduling - Setting up periodic execution schedules
- Debugging - Reviewing execution history and analyzing errors
- Pipeline Settings - Metadata and deployment configuration