Connecting Nodes
In pipelines, connections (Edges) between nodes define the direction of data flow and the dependencies between nodes. Correct connections ensure that data flows from the source, through the processing stages, to the final destination.
Creating Edges (Connections)
Create connections between nodes by dragging from an output port to an input port:
- Hover the mouse over the right handle (output port) of the upstream node.
- When the handle becomes active, click and drag.
- Drop onto the left handle (input port) of the downstream node.
- A connection line (Edge) is created, establishing the data dependency.
When dragging over a port that cannot be connected, a visual rejection indicator appears. Only valid targets are highlighted.
Data Flow Rules
The following rules apply to connections between nodes:
Allowed Connections
| Source Node | Target Node | Data Flow |
|---|---|---|
| Dataset → | Code | Read the dataset as input to the Code node |
| Code → | Dataset | Write the Code node's processing results to a dataset |
| Code → | Code | Pass intermediate processing results to the next Code node |
Disallowed Connections
| Source Node | Target Node | Restriction Reason |
|---|---|---|
| Dataset → | Dataset | Direct connections between datasets are not allowed; a Code node is required in between |
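
The allowed and disallowed combinations above can be expressed as a simple lookup. The following is a minimal sketch, not the product's implementation: the node kind strings and the `can_connect` function are hypothetical names chosen for illustration.

```python
# Hypothetical node kinds; these names are illustrative, not a product API.
ALLOWED_CONNECTIONS = {
    ("dataset", "code"),   # read a dataset as input to a Code node
    ("code", "dataset"),   # write a Code node's results to a dataset
    ("code", "code"),      # pass intermediate results to the next Code node
}


def can_connect(source_kind: str, target_kind: str) -> bool:
    """Return True if an edge from source_kind to target_kind is allowed."""
    return (source_kind, target_kind) in ALLOWED_CONNECTIONS


if __name__ == "__main__":
    print(can_connect("dataset", "code"))
    print(can_connect("dataset", "dataset"))
```

Listing only the allowed pairs means any combination not in the table, such as Dataset to Dataset, is rejected by default.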
If connections between nodes form a circular structure, the pipeline cannot be saved. For example, a structure like A → B → C → A, where the data flow returns to its starting point, is not allowed because it would cause an infinite loop.
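One way to detect such a cycle is to check, before adding an edge, whether the source node is already reachable from the target node. The sketch below assumes the pipeline is represented as a list of `(source, target)` pairs; the function name `creates_cycle` is hypothetical, and the product may perform this check differently.

```python
def creates_cycle(edges, new_edge):
    """Return True if adding new_edge (source, target) would close a cycle.

    edges is a list of (source, target) pairs; this representation is an
    assumption made for the sketch.
    """
    source, target = new_edge

    # Build an adjacency list of the existing edges.
    adjacency = {}
    for upstream, downstream in edges:
        adjacency.setdefault(upstream, []).append(downstream)

    # A cycle appears if the source is already reachable from the target.
    stack = [target]
    visited = set()
    while stack:
        node = stack.pop()
        if node == source:
            return True
        if node in visited:
            continue
        visited.add(node)
        stack.extend(adjacency.get(node, []))
    return False


if __name__ == "__main__":
    existing = [("A", "B"), ("B", "C")]
    print(creates_cycle(existing, ("C", "A")))
    print(creates_cycle(existing, ("A", "C")))
```

With the edges A → B and B → C in place, adding C → A is reported as a cycle, while adding A → C is not.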
Auto Mapping
When connecting a Code node and a Dataset node, input/output variables are automatically mapped if the schemas are compatible.
How Auto Mapping Works
- On Connection: When a Code node and Dataset node are connected, the input/output variable names in the code are compared with the dataset ID.
- Schema Compatibility Check: The system checks whether the connected dataset's schema (column names, data types) matches the format expected by the code.
- Mapping Applied: If compatible, variable mapping is automatically configured.
Auto mapping can be reviewed and modified in the Options tab of the Inspector panel. Manually change the mapping if it is incorrect.
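The three auto mapping steps can be sketched as follows. This is an illustration under stated assumptions, not the product's actual logic: it assumes that a variable matches when its name equals the dataset ID exactly, and that a schema is compatible when every column the code expects exists in the dataset with the same type. The function `auto_map` and the dictionary shapes are hypothetical.

```python
def auto_map(code_variables, dataset_id, dataset_schema):
    """Return {variable_name: dataset_id} if auto mapping succeeds, else None.

    code_variables: variable name -> expected schema ({column: type name}).
    dataset_schema: the connected dataset's schema ({column: type name}).
    Both shapes are assumptions made for this sketch.
    """
    # Step 1 (assumed rule): a variable matches when its name equals the dataset ID.
    expected_schema = code_variables.get(dataset_id)
    if expected_schema is None:
        return None

    # Step 2 (assumed rule): every expected column must exist with the same type.
    for column, expected_type in expected_schema.items():
        if dataset_schema.get(column) != expected_type:
            return None

    # Step 3: the mapping is applied.
    return {dataset_id: dataset_id}


if __name__ == "__main__":
    variables = {"sales": {"id": "int", "amount": "float"}}
    schema = {"id": "int", "amount": "float", "note": "str"}
    print(auto_map(variables, "sales", schema))
```

When the function returns `None`, the equivalent situation in the editor is that no mapping is configured automatically and it has to be set manually.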
Multiple Inputs/Outputs
Multiple Dataset nodes can be connected to a single Code node.
Multiple Input Example
- Connect multiple datasets as inputs to a single Code node to perform joins, merges, and other operations.
- Each input is referenced as a separate variable within the Code node.
Multiple Output Example
- A single Code node can distribute results to multiple datasets based on conditions.
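As an illustration of both patterns, the sketch below joins two inputs and then routes the joined rows to two outputs based on a condition. The function signature, the dataset names (`orders`, `customers`, `large_orders`, `small_orders`), and the threshold of 100 are all hypothetical; how a Code node actually receives inputs and returns outputs depends on the product.

```python
def process(orders, customers):
    """Join two input datasets, then split the result into two outputs.

    Each input is a list of row dictionaries; this shape is an assumption
    made for the sketch.
    """
    # Multiple inputs: each dataset arrives as its own variable.
    customers_by_id = {row["customer_id"]: row for row in customers}

    joined = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is not None:  # inner join: skip orders with no customer
            joined.append({**order, "name": customer["name"]})

    # Multiple outputs: route rows to different datasets by a condition.
    large = [row for row in joined if row["amount"] >= 100]
    small = [row for row in joined if row["amount"] < 100]
    return {"large_orders": large, "small_orders": small}


if __name__ == "__main__":
    orders = [
        {"customer_id": 1, "amount": 150},
        {"customer_id": 2, "amount": 20},
    ]
    customers = [
        {"customer_id": 1, "name": "Ann"},
        {"customer_id": 2, "name": "Bo"},
    ]
    print(process(orders, customers))
```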
Edge Management
Deleting Connections
- Click a connection line (Edge) to select it, then press the Delete key.
- Or right-click the connection line and select Delete from the menu.
Changing Connections
To change an existing connection to a different node:
- Delete the existing connection.
- Create a new connection to the new target node.
Checking Connection Status
You can view all input/output connections for the selected node in the Options tab of the Inspector panel:
| Item | Description |
|---|---|
| Upstream | List of preceding nodes that provide data to the current node |
| Downstream | List of subsequent nodes that receive results from the current node |
| Variable Mapping | Mapping status between code variables and datasets |
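
The Upstream and Downstream lists can be derived directly from the pipeline's edges. The sketch below assumes edges are stored as `(source, target)` pairs; the function name `connection_status` and the node names in the example are hypothetical.

```python
def connection_status(edges, node):
    """Return the upstream and downstream neighbors of node.

    edges is a list of (source, target) pairs; this representation is an
    assumption made for the sketch.
    """
    upstream = [source for source, target in edges if target == node]
    downstream = [target for source, target in edges if source == node]
    return {"upstream": upstream, "downstream": downstream}


if __name__ == "__main__":
    edges = [
        ("raw_orders", "clean"),
        ("raw_customers", "clean"),
        ("clean", "report"),
    ]
    print(connection_status(edges, "clean"))
```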