Version Control
All resources in D.Hub (Dataset, Code, Pipeline, Knowledge) have their change history tracked as versions. This document explains D.Hub's version management system and how to use it.
Version Control Overview
D.Hub provides two layers of version management.
Manifest-Based Version Control
Metadata and configuration information for all resources are managed as Manifests. Manifests are stored on S3-compatible object storage, and change history is automatically recorded through object versioning.
- When a resource is created, an initial manifest is stored.
- When metadata (name, tags, description, etc.) is modified, a new version of the manifest is created.
- Previous manifest versions are retained and not deleted.
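Because manifests live on S3-compatible object storage with object versioning enabled, each manifest revision can be listed with standard S3 APIs. The sketch below uses boto3; the endpoint, bucket, and key are hypothetical placeholders, since D.Hub's actual storage layout is not documented here.

```python
import boto3

# Hypothetical endpoint/bucket/key; D.Hub manages the real manifest layout internally.
s3 = boto3.client("s3", endpoint_url="https://object-store.example.com")

versions = s3.list_object_versions(
    Bucket="dhub-manifests",
    Prefix="datasets/my-dataset/manifest.json",
)
for v in versions.get("Versions", []):
    # Each entry corresponds to one manifest revision; none are deleted.
    print(v["VersionId"], v["LastModified"], "latest" if v["IsLatest"] else "")
```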
Delta Lake Table Version Control
The actual data of datasets is managed as Delta Lake tables. Delta Lake tracks data versions through its own transaction log.
- New versions are created when CSV uploads, data additions, or schema changes occur.
- Each version is assigned a unique version number and timestamp.
- Time Travel: You can query past data by a specific point in time or version number.
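Time Travel is a standard Delta Lake capability, so it can be sketched with the `deltalake` Python package directly against the table. The table path below is a hypothetical placeholder; the real location is managed by D.Hub.

```python
from deltalake import DeltaTable

# Hypothetical table location; D.Hub manages the real path internally.
dt = DeltaTable("/data/dhub/datasets/my-dataset")

print(dt.version())               # current table version number
for commit in dt.history(3):      # last three transaction-log entries
    print(commit["timestamp"], commit["operation"])

dt.load_as_version(2)             # time travel: switch the table to version 2
older_df = dt.to_pandas()         # read the data as of that version
```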
Manifest versions and Delta Lake table versions are managed independently. Manifests track metadata changes of resources, while Delta Lake tracks actual data changes.
Version Control by Resource
Dataset Versions
Datasets have two levels of versioning.
| Version Type | Managed Target | Tracking Method |
|---|---|---|
| Manifest Version | Metadata such as name, schema, tags, description | S3 Object Versioning |
| Table Version | Actual table data (rows/columns) | Delta Lake Transaction Log |
When referencing a dataset as input in a pipeline, you can specify a particular version number to use data from an exact point in time.
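The exact reference syntax depends on the pipeline manifest format; purely as an illustration, a pinned input might look like the following (the field names are hypothetical, not D.Hub's actual schema).

```python
# Illustrative only; field names and structure are hypothetical.
pipeline_input = {
    "dataset": "my-dataset",
    "table_version": 7,   # pin the Delta Lake table version to read
}
```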
Code Versions
Each version of a code artifact is stored in object storage.
- A new version is automatically created when code content is modified.
- You can check the modification time for each version in the version list.
- Selecting two versions lets you compare changes in the Monaco DiffEditor (Side-by-side / Unified mode).
- You can download the code file for a specific version.
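At the storage level, downloading a particular code version amounts to fetching a specific object version. A minimal sketch with boto3, assuming a hypothetical bucket and key; the version ID comes from the version list.

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://object-store.example.com")

# Hypothetical bucket/key; replace the placeholder with a VersionId from the version list.
obj = s3.get_object(
    Bucket="dhub-code",
    Key="codes/my-script/main.py",
    VersionId="<version-id-from-the-list>",
)
with open("main_previous.py", "wb") as f:
    f.write(obj["Body"].read())
```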
Pipeline Versions
Pipeline configuration information (step list, dependencies, options, etc.) is managed as a manifest.
- New versions are created when steps are added/removed or dependencies change.
- When a pipeline runs, it uses the configuration as it exists at the time of execution.
Knowledge Versions
Knowledge resources are also versioned on a manifest basis. Actions such as adding documents or changing settings are recorded as versions.
Viewing the Version List
Click the Versions tab in a resource's detail screen to view the complete version list for that resource.
Each version entry displays the following information:
| Item | Description |
|---|---|
| Version ID | Unique version identifier |
| Latest | Whether it is the latest version |
| Last Modified | Timestamp when the version was created |
Viewing a Specific Version
Click an entry in the version list to view the resource state at that point in time.
- Dataset: Schema and data preview for that version
- Code: Code content for that version
- Pipeline: Step configuration for that version
Version Restore
You can restore a previous version as the current version. Select the version to restore from the version list and click the Restore button. A confirmation modal is displayed; after you confirm, that version's state is applied as the new latest version.
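The UI handles the restore for you; on versioned S3-compatible storage, restoring a manifest typically amounts to copying the selected object version back on top of its key, so it becomes the new latest version while the full history is preserved. A sketch with boto3, using hypothetical names:

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://object-store.example.com")

bucket, key = "dhub-manifests", "datasets/my-dataset/manifest.json"

# Copy the chosen older version over the current key; it becomes the new latest version.
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key, "VersionId": "<version-id-to-restore>"},
)
```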
For datasets, you can select a specific version from the Delta Lake table version list to preview data at that point in time. This allows you to track data change history.
Checkpoint
A Checkpoint is a mechanism that records the execution state of each step during a pipeline's batch processing.
Checkpoint Structure
Each checkpoint contains the following information:
| Field | Description |
|---|---|
| State | Execution state (running, completed, failed, etc.) |
| Offsets | Data processing offsets (used to resume from where processing left off after a restart) |
| Start Time | Execution start time |
| End Time | Execution end time |
| Comment | Additional notes |
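As a purely illustrative example based on the fields in the table above (the concrete serialization format D.Hub uses is not specified here), a checkpoint record for one step might look like this:

```python
# Illustrative only; the actual checkpoint serialization is not documented here.
checkpoint = {
    "state": "completed",                    # running / completed / failed ...
    "offsets": {"rows_processed": 120_000},  # where to resume on restart
    "start_time": "2024-06-01T02:00:00Z",
    "end_time": "2024-06-01T02:14:37Z",
    "comment": "nightly batch",
}
```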
Checkpoint Usage
Checkpoints are useful in the following scenarios:
- Failure Recovery: When an error occurs during pipeline execution, you can restart from the last checkpoint (see the sketch below).
- Progress Tracking: Monitor progress through each step's processing offset.
- Execution History Analysis: Analyze past batch execution start/end times, throughput, etc.
Checkpoints are a batch processing-only feature. Checkpoints are not created for single (ad-hoc) pipeline executions.
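The failure-recovery scenario above can be sketched as follows: read the last checkpoint for a step and skip the records it already covered. The helper names are hypothetical, not D.Hub APIs.

```python
from typing import Iterable


def process(record: dict) -> None:
    """Placeholder for the step's real per-record work."""
    ...


def resume_step(records: Iterable[dict], checkpoint: dict) -> None:
    """Illustrative only: resume from the offset recorded in the last checkpoint."""
    start = checkpoint.get("offsets", {}).get("rows_processed", 0)
    for i, record in enumerate(records):
        if i < start:
            continue         # already processed before the failure
        process(record)
```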
Next Steps
- Datasets — Dataset schema and data management
- Codes — Code artifact management
- Pipelines (Collection) — Pipeline status review
- Running Pipelines — Batch execution and monitoring