Core Concepts
Here are the core concepts you need to know to use D.Hub effectively. Understanding what each component does and how the components connect will help you get up to speed with the platform faster.
Component Relationship Diagram
The diagram below shows how the core components of D.Hub are connected.
Collection
A Collection is the top-level container in D.Hub for logically grouping resources. You can bundle related resources together by any criteria you choose, such as project, department, or topic.
A single collection can contain the following resources:
- Dataset — Structured data tables
- Code — Reusable code artifacts
- Pipeline — Data processing workflows
- Knowledge — Unstructured document knowledge stores
By using collections, you can systematically organize data assets by team or project and manage access permissions in bulk.
→ Learn more: Collection Management
Dataset
A Dataset is the unit for storing and managing structured data. Datasets in D.Hub are internally stored in Delta Lake table format, which automatically supports version control and schema management.
Ways to create a dataset:
- CSV File Upload — Upload a local CSV file; the schema is inferred automatically and the data is converted to a table
- Template-Based Creation — Use predefined templates to quickly set up a dataset structure
- API Call — Create programmatically via the REST API
Each dataset keeps its own version history, and you can query and analyze the data using SQL or Python.
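As a concrete illustration of the API-call creation method, the sketch below assembles a request body for a hypothetical "create dataset" endpoint. The endpoint URL, payload fields, and auth header are assumptions for illustration, not the documented D.Hub API; consult the actual API reference for the real schema.

```python
import json

def build_dataset_payload(name: str, collection: str, columns: list[dict]) -> dict:
    """Assemble the JSON body for a hypothetical 'create dataset' call."""
    return {
        "name": name,
        "collection": collection,
        "schema": {"columns": columns},
        "format": "delta",  # D.Hub stores datasets as Delta Lake tables
    }

payload = build_dataset_payload(
    "sensor_readings",
    "iot-project",
    [{"name": "device_id", "type": "string"},
     {"name": "temperature", "type": "double"}],
)

# The actual call would look something like this (endpoint is hypothetical):
# import requests
# resp = requests.post("https://dhub.example.com/api/v1/datasets",
#                      headers={"Authorization": "Bearer <token>"},
#                      data=json.dumps(payload))
print(json.dumps(payload, indent=2))
```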
→ Learn more: Dataset Wizard
Code
Code is a resource for managing reusable code artifacts within D.Hub. You can write Python scripts and SQL queries, and connect them as nodes in pipelines for data processing.
Supported code types:
| Type | Purpose |
|---|---|
| Python | Data transformation, API calls, complex business logic implementation |
| SQL | Data querying, aggregation, join processing between tables |
You can also use the AI Assistant's code generation feature to describe requirements in natural language and automatically generate code.
→ Learn more: Code Wizard
Pipeline
A Pipeline is a workflow system for visually designing and executing data processing flows. Build data flows using drag-and-drop in the node-based editor, and run them automatically through the workflow engine.
Key components of a pipeline:
- Node — Individual processing steps (data reading, transformation, storage, etc.)
- Edge — Data flow connections between nodes
- Run — A single execution instance of the pipeline
Pipelines can be automatically triggered by schedule or events in addition to manual execution, enabling you to fully automate repetitive data processing.
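The Node / Edge / Run concepts above can be sketched in a few lines: nodes are processing steps, edges declare which upstream nodes each step depends on, and a run executes every node in dependency order. This models the idea only; it is not the D.Hub workflow engine.

```python
from graphlib import TopologicalSorter

# Node name -> processing function; each receives its upstream nodes' outputs
nodes = {
    "read":      lambda inputs: [1, 2, 3, 4],
    "transform": lambda inputs: [x * 10 for x in inputs["read"]],
    "store":     lambda inputs: sum(inputs["transform"]),
}
# Edges: node -> set of upstream nodes it depends on
edges = {"read": set(), "transform": {"read"}, "store": {"transform"}}

def run_pipeline(nodes, edges):
    """One 'Run': execute every node after all of its upstream nodes."""
    results = {}
    for name in TopologicalSorter(edges).static_order():
        upstream = {dep: results[dep] for dep in edges[name]}
        results[name] = nodes[name](upstream)
    return results

results = run_pipeline(nodes, edges)
print(results["store"])  # 100
```

A scheduled or event-driven trigger would simply call `run_pipeline` for you instead of a user clicking "run".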
→ Learn more: Pipeline Workflow Editor
Ontology
Ontology is a feature for modeling the semantic relationships in your data as Entities and Relationships. The defined models are stored in a graph database, enabling intuitive exploration of complex data relationships.
Key components of an ontology:
- Entity — Represents real-world objects (e.g., user, product, sensor)
- Relationship — Defines connections between entities (e.g., "owns", "located at")
- Property — Detailed information assigned to entities and relationships
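The three components above can be illustrated with plain dataclasses. A real ontology lives in a graph database; this toy model only shows the data shape and a simple traversal.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str                       # e.g. "user", "product", "sensor"
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    source: Entity
    kind: str                       # e.g. "owns", "located at"
    target: Entity
    properties: dict = field(default_factory=dict)

alice = Entity("user", {"id": "u1", "name": "Alice"})
sensor = Entity("sensor", {"id": "s42", "model": "TH-200"})
owns = Relationship(alice, "owns", sensor, {"since": "2024-01-01"})

# Traverse the graph: which sensors does Alice own?
graph = [owns]
owned = [r.target.properties["id"] for r in graph
         if r.source is alice and r.kind == "owns"]
print(owned)  # ['s42']
```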
Build models visually in the Ontology Builder and interactively explore relationship networks in the Graph Explorer.
→ Learn more: Ontology Overview
Knowledge
Knowledge is a feature for collecting unstructured data, turning it into knowledge, and using it for AI-based search and conversation. Documents can be collected through various methods including web page crawling, file upload, and manual entry. They are then automatically chunked and embedded, and stored in a vector database.
Collected knowledge can be queried in natural language through RAG (Retrieval-Augmented Generation)-based AI Chat: relevant documents are retrieved based on the user's question, and the LLM generates a context-aware answer.
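The retrieval step of this flow can be sketched simply: split a document into chunks, represent each chunk (here just as a bag of words, standing in for a real vector embedding), and pick the chunk most similar to the question. A production system would use dense embeddings and a vector database instead.

```python
def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def similarity(a: str, b: str) -> float:
    """Jaccard word overlap -- a crude stand-in for cosine similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

document = ("Datasets are stored as Delta Lake tables. "
            "Pipelines automate repetitive data processing. "
            "Knowledge stores hold unstructured documents for AI search.")
chunks = chunk(document)

question = "How are datasets stored?"
best = max(chunks, key=lambda c: similarity(c, question))
# The retrieved chunk would then be passed to the LLM as context.
print(best)
```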
Supported document collection methods:
- Web Crawling — Automatically collect web pages by specifying a URL
- File Upload — Directly upload document files such as PDF, DOCX, TXT
- Manual Entry — Type or paste documents directly in the editor
→ Learn more: Knowledge Management Overview
Dashboard
A Dashboard is a feature for visually representing and monitoring data. It uses an analytics database as its backend to perform fast real-time aggregation and visualization on large-scale data.
Key dashboard features:
- Widget — Various visualization components such as charts, tables, and metrics
- Data Connection — Connect to datasets via SQL queries or simple mode
- Real-Time Updates — Visualization automatically updates when data changes
Select a chart type from the widget library and map dataset columns to build intuitive dashboards.
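As an illustration of column mapping, the configuration below sketches what a widget definition might look like and what the backend would aggregate for it. The field names (`type`, `x`, `y`, `aggregation`) are hypothetical, not the actual D.Hub widget schema.

```python
from collections import defaultdict

widget = {
    "type": "bar_chart",
    "dataset": "sales",
    "x": "region",          # dataset column mapped to the x-axis
    "y": "amount",          # dataset column mapped to the y-axis
    "aggregation": "sum",
}

rows = [
    {"region": "EU", "amount": 120},
    {"region": "US", "amount": 300},
    {"region": "EU", "amount": 80},
]

# What the analytics backend would compute to render this widget:
totals = defaultdict(float)
for row in rows:
    totals[row[widget["x"]]] += row[widget["y"]]
print(dict(totals))  # {'EU': 200.0, 'US': 300.0}
```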
→ Learn more: Dashboard Overview
Next Steps
Now that you understand the core concepts, try using the platform yourself.
- Explore the First Screen — Take a look at D.Hub's screen layout
- Quick Start — Go from dataset upload to dashboard in just 5 minutes
- Role-Based Guide — Choose a learning path that matches your role