Core Concepts
Here are the core concepts you need to know to use D.Hub effectively. Understanding what each component does and how the components connect will help you get up to speed with the platform faster.
Component Relationship Diagram
The diagram below shows how the core components of D.Hub are connected.
Collection
A Collection is the top-level container in D.Hub for logically grouping resources. You can bundle related resources together by any criteria you choose, such as project, department, or topic.
A single collection can contain the following resources:
- Dataset — Structured data tables
- Code — Reusable code artifacts
- Pipeline — Data processing workflows
- Knowledge — Unstructured document knowledge stores
By using collections, you can systematically organize data assets by team or project and manage access permissions in bulk.
→ Learn more: Collection Management
Dataset
A Dataset is the unit for storing and managing structured data. Datasets in D.Hub are internally stored in Delta Lake table format, which automatically supports version control and schema management.
Ways to create a dataset:
- CSV File Upload — Upload a local CSV file; the schema is inferred automatically and the data is converted to a table
- Template-Based Creation — Use predefined templates to quickly set up a dataset structure
- API Call — Create programmatically via the REST API
Each dataset keeps its own version history, and you can query and analyze the data using SQL or Python.
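As a concrete illustration of the API-call creation method, the sketch below assembles a request body for a hypothetical "create dataset" endpoint. The endpoint URL, payload fields, and auth header are assumptions for illustration, not the documented D.Hub API; consult the actual API reference for the real schema.

```python
import json

def build_dataset_payload(name: str, collection: str, columns: list[dict]) -> dict:
    """Assemble the JSON body for a hypothetical 'create dataset' call."""
    return {
        "name": name,
        "collection": collection,
        "schema": {"columns": columns},
        "format": "delta",  # D.Hub stores datasets as Delta Lake tables
    }

payload = build_dataset_payload(
    "sensor_readings",
    "iot-project",
    [{"name": "device_id", "type": "string"},
     {"name": "temperature", "type": "double"}],
)

# The actual call would look something like this (endpoint is hypothetical):
# import requests
# resp = requests.post("https://dhub.example.com/api/v1/datasets",
#                      headers={"Authorization": "Bearer <token>"},
#                      data=json.dumps(payload))
print(json.dumps(payload, indent=2))
```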
→ Learn more: Dataset Wizard
Code
Code is a resource for managing reusable code artifacts within D.Hub. You can write Python scripts and SQL queries, and connect them as nodes in pipelines for data processing.
Supported code types:
| Type | Purpose |
|---|---|
| Python | Data transformation, API calls, complex business logic implementation |
| SQL | Data querying, aggregation, join processing between tables |
You can also use the AI Assistant's code generation feature to describe requirements in natural language and automatically generate code.
→ Learn more: Code Wizard
Pipeline
A Pipeline is a workflow system for visually designing and executing data processing flows. Build data flows using drag-and-drop in the node-based editor, and run them automatically through the workflow engine.
Key components of a pipeline:
- Node — Individual processing steps (data reading, transformation, storage, etc.)
- Edge — Data flow connections between nodes
- Run — A single execution instance of the pipeline
Pipelines can be automatically triggered by schedule or events in addition to manual execution, enabling you to fully automate repetitive data processing.
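The Node / Edge / Run concepts above can be sketched in a few lines: nodes are processing steps, edges declare which upstream nodes each step depends on, and a run executes every node in dependency order. This models the idea only; it is not the D.Hub workflow engine.

```python
from graphlib import TopologicalSorter

# Node name -> processing function; each receives its upstream nodes' outputs
nodes = {
    "read":      lambda inputs: [1, 2, 3, 4],
    "transform": lambda inputs: [x * 10 for x in inputs["read"]],
    "store":     lambda inputs: sum(inputs["transform"]),
}
# Edges: node -> set of upstream nodes it depends on
edges = {"read": set(), "transform": {"read"}, "store": {"transform"}}

def run_pipeline(nodes, edges):
    """One 'Run': execute every node after all of its upstream nodes."""
    results = {}
    for name in TopologicalSorter(edges).static_order():
        upstream = {dep: results[dep] for dep in edges[name]}
        results[name] = nodes[name](upstream)
    return results

results = run_pipeline(nodes, edges)
print(results["store"])  # 100
```

A scheduled or event-driven trigger would simply call `run_pipeline` for you instead of a user clicking "run".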
→ Learn more: Pipeline Workflow Editor
Ontology
Ontology is a feature for modeling the semantic relationships in your data as Entities and Relationships. The defined models are stored in a graph database, enabling intuitive exploration of complex data relationships.
Key components of an ontology:
- Entity — Represents real-world objects (e.g., user, product, sensor)
- Relationship — Defines connections between entities (e.g., "owns", "located at")
- Property — Detailed information assigned to entities and relationships
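The three components above can be illustrated with plain dataclasses. A real ontology lives in a graph database; this toy model only shows the data shape and a simple traversal.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str                       # e.g. "user", "product", "sensor"
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    source: Entity
    kind: str                       # e.g. "owns", "located at"
    target: Entity
    properties: dict = field(default_factory=dict)

alice = Entity("user", {"id": "u1", "name": "Alice"})
sensor = Entity("sensor", {"id": "s42", "model": "TH-200"})
owns = Relationship(alice, "owns", sensor, {"since": "2024-01-01"})

# Traverse the graph: which sensors does Alice own?
graph = [owns]
owned = [r.target.properties["id"] for r in graph
         if r.source is alice and r.kind == "owns"]
print(owned)  # ['s42']
```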
Build models visually in the Ontology Builder and interactively explore relationship networks in the Graph Explorer.
→ Learn more: Ontology Overview
Knowledge
Knowledge is a feature for collecting unstructured data, turning it into knowledge, and using it for AI-based search and conversation. Documents can be collected through various methods including web page crawling, file upload, and manual entry. They are then automatically chunked and embedded, and stored in a vector database.
Collected knowledge can be queried in natural language through RAG (Retrieval-Augmented Generation)-based AI Chat: relevant documents are retrieved based on the user's question, and the LLM generates a context-aware answer.
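The retrieval step of this flow can be sketched simply: split a document into chunks, represent each chunk (here just as a bag of words, standing in for a real vector embedding), and pick the chunk most similar to the question. A production system would use dense embeddings and a vector database instead.

```python
def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def similarity(a: str, b: str) -> float:
    """Jaccard word overlap -- a crude stand-in for cosine similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

document = ("Datasets are stored as Delta Lake tables. "
            "Pipelines automate repetitive data processing. "
            "Knowledge stores hold unstructured documents for AI search.")
chunks = chunk(document)

question = "How are datasets stored?"
best = max(chunks, key=lambda c: similarity(c, question))
# The retrieved chunk would then be passed to the LLM as context.
print(best)
```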
Supported document collection methods:
- Web Crawling — Automatically collect web pages by specifying a URL
- File Upload — Directly upload document files such as PDF, DOCX, TXT
- Manual Entry — Type or paste documents directly in the editor
→ Learn more: Knowledge Management Overview
Dashboard
A Dashboard is a feature for visually representing and monitoring data. It uses an analytics database as its backend to perform fast real-time aggregation and visualization on large-scale data.
Key dashboard features:
- Widget — Various visualization components such as charts, tables, and metrics
- Data Connection — Connect to datasets via SQL queries or simple mode
- Real-Time Updates — Visualization automatically updates when data changes
Select a chart type from the widget library and map dataset columns to build intuitive dashboards.
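As an illustration of column mapping, the configuration below sketches what a widget definition might look like and what the backend would aggregate for it. The field names (`type`, `x`, `y`, `aggregation`) are hypothetical, not the actual D.Hub widget schema.

```python
from collections import defaultdict

widget = {
    "type": "bar_chart",
    "dataset": "sales",
    "x": "region",          # dataset column mapped to the x-axis
    "y": "amount",          # dataset column mapped to the y-axis
    "aggregation": "sum",
}

rows = [
    {"region": "EU", "amount": 120},
    {"region": "US", "amount": 300},
    {"region": "EU", "amount": 80},
]

# What the analytics backend would compute to render this widget:
totals = defaultdict(float)
for row in rows:
    totals[row[widget["x"]]] += row[widget["y"]]
print(dict(totals))  # {'EU': 200.0, 'US': 300.0}
```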
→ Learn more: Dashboard Overview
Next Steps
Now that you understand the core concepts, try using the platform yourself.
- Explore the First Screen — Take a look at D.Hub's screen layout
- Quick Start — Go from dataset upload to dashboard in just 5 minutes
- Role-Based Guide — Choose a learning path that matches your role