Data Modeling

Data modeling is a technique of organizing data in a logical pattern that allows for the efficient storage, retrieval, and manipulation of data within a data warehouse.

Curved lines on a blue background.

Data modeling concepts

The design and construction of the logical structure of data in a database includes tables, columns, and relationships between the data. There are several different data modeling approaches, each with its own set of advantages and disadvantages. It is worth noting that these methodologies are not mutually exclusive, and each project may use multiple variations of approaches. The choice of which to use depends on the specific requirements and constraints of the organization. Some of the most common data modeling approaches include:

Conceptual data modeling: focuses on identifying the main entities and relationships within a system, and is used to create high-level views of the data.

Logical data modeling: focuses on creating detailed representations of the data structures and relationships, regardless of the choice of technologies or database management systems.

Physical data modeling: focuses on creating detailed representations of the data structures and relationships, while taking into account the specific technologies and database management systems that will be used to implement the model.

NoSQL data modeling: optimized for scalability and high performance in non-relational databases. It is modeled with different data structures, including documents, key-values, and graphs, mainly used for big data and real-time analytics.

Dimensional data modeling: optimized for read performance. It is typically modeled in a star or snowflake schema with a coarse-grained granularity, mainly dealing with slowly changing dimensions, and used for reporting and analytics.

Relational data modeling: optimized for write performance. It is typically modeled in third normal form (3NF), with fine-grained granularity, mainly dealing with slowly changing facts, and used for transactional systems.

Data vault modeling: optimized for scalability and change. It is modeled in a hub-and-spoke structure, with hybrid granularity, mainly dealing with full history of data, and used for reporting and analytics.

Data modeling in data warehousing

In a data warehouse, it is recommended to implement a central core layer. Its purpose is the integration of data from multiple source systems and the historization of the data. The core is the only source for the data marts that are used for BI applications such as reporting platforms or OLAP tools.

Data marts are usually designed as a dimensional data model with dimensions and facts. This model is either implemented as a star schema or as a multidimensional cube. But what kind of data model is appropriate for the core layer? There is no best approach that suits for all analytical data solutions, but different modeling techniques are commonly used, each has its advantages depending on the requirements and the modeling strategy:

  1. Dimensional Modeling with dimensions and fact tables
  2. Relational Modeling with master data versioning
  3. Data Vault Modeling with hubs, satellites and links

Dimensional data modeling

Dimensional data modeling, also known as the Star Schema / Kimball approach, has been proven to be adapted to reporting needs by modelling the data in a optimized way for user queries over the last two decades. It is a set of techniques and concepts proposed by Ralph Kimball, and is widely used in data warehouse design.

The architecture contains the following main layers:

  • Operational Data Store (ODS): for each data source, the needed data is stored without transformation. The data warehouse is then based on the operational data stores.
  • Data warehouse: based on the source data (source system or ODS based on source system) and transforms the data in a snowflake schema (core: facts and entities) and a star schema (mart: fact tables and dimensions)
  • Data mart: a structure / access pattern specific to data warehouse environments, used to retrieve client-facing data.
  • Test project: used to create test cases on your other projects ODS, data warehouse, data vault etc.

Relational data modeling

A relational data warehouse is based on the source data (source system or ODS based on source system) and transforms the data in a relational schema.

Relational data modeling is usually used for operational systems (OLTP systems) that store their data on relational databases. But it can only be used for Data Warehouses – most structured DWH databases are implemented with relational database systems.

The relational data warehouse approach differs from the star schema / Kimball approach in the following aspects:

  • It is a snow flake schema
  • The modelling is done in third normal form
  • The designing is mostly done with entities and rarely with facts
  • Master data entities are split up in two tables: header table, which contains the business key and all SCD0 and SCD1 attributes, and version table (optional), which contains all SCD2 attributes, if existing. Conclusion, historization is done in the CORE layer in the version tables of an entity
  • Transactional data can be modeled as entities or facts, depending on the loading requirements (Merge or Insert only or different fact load strategies)

Data vault data modeling

Data vault modeling, invented by Dan Linstedt in 1990, is used to model the enterprise data warehouse Core layer. This approach is suitable for multi-source environments needing a fast adaptation to changes. The model consists of a complete denormalization of three base constructs into separate entities:

  1. Hub: the business key defining a core business concept. This key is related to a business entity, not to a particular source system.
  2. Link: the business natural relationship between business core entities.
  3. Satellite: the contextual information around keys and relations. It contains the descriptive information of a business entity.

Data Vault should store "all data, any time", and the data should not be cleansed. A data vault data warehouse is based on the source data (source system or ODS based on source system) and transforms the data in a data vault modeling (hubs, links, satellites in a raw vault area and a business vault area).

  • The business transformations are only in the Business Vault area
  • Complete historization of all attributes
  • The data warehouse can be multi-sourced
  • Ability to scale the model

Summary

Each data modeling approach has its own set of advantages and disadvantages. The decision of which one to use depends on the specific requirements and constraints of your organization:

Dimensional Data Modeling Relational Data Modeling Data Vault Modeling
Structure Star or Snowflake Schema Third Normal Form (3NF) Hub-and-Spoke
Granularity Coarse-grained Fine-grained Hybrid
History Slowly changing dimensions Slowly changing dimensions Full history
Governance Limited Medium Strong
Performance Optimized for read Optimized for read Optimized for scalability and change
Popularity Widely used Widely used Increasing popularity

Suggested reading:

Data warehouse, data lake, or data lakehouse?

Centralized data management

Advanced data automation for data vault modeling

Want to learn more about the basics of business intelligence (BI)? Check out our free e-Learning course.

Future-proof your data with biGENIUS-X today.

Accelerate and automate your analytical data workflow with comprehensive features that biGENIUS-X offers.