The data flow breakdown

Data FlowFor easy reference, I’d like to define the typical data flow encountered in business intelligence by its steps: Data integration, dimensional modelling, and data consumption. Basically these are the typical flows that make first and third party data available for analysis. Most of all it’s this data flow that brings a ROI when investing into BI.

1. Data flow part one: Data Integration

Integration measures how many of the first and third party data sources have been unified in a single place. This single place is usually called an integration layer. It can be any easy-to-access data store. Typically it would take the form of files on a network location (FTP, S3, GCS) or a relational database.

Usually this part is pretty messy in that the external sources are diverse and different. Often, each source requires its own access method to extract relevant data. As a result you get a very messy environment. Ideally handle it by a widely used scripting language to ‘glue’ systems together. It is a good practice here to use some form of utils to dry your code.Using common utils for loading data sources will prove verz time/saving when loading data.

2. Data flow part two: Dimensional Modelling

Dimensional modelling as a step in the data flow is the transformation of integration data to a format suitable for analysis. Usually in the form of a snowflake schema. The goal of this is to arrange the dimensions and fact tables such that metrics are unified.  Thus they are the same across reports.

Some small reporting set-ups omit dimensional modelling, but it is useful in enabling self service reporting.

Ultimately the end goal of dimensional modelling is to provide fact and dimension tables containing correct data. Later on, you will use this data for reporting and analysis. Ideally locate it in a SQL database for easy access.

3. Data flow part three: Data consumption

Here, I want to avoid using the terms reporting or analysis. In my opinion reporting is just one of many channels that a data consumer or analyst uses to access data. Such data consumption can take the form of performing an analysis that requires ad hoc query writing and statistical analysis. Or say the analyst has to perform a  seasonal product analysis based on data from a product performance dashboard.

The end goal of data consumption is to boost business metrics through informed decision making. Consequently this is where investing in data actually pays off, so make sure the needs of the decision-makers are met.