Business Intelligence in companies is a decision support system. Its implementation is a reflection of the questions that need to be answered in the company at any given time. Many companies start with just a few spreadsheets with data. They populate them with copy pasted numbers. This allows them to understand how their business is evolving and where the next opportunities lie. As time goes on, the companies usually go through several stages of Business Intelligence solutions.
1. The copy paste spreadsheets
At the earliest stage in the company’s life, there is little automation of data extraction and reporting. Usually, the founder already has some profit and loss spreadsheets. Then he will ask the employees to fill in data relevant to their departments. This is when the team defines the first metrics. Unfortunately oftentimes the calculations are not complete due to missing data.
2. The VBA reports
First, let me state that I consider VBA+excel reporting to be a disaster when it comes to data solutions. The reasons are simple: it is a programming paradigm that follows zero best practices. It’s programming, so it’s too hard for the average Joe to build on. It’s VBA, so it’s not made for data munging, versioning, collaborative working, easy file handling etc. It works for a one-off project, but you will end up with too much code that is too hard to maintain for too little benefit.
This usually appears as a quick-and-dirty way of automating some reporting coming from someone with financial background. This happens because finance folks like to have things neatly ordered in Excel, with custom functionality for particular reports. This works on a small scale, but it is not worth maintaining bad code with custom functionality and particular layouts for all the reports on everyone’s computers.
3. The Excel Hell
Natural evolution: Each department fills in some data sheets that feed into the managers’ reports. After a while, each department wants their own reports as well, so they start using the data sources from the other departments. Soon, they want new numbers and start changing the data sources to add more data. This inevitably breaks other dependent reports and creates pain for everyone.
Soon, the reports become data sources for new reports, to try to avoid redundant maintenance. This only makes things worse, as now there are dependencies in many directions between many things. One small change could trigger catastrophic failure on company level. Moreover all the knowledge to maintain this system is so fragmented that it is not possible to centralize.
4. The data bucket
After excel hell, some companies fall back to what they know: Data source files. To allow collaborative development, they put these files on a shared drive or cloud storage solution. From here, the excel-savvy can build their own reports.
It has the advantage of standardizing data sources. Moreover it allows anyone to build their own reports based on those sources, instead of people building on top of existing reports.
Besides scale, the first shortcoming is the manual effort involved. Additionally, the lack of maintainability (cannot at a glace understand the logic behind a report) causes redundant reports to be build, which just adds to the work.
5. The data river
The company looks online to how to do better BI and receives misinformation from advertorials, ads, agencies and parrots (people who just repeat what they read). They end up buying some dashboard solution that promises seamless data integration and real time stats. They get IT to write some SQL queries for the tool, and they connect to the other services (advertising, ticketing systems etc) with ease.
It now looks like all the data is available. In a tool that displays, but cannot transform data to a useful format. This is the birth of the need for a data warehouse. Complex logic (and there is always complex logic in data munging) will need to be done by coding.
Now we can see our data, but we cannot leverage it for the reporting we need.
6. The data store, or unstructured data warehouse
After having struggled with having the data in files, or by piping the data directly to reports via a tool, the need for a database that can store both raw data and processed data appears.
Of course it makes sense to combine the data in a SQL database to allow all the other tools to pull data from it too. Also most of the transformations will happen in SQL. First of all it is easy to learn and use. And secondly it is a very powerful language for working with sets of data – tables.
This collection usually consists of everything that can be imported and is usually not structured or grouped together by a particular logic. It’s simply a place to store the data to make it more accessible for analysis and further transformations. Often, much of the data is coming from manually populated data sources.
Some argue they are too cool for SQL. But in the end SQL just works because of how limited and powerful the language is. It’s powerful because it performs all operations on a table or column/row level. This allows for example to perform a Cartesian join between 2 data sets with a single keyword. Think of all the loops you’d write (run) in a programming language!
SQL has limitations because it only allows a restricted set of operations on tables. This is actually a huge boon, as it allows both good developers and bad developers to speak the same language. This means the code is unlikely to become too complex to understand.
7. The data warehouse
When a BI person wants to churn out a variety of reports, the benefits of a dimensional model become apparent. Typically, one adds this as a layer on top of the raw data. This makes the mapping of business processes at each grain or step possible. And thereby also enables easy creation of reports.
8. The self service reporting
Once the dimensional model is available, the team can create data cubes. This way they can make the data available easily to the rest of the company. Every stakeholder can filter the data cubes or drill down/roll up. With this functionality, a business analyst with access to the data can build complex reports and dashboards in minutes. The tool containing the data would usually be Excel or Dashboard tools. An operational data consumer might notice an anomaly and be able to troubleshoot it by drilling down in the same window. This is the end goal for most companies.
9. The advanced self service reporting
Drilling down predefined paths is not enough for complex analyses. Also, for the sake of agility many data sources will not be fully integrated into the data warehouse. (Example: a test of an ad network.) This is where custom complex transformations come into place. These are often to generate a report of a new data source, or to analyze particular details such as user behavior.
10. The data products: Automated decision making instead of BI as decision support.
Often times, the decision based on data is logical and repeatable. Therefore the data team creates algorithms to automate that decision. Take for example the decision of which products you should recommend to each customer, or the adjustment of bids based on performance metrics.
The data team can create these data products with relative ease. Sometimes the team builds them on top of the dimensional model of the DWH. Often, due to performance and different SLA requirements, the products run independently of the data warehouse. They consume data from the source or are integrated in an application.
Hope you enjoyed and let me know in the comments what else you have encountered. And where do you find yourself on this scale?