I’d like to cover here some basics about the common business intelligence stack of Python and SQL. Let’s see why this stack is so popular, and why it’s not yet the one and only gold standard.
Why Python for business intelligence?
Most business intelligence professionals using Python come from all walks of life and do not have formal programming education. This can lead to incredibly random ways of coding – not for Python though, as Python requires indentation.
This doesn’t mean that someone who has no idea will produce good code or a good architecture. And regardless of how clean the code, people can do very obscure and roundabout complex things. What it does mean that it is very easy to read their code, which is the first step in understanding it.
Python is an easy to learn, widespread, and easy to use language. It’s much easier to find Python developers than developers with VBA knowledge. Additionally, it’s even more difficult to find developers with BI knowledge and knowledge in other specific languages.
For BI work, you need to extract/load data, transform data and analyze data. Python can do all – most advertising data producers offer Python libraries to access their APIs. Python can access any database you can think of. Python natively has great features for data munging. Python can be used to perform large parallel computations or statistical analysis. Data science too.
This means that whatever problem you are encountering, someone else probably encountered it too. You will have a very easy time to find solutions or get help for anything Python. While other languages might be better at specific things, Python is a jack of all trades, master of some.
Why not Python?
Well, many BI developers stop there because the language is so versatile. However, this means they will not have the opportunity to work with languages designed for large software projects. Also they will miss out on best practices and programming paradigms. Too easy sometimes means people stop learning.
Python is relatively slower than most other languages. In the context of working with data, this only appears as an issue when doing complex calculations on very large data sets.
Why SQL for business intelligence?
I will talk here about PostgreSQL for small data specifically, since it is the best open source database for analysis. It has better functionality that paid solutions like Microsoft’s, and equal or better performance and functionality to Oracle’s product. For free, without licensing fees or vendor locks, it is built by developers, for developers, with robustness and ease of use in mind. You can even run Python on Postgres 🙂
For big data, the leading SQL solution seems to be the postgres-like product from Amazon, Redshift.
Ease of access:
For the sake of easy data access, SQL makes a lot of sense. There are plenty of tools that can run SQL against a database to generate dashboards or can connect it easily in excel. And it is easy to learn for analysts.
Ease of data manipulation:
Most of the data you will have will not be single numbers or random sentences, but structured in tables. This means that instead of conventional algebra, we use relational algebra, to perform operations between data sets instead of single values. This is where SQL shines, since it is designed for these kinds of operations specifically. Once you have the data in a database, you can leverage the power of this language to very easily perform these operations.
Every self-respecting BI professional known at least a little SQL. Most good BI engineers know a lot of SQL. Also, particular type of SQL is not very relevant, as the flavors are very similar.
Why not SQL?
Poor options for architecture.
SQL is a query language, not a programming language. It’s meant to be used to rearrange and compute data, and not to do complex operations. You end up producing a lot of code, often wet (opposite of DRY). The code ends up being monolithic, and it is not easy to see at a glance what a query does. Basically, you cannot write easily maintainable code.
Lowers the bar.
SQL is so easy to learn, that a lot of people end up being able to use it. However, this does not suffice for writing clean code, designing an ETL, a BI application, or a data warehouse. I’ve personally seen ridiculous projects that were a dependency hell of views on views on views . Sadly, knowing SQL is not enough to say about one’s ability as a professional to deliver a quality product, but the two often get confused in the world of BI.
I hope you enjoyed this article, and if you have any opinions on the topic do not hesitate to leave a comment.