Let me start by saying I like SQL, and I believe it is fantastic in its ease of use. It’s meant for rearranging and aggregating data, making it easy to create reports from a dimensional model. I do however believe we should not misuse it, and simply write less SQL.
Now, there are two types of SQL code in BI: Simple SQL and Complex SQL.
Simple SQL
Simple SQL is the type of SQL that makes reports from a dimensional model – it seldom has subqueries, and all it does is some joining/grouping/aggregating, as all the numbers are available in fact tables in ready-to-use form. It’s easy enough that even some business users can easily understand it and possibly maintain it.
Simple SQL is fantastic because it allows easy operations on prepared data sets. It is easy to read and to maintain.
Complex SQL
Complex SQL is the SQL that makes fact tables, or the SQL of data products(data transformations for data applications). This SQL often has many subqueries, multiple steps of transformation, sometimes data cleaning and enriching.
Complex SQL is hard to maintain, and even harder to extend or modify. It often ends up worse and worse after each iteration. It’s next to impossible to refactor and you usually end up having to rewrite it after a while.
Complex SQL is cause for high rate of bugs, high maintenance and low bus factors. Since SQL is not easy to test (vs tests in a programming language), it is unlikely that it will ever have good test coverage, making it dangerous to work with. Subtle logic bugs will be well hidden, and due to the monolithic nature of the applications written in SQL it is very hard to maintain and debug.
I suggest to avoid Complex SQL entirely. The key concept for this argument is the ETL vs the ELT. Transform your data before it hits the database, and do it with easily testable and dry code.
Many decide to do the complex transformations in SQL mainly because it’s quick and easy. However, in the long run, this quick and easy SQL code costs you quite a bit of maintenance. And will get worse and worse with each iteration, as it is not possible to easily refactor it.
Instead of the query language, use a programing language with tests. This simply is the better approach if you desire iteration speed and accuracy.
Write less SQL, use a programming language.
That’s right, write less SQL. Use a programming language and dry your code. Only use SQL in easy or temporary transformations, and use something like Python when it comes to doing the transformations. There are limitations to this approach, but they are minor in comparison to the improvement it brings to the BI development process. Make a leap of faith and write less SQL!
Agreed. Watching our senior developer try to tackle an easy problem for Python with SQL was rather hilarious. He thought it was hilarious too. This was the kind of dude that was on the other end of the “caring” spectrum – he’d already made his millions and he literally got paid to play games on a tablet at work because he was the DBA. Having to tackle a problem that took longer than expected was “job security” for him. I think he might have just liked the chair.