When designing a job, you will at some point have to decide between ETL and ELT. This boils down to the question of whether you should perform the transformation to the dimensional model in SQL or in a programming language.
Why, you ask?
What does ETL vs ELT mean for you practically?
Well, ETL means “Extract – Transform – Load”. So it is about extracting data from a source and then transforming it into a dimensional model. In the last step you load the result into the data warehouse for analysis.
ELT, on the other hand, stands for (you guessed it) “Extract – Load – Transform”. It is about extracting data, loading it into a staging database, and from there transforming it into the dimensional model.
The difference is that the transformation step occurs before loading into a database (ETL) or after it (ELT). In theory, ELT loads the data before transforming it so that the raw data is easy to access. The transformation then takes place between the staging database and the dimensional model in the data warehouse. In practice, the fundamental decision between ETL and ELT is the one between using a programming language and using SQL, because once the data is in a staging database, the transformation will usually be written as complex SQL.
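To make the difference concrete, here is a minimal sketch of the same tiny job written both ways. It uses Python with an in-memory SQLite database standing in for the warehouse. The table names (stg_sales, dim_customer, fact_sales), the sample rows and the function names are all made up for illustration, so treat it as a sketch and not as a recommended layout.

```python
import sqlite3

# Hypothetical extracted rows: (order_date, customer_name, amount).
SOURCE_ROWS = [
    ("2024-01-05", "alice", 30.0),
    ("2024-01-05", "bob", 12.5),
    ("2024-01-06", "alice", 7.5),
]


def create_warehouse(conn):
    """Create a deliberately tiny dimensional model (assumed schema)."""
    conn.execute(
        "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE fact_sales (order_date TEXT, customer_id INTEGER, amount REAL)"
    )


def run_etl(rows):
    """ETL: transform in Python first, then load the finished model."""
    conn = sqlite3.connect(":memory:")
    create_warehouse(conn)
    # Transform: assign surrogate keys in application code.
    customer_ids = {}
    for _, name, _ in rows:
        customer_ids.setdefault(name, len(customer_ids) + 1)
    facts = [(date, customer_ids[name], amount) for date, name, amount in rows]
    # Load: only already-transformed data reaches the database.
    conn.executemany(
        "INSERT INTO dim_customer (customer_id, name) VALUES (?, ?)",
        [(cid, name) for name, cid in customer_ids.items()],
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", facts)
    return conn


def run_elt(rows):
    """ELT: load raw rows into staging, then transform with SQL."""
    conn = sqlite3.connect(":memory:")
    create_warehouse(conn)
    # Load: raw data goes into a staging table unchanged.
    conn.execute(
        "CREATE TABLE stg_sales (order_date TEXT, customer_name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", rows)
    # Transform: the database builds the dimensional model.
    conn.execute(
        "INSERT INTO dim_customer (name) "
        "SELECT DISTINCT customer_name FROM stg_sales ORDER BY customer_name"
    )
    conn.execute(
        "INSERT INTO fact_sales "
        "SELECT s.order_date, d.customer_id, s.amount "
        "FROM stg_sales s JOIN dim_customer d ON d.name = s.customer_name"
    )
    return conn


def sales_by_customer(conn):
    """Query the dimensional model the same way for both variants."""
    return conn.execute(
        "SELECT d.name, SUM(f.amount) FROM fact_sales f "
        "JOIN dim_customer d USING (customer_id) "
        "GROUP BY d.name ORDER BY d.name"
    ).fetchall()
```

Both functions end with the same dimensional model, so sales_by_customer returns the same totals for either connection. The only thing that changes is where the transformation logic lives: in the Python loop or in the INSERT ... SELECT statements.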
ETL vs ELT: my recommendation
I am going to assume you are using a tech stack of one or more programming languages plus SQL. In this case, what you need to decide is whether to write the complex transformations in a programming language or in a query language. In my experience, it’s easier to maintain jobs where you do a little more programming and write less SQL. The reason is that SQL has no built-in support for unit testing, and large queries are hard to refactor in small steps, so a change often means rebuilding the query instead.
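As a small illustration of the testing argument, here is a sketch of one transformation step written as a plain Python function. The function name to_date_dimension_row and the column names are invented for this example. Because the function has no database dependency, you can check it with ordinary assertions or any test runner.

```python
from datetime import date


def to_date_dimension_row(iso_date):
    """Derive date-dimension attributes from an ISO date string.

    Hypothetical helper: the column names are an assumption,
    not a standard layout.
    """
    d = date.fromisoformat(iso_date)
    return {
        "date_key": d.year * 10000 + d.month * 100 + d.day,  # e.g. 20240105
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "iso_weekday": d.isoweekday(),  # Monday = 1 ... Sunday = 7
        "is_weekend": d.isoweekday() >= 6,
    }


def test_to_date_dimension_row():
    """A plain unit test: no database or fixtures needed."""
    row = to_date_dimension_row("2024-01-05")
    assert row["date_key"] == 20240105
    assert row["is_weekend"] is False
    assert to_date_dimension_row("2024-01-06")["is_weekend"] is True
```

If the weekend rule changes later, you edit one function and rerun the test. Doing the same inside a long SQL statement usually means running the whole query against test data to see whether anything broke.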