ETL is a type of data integration that refers to the three steps (extract, transform, load) used to blend data from multiple sources.

It’s often used to build a data warehouse.

ETL extracts data from different source systems, transforms it (for example, by applying calculations or concatenating fields), and finally loads it into the data warehouse.
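
The three steps can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the in-memory lists stand in for real source systems, an in-memory SQLite database stands in for the warehouse, and the table name `fact_orders` and every field name are hypothetical.

```python
import sqlite3

# Hypothetical source data. In practice these rows would come from
# separate source systems (files, APIs, operational databases).
ORDERS = [
    {"order_id": 1, "customer_id": 10, "quantity": 2, "unit_price": 9.50},
    {"order_id": 2, "customer_id": 11, "quantity": 1, "unit_price": 20.00},
]
CUSTOMERS = [
    {"customer_id": 10, "first_name": "Ada", "last_name": "Lovelace"},
    {"customer_id": 11, "first_name": "Alan", "last_name": "Turing"},
]


def extract():
    """Extract: read raw rows from each source."""
    return list(ORDERS), list(CUSTOMERS)


def transform(orders, customers):
    """Transform: apply a concatenation (full name) and a calculation (total)."""
    names = {
        c["customer_id"]: f'{c["first_name"]} {c["last_name"]}'
        for c in customers
    }
    return [
        (o["order_id"], names[o["customer_id"]], o["quantity"] * o["unit_price"])
        for o in orders
    ]


def load(rows, conn):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer_name TEXT, order_total REAL)"
    )
    # INSERT OR REPLACE keeps a repeated run from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    orders, customers = extract()
    load(transform(orders, customers), conn)


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
run_etl(conn)
result = conn.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(result)
```

Keeping each step in its own function makes it easier to test the transformation logic separately from the source and target systems.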

It’s tempting to think that creating a data warehouse is simply a matter of extracting data from multiple sources and loading it into the warehouse’s database.

This is far from the truth: building one requires a complex ETL process.

The ETL process is technically challenging and requires active input from various stakeholders, including developers, analysts, testers, and top executives.

To maintain its value as a tool for decision-makers, a data warehouse system needs to change as the business changes.

ETL is a recurring activity (run daily, weekly, or monthly) in a data warehouse system, and it needs to be agile, automated, and well documented.

ETL was created because data usually serves multiple purposes. For example:

  • Data about customers is important for tracking orders. A company needs to understand all of a customer’s recent orders so they can be fulfilled accurately. The system that manages customer orders might be SAP.
  • The same data is also used to understand buying patterns across all customers: for example, which products are selling most quickly, or which product combinations are selling most effectively in each geography. The system that manages analytics might be a data warehouse.
  • The data is the same in both cases, but it is copied into different systems to serve each purpose. In this example, ETL moves the data from SAP to the data warehouse.
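
To make the two purposes concrete, the sketch below asks an operational question and an analytical question of the same order rows. It is an illustration under assumptions: the `orders` table, its columns, and the sample rows are all hypothetical, and a single in-memory SQLite table stands in for both the order system and the warehouse, which in practice would be separate systems connected by ETL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER, customer TEXT, region TEXT, product TEXT, quantity INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "C1", "EU", "widget", 3),
        (2, "C2", "EU", "gadget", 5),
        (3, "C1", "US", "widget", 4),
        (4, "C3", "US", "widget", 1),
        (5, "C3", "EU", "widget", 1),
    ],
)

# Operational purpose: list one customer's orders so they can be fulfilled.
customer_orders = conn.execute(
    "SELECT order_id, product, quantity FROM orders "
    "WHERE customer = ? ORDER BY order_id",
    ("C1",),
).fetchall()

# Analytical purpose: total units sold per product in each region.
units_by_region = conn.execute(
    "SELECT region, product, SUM(quantity) FROM orders "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()

print(customer_orders)
print(units_by_region)
```

The first query touches a handful of rows for one customer, while the second scans and aggregates every row, which is one reason the two workloads are often served by different systems.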

Types of ETL Tools

ETL tools have been around for over 30 years. As technology has evolved in that time, different types of solutions have entered the market.

Several pure-play vendors, such as Informatica, specialize in ETL.

Other tools are offered by large software vendors, such as IBM, Oracle and Microsoft.

More recently, open source ETL tools and ETL cloud services have emerged.