Traditional data warehouses tend to use an ETL (Extract-Transform-Load) approach which grabs data from various sources, normalises it, then feeds it into the data warehouse itself. Put the data warehouse into the cloud though, and things need to be done a bit differently. That’s because you don’t want additional steps of moving and processing data before loading it; it takes up additional time and, in a very literal sense, bandwidth (and bandwidth = $). And that’s what’s behind the acceptance of ELT as a better option.
In the finest tradition of the tech industry, ELT is awfully close an acronym to ETL. There’s good reason for that (this time round, at least) in that each letter stands for the same thing. It’s just the order which shifts around and that too, is for good reason. Extract, Load and Transform does what it says: the processing (transform) is done after the data is loaded into the cloud, leveraging cloud's scale.
Let’s take a step back and see why this seemingly minor difference in approach makes such a big difference in the cloud.
In the traditional way of processing data, this is what the ETL system does:
It Extracts data from multiple outside sources
Then Transforms it, independent of the data warehouse, to meet the specific business requirements
And Loads it into the target data warehouse.
This approach works for on-premise data warehouses. After all, the data, although potentially in multiple systems, is usually on the same network, processing is readily available in the dedicated ETL server, and the reports which are to come out of the exercise are often pre-determined.
At least two things have changed. One is that cloud data warehouses are not just highly viable these days, they are also highly attractive. The second is that the sheer volume of data (and sources) has changed. Back in the old days, ETL was a good idea because it addressed a bottleneck with server processing capability by taking the organisation of raw data away from the database. Twenty years on, server technology has taken multiple great leaps forward and that means it’s no longer necessary to separate out the ETL. In an ironic stroke, the ETL server itself becomes the bottleneck for the cloud data warehouse as you’re moving potentially huge volumes of data around when there’s no longer any benefit to moving it.
The emergent alternative is ELT. As the name implies, the Extract and Load parts are similar to ETL, but the order is shifted around. The Transform component happens only once the data is already loaded into the data warehouse.
In the context of a cloud data warehouse, ELT means being able to get the data into the cloud faster. The bottleneck of slow transformations (and the need to constantly set up additional ETL servers to handle more and more data) is removed and only once the data is in the data warehouse, does the processing (if required) take place. In effect, you can store as much data as you like, in its raw form, with transformation and analytics applied on top of the source data to meet the various needs of the users and their applications. Simply put, ELT allows scale; it doesn’t matter how much data there is or how many sources, feed it straight into the cloud and then get on with it.
There’s no need to have to set up bigger and bigger servers to process the data before it makes it into the data warehouse.
The good news is that WhereScape RED is designed for both on premise and cloud environments. It delivers ELT by design, so you can feed data straight into the cloud and transform it natively, to the required quality or specifications. The RED architecture is data-centric. It’s simple and it leverages the strength of modern cloud-based PaaS data platforms.