As the previous ETL-IA blog post mentions, adding a source adds another ETL process. This means that the time and effort needed to build the data warehouse are directly proportional to the number of sources: the more sources there are, the more the warehouse system costs.
An enterprise data warehouse uses a unified data model to hold integrated data, so each business entity gets a single unified definition. For example, there will be tables for business entities such as employee, customer, and sales transaction. Furthermore, regardless of the source systems or how many of them there are, the attributes of a business entity remain the same. Therefore, in an ideal scenario, a unified process should populate these unified tables. Can this ideal scenario be translated into a meaningful solution in an imperfect world? Can the time and money spent on a project be independent of the number of sources?
Can we achieve what the red ETL-IA line in the chart represents? The ETL-IA approach indicates that the cost and effort of the project do not increase linearly with the number of sources. This is because the process that ultimately loads the data warehouse is reusable. No matter how many sources you add, the cost and effort stop growing in proportion to them once the initial development cost has been paid. The chart above shows how ETL-IA compares favorably with the traditional ETL development methodology.
If we look back and analyze the traditional code, how reusable is it from source to source? Perhaps not at all, or only to a very limited degree.
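One rough way to read the cost claim is as a toy model. The sketch below is only an illustration: the function names and every number in it are hypothetical and do not come from the chart. It assumes the traditional approach pays for a full ETL process per source, while ETL-IA pays once for a shared load process and then a smaller amount per source.

```python
def traditional_cost(num_sources: int, cost_per_process: float) -> float:
    """Every source needs its own full ETL process (hypothetical model)."""
    return num_sources * cost_per_process


def etl_ia_cost(num_sources: int, shared_load_cost: float,
                cost_per_adapter: float) -> float:
    """One reusable load process plus a small per-source piece (hypothetical model)."""
    if num_sources == 0:
        return 0.0
    return shared_load_cost + num_sources * cost_per_adapter


# Made-up cost units, chosen only to show the shape of the two curves.
for n in (1, 2, 5, 10):
    print(n, traditional_cost(n, 10.0), etl_ia_cost(n, 12.0, 2.0))
```

In this simplified model the reusable approach costs more for the first source and less from the second source onward. The gap widens as sources are added.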
Traditional ETL Methodology
Another Traditional ETL Methodology
Problems with Traditional Approaches
There are many problems with both of the ETL methodologies above. The first offers no reusability: it requires a separate process every time a new source is added, which makes it a very costly solution to implement. Unfortunately, most enterprises are spending heavily on this methodology.
The problem with the second approach is that it keeps all of the source logic in one place. If something goes wrong, the whole system is at risk. Troubleshooting also becomes more difficult in such a design, or lack of design.
Then what is the solution?
ETL-IA starts from the fact that the target is in a unified format and designed for data integration. So why not have only one process, irrespective of the number of sources and their types? ETL-IA designs a target load process that is source independent. The whole solution is based on this idea, which is implemented using object-oriented programming concepts such as interface and controller objects.
After ETL-IA is implemented, a new source can be added to the ETL process like a plug-and-play device. Thus, the more sources are added to the system, the greater the savings.
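A minimal sketch of how an interface and a controller could keep the load process source independent is shown below. It is not the actual ETL-IA implementation: the class names (`CustomerSource`, `CrmSource`, `BillingSource`, `LoadController`), the record fields, and the in-memory list standing in for a warehouse table are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple

# Unified record shape for a hypothetical "customer" entity.
UnifiedCustomer = Dict[str, str]


class CustomerSource(ABC):
    """Hypothetical interface: each source maps its own rows to the unified shape."""

    @abstractmethod
    def extract(self) -> Iterable[UnifiedCustomer]:
        ...


class CrmSource(CustomerSource):
    """Example source whose rows are dicts with 'id' and 'full_name' keys."""

    def __init__(self, rows: List[dict]) -> None:
        self.rows = rows

    def extract(self) -> Iterable[UnifiedCustomer]:
        for row in self.rows:
            yield {"customer_id": str(row["id"]), "name": row["full_name"]}


class BillingSource(CustomerSource):
    """Example source whose rows are (number, first name, last name) tuples."""

    def __init__(self, rows: List[Tuple[str, str, str]]) -> None:
        self.rows = rows

    def extract(self) -> Iterable[UnifiedCustomer]:
        for cust_no, first, last in self.rows:
            yield {"customer_id": cust_no, "name": f"{first} {last}"}


class LoadController:
    """Single load process; it only knows the interface, never a concrete source."""

    def __init__(self) -> None:
        self.sources: List[CustomerSource] = []
        self.target: List[UnifiedCustomer] = []  # stands in for a warehouse table

    def register(self, source: CustomerSource) -> None:
        self.sources.append(source)

    def run(self) -> int:
        """Load every registered source into the target; return rows loaded."""
        loaded = 0
        for source in self.sources:
            for record in source.extract():
                self.target.append(record)
                loaded += 1
        return loaded


controller = LoadController()
controller.register(CrmSource([{"id": 1, "full_name": "Ada Lovelace"}]))
controller.register(BillingSource([("B-7", "Alan", "Turing")]))
controller.run()
```

In this sketch, adding a third source means writing one more class that implements `extract` and registering it. `LoadController` itself is not modified.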
Stay tuned for more information on ETL-IA.