ETL Strategy

Slower Development Databases and Servers

It is a normal trend in it to buy the most powerful machine for production usage. Example you production database box will have 16 CPUs your QA 8 and your Development box 4 or 6. Similar ratio is maintained on ETL servers, Application servers, and Hard Disk performance. Logic being the production environment is critical […]

Type II dimension: What is expiry date? Does it come from Source or is it ETL date?

I assume everyone is aware of type II dimension concept.  In type II dimension when we get an updated row we retire old row and insert new one. But what do you populate  in the from_date —to_date attributes. ETL create/update datetime or when that particular row was retired in the source system? The question can be […]

ETL Strategy for Reversing wrong file/data load.

We all know ‘garbage in, garbage out’. But what happens when the data provider (source system) realizes that they have provided incorrect data to the data warehouse? And now they want to correct the mistake by providing you new data! By then ETL has already loaded the data. The data is inserted, updated & deleted! […]

Loading & testing fact/transactional/balances (data), which is valid between dates!

This is going to be a very interesting topic for ETL & Data modelers who design processes/tables to load fact or transactional data which keeps on changing between dates. ex: prices of shares, Company ratings, etc. The table above shows an entity in the source system that contains time variant values but they don’t change […]

Introduction to Error and exception management.

ETL is all about transportation, transformation and organizing of data. Of anytime something moves (as a matter of fact even if you are perfectly stationary and items around moves) accidents are bound to happen. So any ETL specialist believes that their code is perfect and nothing can happen obviously lives in a fool’s paradise. The […]

ETL and the importance of dates.

ETL is all about data movement, batch processes and so on. The biggest criterias for data movement are based on dates. Dates like ETL_CREATE_DTTM, ETL_UPDATE_DTTM, SRC_SYS_CREATE_DTTM, SRC_SYS_UPDATE_DTTM, BUSINESS_CREATE_DATE, BUSINESS_UPDATE_DATE Proper understanding of dates is one of the most essential requirements, while designing, developing, testing, and scheduling ETL process. Lets discuss each of these in details.. […]