ETL and the importance of dates.

ETL is all about data movement, batch processes and so on. The biggest criterias for data movement are based on dates. Dates like

ETL_CREATE_DTTM,

ETL_UPDATE_DTTM,

SRC_SYS_CREATE_DTTM,

SRC_SYS_UPDATE_DTTM,

BUSINESS_CREATE_DATE,

BUSINESS_UPDATE_DATE

Proper understanding of dates is one of the most essential requirements, while designing, developing, testing, and scheduling ETL process.

Lets discuss each of these in details..

ETL Create Date Time: -This is the date on which the row is loaded in the data warehouse. It has nothing to do with the business or business logic. It simply documents the exact date and time, the row as populated in the database by the ETL process. Once time stam is added it is never updated for the life of the database. It defines the first time the row arrived in the database. Also it is important to note that during a batch load the date and time at the begining of the process should be stored in the variable & same exact date and time should be used for one instance of the batch process. Why… think about it, more details will be offered soon.

ETL Update Date Time: – It is same as ETL Create Date Time except that, it may be populated during inserts, but unlike create timestamp above, it is updated every time any value in the row is updated. It give the information, as to the last time the database row was updated.

Source System Create Date Time: – This is the date time on which the incoming row was created in the source system. Some times due to combination of rows from different tables, or summerization of rows will make it difficult to find out the create date. But a best guess can be used. Also some times the source may not have the create date time at all, in that case the system date on the source may be used.

Source System Update Date Time: -This is similar to source create timestamp. But only applies to the update date time.

Now lets focus on some of the least understood & most misunderstood date concept on dates. Some times it takes so much time to explain this at worksite that it prompted me to write this article.

Business Date: -This is the official business date to which the the data/row belongs. This has nothing to do with the system date or clock time on the source application; or when it was loaded in the data warehouse.

Example 1: If a sales order was booked on the 1/1/2006 and was not entered into the system until system date becomes 1/4/2006 still the business date will be 1/1/2006.

Example 2: Trade are happening during the day 22/12/2006 and the batch process starts within the application at 8 pm and even after the stytem clock passess midnight into next day 23/12/2006; still the data belongs to 22nd and not to 23rd. This is normaly controlled by a function that holds the business date and is only changed when the triggering element is instructs to change the business date to new one. This also helps because the data can be processed for any historical date just by controlling the business date in the system.

Uses of these six types of dates? Think….

Delta processing, Batch processing, Back dated processing, Auditing, etc.

Leave a Reply