I assume everyone is aware of type II dimension concept. In type II dimension when we get an updated row we retire old row and insert new one. But what do you populate in the from_date —to_date attributes. ETL create/update datetime or when that particular row was retired in the source system?
The question can be asked in a different way… When a employee changes from one department to another department; While populating in the data warehouse type II dimension EMPLOYEE table… Do you use the actual date of his transfer or the ETL date when you are populating the table?
What are you doing in your ETL code?
I believe, it depends on the batch window when we are loading the data into the warehouse. If it is a daily load I would like to set to_date to processing date -1 when updating and from_date to processing date during insert.
If the load is not daily then I would like to have a separate date field like effective data which can replace the processing date in the first scenario.
To begin with, I would expect the source system to record following dates: begin_date, end_date, created_date and modified_date, where begin and end dates record actual transfer dates, and created and modified are the audit data.
My type 2 target (fact or dimension) would have following date columns: begin_date, end_date, start_effective_date and end_effective_date with matching date_keys.
So, when employee changes department, actual transfer dates are recorded in begin and end dates. The effective date fields record when the record was actually modified (inserted or changed) in the source system. Prior to that modification, the reporting from the source system would not reflect the employee transfer.
For start and end effective dates, we should use the created/modified dates from the source system. That way, the reporting between DW and the source would match.