Data Architecture

Pitfalls of type II dimension

Type II dimension has been popularized by R Kimball. it has become so popular that in any interview related to data warehouse, the interviewer will surely ask to explain the concepts. And chances are that if you don’t know; they will laugh at your ignorance and reject you. Here’s your chance to laugh at them.. […]

Difference between Reference Data and Master Data

It is not unusual for people to use ‘Reference Data’ and ‘Master Data’ interchangeably without understanding the differences. Lets try to understand the differences with an example of sales transaction. A sales transaction contains information like…. Store, Products Sold, Sales Person, Store Name, Sales Date, Customer, Price, Quantity, etc. Attributes from the above example can be separated […]

Type II dimension: What is expiry date? Does it come from Source or is it ETL date?

I assume everyone is aware of type II dimension concept.  In type II dimension when we get an updated row we retire old row and insert new one. But what do you populate  in the from_date —to_date attributes. ETL create/update datetime or when that particular row was retired in the source system? The question can be […]

Hard references to database objects, Synonyms and ETL

1. Any time hard references are made to a database object it is an invitation to trouble. Especially in ETL where the object (table /view /procedure) can belong to schema. 2. Oracle has –> Database server name –> Schema –> Object 3. SCHEMA_NAME.TABLE_NAME is a bad way of referencing objects from the ETL architecture point […]

Loading & testing fact/transactional/balances (data), which is valid between dates!

This is going to be a very interesting topic for ETL & Data modelers who design processes/tables to load fact or transactional data which keeps on changing between dates. ex: prices of shares, Company ratings, etc. The table above shows an entity in the source system that contains time variant values but they don’t change […]