It is not unusual for people to use ‘Reference Data’ and ‘Master Data’ interchangeably without understanding the differences.
Let's try to understand the differences with an example of a sales transaction.
A sales transaction contains information like:
Store,
Products Sold,
Sales Person,
Store Name,
Sales Date,
Customer,
Price,
Quantity,
etc.
Attributes from the above example can be separated into two types: factual (transactional) and dimensional information.
Price and Quantity are measurable attributes of a transaction.
Store, Products Sold, Sales Person, Store Name, Sales Date, and Customer are dimensional attributes of a transaction.
We can see that the dimensional data is already embedded in the transaction, and with these dimensional attributes we can successfully complete the transaction. Dimensional data that directly participates in a transaction is master data.
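As a rough illustration of this split, the Python sketch below separates the attributes of one sales transaction into facts and dimensions. The field names mirror the list above; the values and the `FACT_ATTRIBUTES` set are made up purely for illustration and do not come from any real system.

```python
# Hypothetical sales transaction; every value here is invented for illustration.
transaction = {
    "store": "S001",
    "product_sold": "P100",
    "sales_person": "E042",
    "store_name": "Downtown Store",
    "sales_date": "2009-01-15",
    "customer": "C777",
    "price": 19.99,
    "quantity": 3,
}

# Assumption: the measurable attributes are known up front.
FACT_ATTRIBUTES = {"price", "quantity"}

# Facts are the measurable values of the transaction.
facts = {k: v for k, v in transaction.items() if k in FACT_ATTRIBUTES}

# Everything else is dimensional data that directly participates in the
# transaction, i.e. master data in the sense used in this article.
dimensions = {k: v for k, v in transaction.items() if k not in FACT_ATTRIBUTES}

print("facts:", facts)
print("dimensions:", dimensions)
```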
But is the list of dimensional attributes in the transaction complete?
Asking a few analytical questions can help us discover the answer.
    -What is the male-to-female ratio of customers making purchases at the store?
    -What type of products are customers buying? Ex: Electronics, Computers, Toys
    -What type of store is it? Ex: Web store, Brick & Mortar, Telesales, Catalog Sales
The above questions cannot be answered by the attributes in the transaction. This dimensional data is missing from the transaction. Dimensional data that does not directly participate in the transaction, but is an attribute of a dimension, is reference data.
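To make the distinction concrete, here is a minimal Python sketch in which the transactions carry only master data keys, and the three analytical questions are answered by looking up reference attributes kept outside the transactions. All identifiers, lookup tables, and values (`customer_gender`, `product_type`, `store_type`, and so on) are hypothetical and exist only for this example.

```python
from collections import Counter

# Transactions hold facts plus master data keys only (hypothetical data).
transactions = [
    {"store": "S1", "product": "P1", "customer": "C1", "price": 100.0, "quantity": 1},
    {"store": "S1", "product": "P2", "customer": "C2", "price": 15.0, "quantity": 2},
    {"store": "S2", "product": "P1", "customer": "C3", "price": 100.0, "quantity": 1},
]

# Reference data: attributes of the dimensions that never appear in a transaction.
customer_gender = {"C1": "F", "C2": "M", "C3": "F"}
product_type = {"P1": "Electronics", "P2": "Toys"}
store_type = {"S1": "Web store", "S2": "Brick & Mortar"}

# Each question is answered by joining a transaction key to its reference attribute.
gender_counts = Counter(customer_gender[t["customer"]] for t in transactions)
product_type_counts = Counter(product_type[t["product"]] for t in transactions)
store_type_counts = Counter(store_type[t["store"]] for t in transactions)

print(gender_counts)
print(product_type_counts)
print(store_type_counts)
```

None of the three counts could be computed from the transactions alone; each one needs a lookup keyed by a master data identifier.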
Why is it important for an ETL person to understand the differences? ‘Reference Data Management’ (RDM) was once the popular term, and then in the last few years the new term ‘Master Data Management’ (MDM) suddenly appeared. The two terms mean different things, and the difference has significant implications for how each kind of data is managed. But that will be a topic of discussion for some future post! I hope this article helps clear up at least some of the confusion.