It is not unusual for people to use ‘Reference Data’ and ‘Master Data’ interchangeably without understanding the differences.
Let's try to understand the differences using the example of a sales transaction.
A sales transaction contains information such as:
Store,
Products Sold,
Sales Person,
Store Name,
Sales Date,
Customer,
Price,
Quantity,
etc.
Attributes from the above example can be separated into two types: Factual (transactional) and Dimensional information.
Price and Quantity are measurable attributes of a transaction.
Store, Products Sold, Sales Person, Store Name, Sales Date, and Customer are dimensional attributes of a transaction.
We can see that the dimensional data is already embedded in the transaction, and with these dimensional attributes we can successfully complete the transaction. Dimensional data that directly participates in a transaction is master data.
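The split between measurable and dimensional attributes can be sketched in Python. This is a minimal illustration, assuming a hypothetical `SalesTransaction` record whose field names mirror the list above; the sample values (store `S001`, customer `C-1001`, and so on) are invented for the example.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Dict, Tuple

# Hypothetical classification: price and quantity are the measurable attributes.
MEASURES = {"price", "quantity"}


@dataclass(frozen=True)
class SalesTransaction:
    # Dimensional attributes: master data that directly participates in the sale
    store: str
    store_name: str
    product: str
    sales_person: str
    customer: str
    sales_date: date
    # Measurable (factual) attributes
    price: float
    quantity: int


def split_attributes(txn: SalesTransaction) -> Tuple[Dict, Dict]:
    """Separate a transaction into (dimensional, measures) dictionaries."""
    record = asdict(txn)
    measures = {k: v for k, v in record.items() if k in MEASURES}
    dimensions = {k: v for k, v in record.items() if k not in MEASURES}
    return dimensions, measures


txn = SalesTransaction(
    store="S001", store_name="Downtown", product="P-42",
    sales_person="E-7", customer="C-1001",
    sales_date=date(2024, 1, 15), price=19.99, quantity=2,
)
dims, facts = split_attributes(txn)
print(sorted(facts))  # ['price', 'quantity']
print(sorted(dims))   # ['customer', 'product', 'sales_date', 'sales_person', 'store', 'store_name']
```

Every dimensional field here is needed to complete the sale, which is what makes it master data in the sense used above.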
But is the list of dimensional attributes in the transaction complete?
Asking a few analytical questions can help us find the answer.
    -What is the male-to-female ratio of customers making purchases at the store?
    -What types of products are customers buying? Ex: Electronics, Computers, Toys
    -What type of store is it? Ex: Web store, Brick & Mortar, Telesales, Catalog Sales
The above questions cannot be answered with the attributes in the transaction, because the dimensional data they need is missing from it. Dimensional data that does not directly participate in a transaction, but describes attributes of a dimension (such as a customer's gender, a product's type, or a store's type), is reference data.
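To see why reference data matters for analysis, here is a small Python sketch, assuming hypothetical in-memory lookups (`customer_gender`, `product_type`, `store_type`) that hold the reference attributes of each dimension. The transactions carry only dimension keys and measures, so each analytical question above is answered by looking up the reference attribute through the key.

```python
from collections import Counter

# Hypothetical transactions: dimension keys plus measures only.
transactions = [
    {"customer": "C1", "product": "P1", "store": "S1", "price": 10.0, "quantity": 1},
    {"customer": "C2", "product": "P2", "store": "S1", "price": 25.0, "quantity": 2},
    {"customer": "C3", "product": "P1", "store": "S2", "price": 10.0, "quantity": 3},
    {"customer": "C1", "product": "P3", "store": "S2", "price": 5.0, "quantity": 1},
]

# Hypothetical reference data: attributes of each dimension, not stored on the transaction.
customer_gender = {"C1": "Female", "C2": "Male", "C3": "Female"}
product_type = {"P1": "Toys", "P2": "Electronics", "P3": "Computers"}
store_type = {"S1": "Web store", "S2": "Brick & Mortar"}


def gender_counts(txns):
    """Count distinct purchasing customers by gender using the customer reference data."""
    customers = {t["customer"] for t in txns}
    return Counter(customer_gender[c] for c in customers)


def units_by(txns, key, lookup):
    """Total quantity sold, grouped by a reference attribute of one dimension."""
    totals = Counter()
    for t in txns:
        totals[lookup[t[key]]] += t["quantity"]
    return totals


print(gender_counts(transactions))                      # Female: 2, Male: 1
print(units_by(transactions, "product", product_type))  # Toys: 4, Electronics: 2, Computers: 1
print(units_by(transactions, "store", store_type))      # Brick & Mortar: 4, Web store: 3
```

Without the three lookup tables, none of these questions could be answered from the transactions alone.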
Why is it important for an ETL person to understand the differences? Well, ‘Reference Data Management’ (RDM) was once the popular term, and then, in the last few years, a new term appeared: ‘Master Data Management’ (MDM). These terms mean different things, and they have significant implications for how the data is managed. But that will be a topic of discussion for a future post! I hope this article helps clear up at least some of the confusion.
Hello,
How about the term Master Reference Data?
You can create any combination of the words Reference Data and Master Data. There are also some borderline cases where the data can fall into either category. In short, you need not bother about ‘Master Reference Data’.
When reading about MDM, I understood that master data will often turn out to be non-transactional. Please correct me if I am wrong.
This article is about master data, not MDM. Master data is a type of data, and MDM is a system to manage master data.
Anyway, MDM stands for Master Data Management. MDM has three tasks: 1. master data creation management, 2. master data integration, and 3. master data distribution.
1. Pre: Let's say you work for a publishing house, and an author comes to you asking to publish his book. Do you just go ahead and add his name and information to the request system, or do you first check whether the author already exists? You would want to check whether he already exists in the system. This is management of master data (MDM) during creation.
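The pre-creation check can be sketched as a lookup before insert. This is a minimal Python sketch, assuming a hypothetical in-memory `AuthorRegistry` and a deliberately simple matching rule (ignore accents, case, and extra whitespace); a real MDM system would typically use richer matching than this.

```python
import unicodedata


def normalize(name: str) -> str:
    """Hypothetical match key: strip accents, lowercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


class AuthorRegistry:
    """Hypothetical in-memory author master used to check for duplicates at entry time."""

    def __init__(self):
        self._by_key = {}  # match key -> author id
        self._next_id = 1

    def find(self, name):
        return self._by_key.get(normalize(name))

    def register(self, name):
        """Return (author_id, created). Reuse the existing id if the author is known."""
        existing = self.find(name)
        if existing is not None:
            return existing, False
        author_id = self._next_id
        self._next_id += 1
        self._by_key[normalize(name)] = author_id
        return author_id, True


registry = AuthorRegistry()
print(registry.register("Gabriel García Márquez"))   # (1, True)
print(registry.register("gabriel  garcia marquez"))  # (1, False)
```

The second request is recognized as the same author, so no duplicate master record is created.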
2. Post: Let's say you don't have a system to check authors at entry time, and the request system simply adds the author. A message that a new author has been added should then be sent to some system that will, at some point, verify whether to allow the new name entry or merge it with an existing one. This is post-creation MDM.
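The post-creation flow can be sketched with a message queue. In this Python sketch, which is an illustration under stated assumptions rather than any particular product's design, the request system inserts authors without checks and emits an `author_added` event (a hypothetical message format). A later reconciliation step links each new entry to an existing golden record or creates a new one. All names (`golden_records`, `cross_reference`, `run_mdm_reconciliation`) are hypothetical.

```python
from queue import Queue

# Hypothetical channel carrying "author added" messages from the request system.
events = Queue()

request_system_authors = {}  # request-system id -> name (inserted without checks)
golden_records = {}          # MDM: match key -> golden author id
cross_reference = {}         # request-system id -> golden author id


def match_key(name):
    # Assumed matching rule for illustration: case- and whitespace-insensitive.
    return " ".join(name.lower().split())


def add_author_in_request_system(local_id, name):
    request_system_authors[local_id] = name
    events.put({"type": "author_added", "local_id": local_id, "name": name})


def run_mdm_reconciliation():
    """Drain pending events, merging each new author into an existing golden record or creating one."""
    decisions = []
    while not events.empty():
        event = events.get()
        key = match_key(event["name"])
        if key in golden_records:
            decision = "merged"
        else:
            golden_records[key] = len(golden_records) + 1
            decision = "new"
        cross_reference[event["local_id"]] = golden_records[key]
        decisions.append((event["local_id"], decision))
    return decisions


add_author_in_request_system("R1", "Jane Doe")
add_author_in_request_system("R2", "jane  doe")
add_author_in_request_system("R3", "John Roe")
print(run_mdm_reconciliation())  # [('R1', 'new'), ('R2', 'merged'), ('R3', 'new')]
print(cross_reference)           # {'R1': 1, 'R2': 1, 'R3': 2}
```

Unlike the pre-creation check, the duplicate "jane  doe" is allowed into the request system and is only merged afterwards.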
3. Another task of an MDM system is to distribute this master data across the organization.
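Distribution is often done by publishing golden records to the systems that consume them. The Python sketch below assumes a hypothetical `MasterDataPublisher` with simple publish/subscribe semantics and two invented downstream systems (royalties and catalog). It only illustrates the idea that every subscriber receives the same master record.

```python
from typing import Callable, Dict


class MasterDataPublisher:
    """Hypothetical hub that pushes golden master records to subscribing systems."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Callable[[dict], None]] = {}

    def subscribe(self, system_name: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[system_name] = handler

    def publish(self, record: dict) -> None:
        # Deliver the same golden record to every subscriber so all systems agree.
        for handler in self._subscribers.values():
            handler(record)


# Hypothetical downstream systems, each keeping its own local copy of authors.
royalties_db: Dict[int, str] = {}
catalog_db: Dict[int, str] = {}


def update_royalties(record: dict) -> None:
    royalties_db[record["author_id"]] = record["name"]


def update_catalog(record: dict) -> None:
    catalog_db[record["author_id"]] = record["name"]


hub = MasterDataPublisher()
hub.subscribe("royalties", update_royalties)
hub.subscribe("catalog", update_catalog)
hub.publish({"author_id": 1, "name": "Jane Doe"})
print(royalties_db == catalog_db == {1: "Jane Doe"})  # True
```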
The article about the difference between reference data and master data is very useful, but a clearer explanation is needed. Although the sales transaction is given as an example, formal definitions of reference data and master data should also be given.
Hi, nice article.
Many ETL tools can make a clear distinction between reference data and master data; they support the concept of surrogate keys. Agreed, master data management is a whole different cup of tea.
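Surrogate keys, mentioned in the comment above, can be illustrated with a minimal Python sketch. It assumes a hypothetical `SurrogateKeyGenerator` that assigns a stable integer key to each (source system, natural key) pair. This is not how any specific ETL tool implements the feature.

```python
class SurrogateKeyGenerator:
    """Hypothetical generator: one stable integer surrogate key per (source, natural key) pair."""

    def __init__(self):
        self._keys = {}
        self._next = 1

    def key_for(self, source, natural_key):
        lookup = (source, natural_key)
        if lookup not in self._keys:
            # First time this natural key is seen from this source: issue a new key.
            self._keys[lookup] = self._next
            self._next += 1
        return self._keys[lookup]


gen = SurrogateKeyGenerator()
print(gen.key_for("crm", "CUST-001"))    # 1
print(gen.key_for("web", "jane@x.com"))  # 2
print(gen.key_for("crm", "CUST-001"))    # 1 (same natural key, same surrogate key)
```

Because the warehouse joins on the surrogate key rather than the source system's natural key, records from different sources can later be remapped to a single master entry without rewriting the transactions.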