Archive for the ‘ETL Book’ Category

Difference between Reference Data and Master Data

Thursday, June 19th, 2008

It is not unusual for people to use ‘Reference Data’ and ‘Master Data’ interchangeably without understanding the differences.
Let's try to understand the differences with an example of a sales transaction.

A sales transaction contains information like:
Products Sold,
Sales Person,
Store Name,
Sales Date,
Customer,
Price,
Quantity

Attributes from the above example can be separated into two types: factual (transactional) and dimensional information.
Price and Quantity are measurable (factual) attributes of a transaction.
Products Sold, Sales Person, Store Name, Sales Date, and Customer are dimensional attributes of a transaction.
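
The split described above can be sketched in a few lines of Python. This is only an illustration: the field names and sample values are made up, and treating Price and Quantity as the only measures is taken from the example rather than from any general rule.

```python
# Hypothetical sales transaction; field names and values are illustrative only.
sale = {
    "products_sold": "Laptop",
    "sales_person": "J. Smith",
    "store_name": "Downtown Store",
    "sales_date": "2008-06-19",
    "customer": "A. Jones",
    "price": 899.00,
    "quantity": 2,
}

# Assumption: in this example only price and quantity are measurable facts.
MEASURES = {"price", "quantity"}


def split_transaction(txn):
    """Separate a transaction into factual and dimensional attributes."""
    facts = {k: v for k, v in txn.items() if k in MEASURES}
    dimensions = {k: v for k, v in txn.items() if k not in MEASURES}
    return facts, dimensions


facts, dimensions = split_transaction(sale)
print("Facts:", facts)
print("Dimensions:", dimensions)
```

Everything that ends up in `dimensions` describes who, what, where, and when; everything in `facts` is a number that can be summed or averaged.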

We can see that the dimensional data is already embedded in the transaction, and with these dimensional attributes we can successfully complete the transaction. Dimensional data that directly participates in a transaction is master data.

But is the list of dimensional attributes in the transaction complete? 

Asking a few analytical questions can help us discover the answer.
     -What is the male-to-female ratio of customers making purchases at the store?
     -What types of products are customers buying? Ex: Electronics, Computers, Toys
     -What type of store is it? Ex: Web store, Brick & Mortar, Telesales, Catalog Sales

The above questions cannot be answered by the attributes in the transaction. This dimensional data is missing from the transaction. Such data, which does not directly participate in the transaction but describes an attribute of a dimension, is reference data.
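
As a rough sketch of this distinction, the Python below keeps the transactions limited to master data keys and answers the analytical questions only by looking up reference attributes held outside the transaction. All identifiers, codes, and values here are hypothetical and exist only for illustration.

```python
from collections import Counter

# Transactions carry only the master data that takes part in the sale.
# All keys and values below are made up for illustration.
transactions = [
    {"customer": "C1", "product": "P1", "store": "S1"},
    {"customer": "C2", "product": "P2", "store": "S2"},
    {"customer": "C3", "product": "P1", "store": "S1"},
]

# Reference data: attributes of each dimension that never appear in the sale.
customer_gender = {"C1": "Female", "C2": "Male", "C3": "Female"}
product_type = {"P1": "Electronics", "P2": "Toys"}
store_type = {"S1": "Web store", "S2": "Brick & Mortar"}


def count_by(txns, key, lookup):
    """Count transactions per reference attribute of the given dimension."""
    return Counter(lookup[t[key]] for t in txns)


gender_counts = count_by(transactions, "customer", customer_gender)
type_counts = count_by(transactions, "product", product_type)
store_counts = count_by(transactions, "store", store_type)

print(gender_counts)
print(type_counts)
print(store_counts)
```

Without the three lookup tables, none of the questions could be answered from the transactions alone, which is the point the example is meant to make.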

Why is it important for an ETL person to understand the differences? ‘Reference Data Management’ (RDM) used to be the popular term, and then in the last few years the new term ‘Master Data Management’ (MDM) suddenly appeared. These terms mean different things, and the difference has significant implications for how the data is managed. But that will be a topic of discussion for some future post! I hope this article helps clear up at least some of the confusion.

ETL Strategies and Solutions for Data Warehouse - Contents of ETL Book

Monday, November 5th, 2007

Section A, The Beginning
01 ETL The Basics
02 ETL Strategy

Section B, Analysis
03 Target Systems Analysis
04 Source Systems Analysis
05 Source Target Mapping - Part I
06 Understanding Data Quality
07 Data Profiling

Section C, Develop Part I
08 Understanding Data Patterns for ETL
09 Simple ETL Development

Section D, ETL Architecture & Design
10 ETL & Data Integration
11 ETL-IA (Interface Architecture)
12 ETL-IA Implementation
13 Designing Standard ETL Templates

Section E, Develop Part II
14 File Management & Transportation
15 Extraction
16 Staging Data
17 ETL Transformation Development
18 Unit Testing ETL Processes
19 Coding Wrappers
20 Automation of ETL Processes

Section F, Migration
21 Migration ETL Processes

Section G, Post Production
22 Reference Data Management
23 Exception & Error Management
24 Production Support & Change Management For ETL Processes
25 ETL & Performance Tuning

Section H, Other
26 ETL Tools
27 ETL & Metadata Management

ETL Strategies and Solutions for Data Warehouse: ETL book by Sandesh Gawande

Friday, November 3rd, 2006

OK! I am finally planning to publish an ETL book with detailed information on all aspects of ETL and Data Integration. The ETL Book is intended for Data Warehouse Managers, Data Warehouse Architects, and ETL Leads. It will also contain solutions for ETL developers. The book will be independent of ETL tools like Informatica, DataStage, etc.

The good news is that readers will not have to wait for all the chapters to be completed. I will make individual chapters available as each one is finished. Once all the chapters are complete, readers can return the individual chapters and get the complete ETL Book for free or by paying the difference. The ETL forum will have a thread for each chapter so that readers can discuss, recommend, and suggest. This will make it a complete ETL package.

Comments are welcome, for example on the chapters, the title of the book, or what you would like to see in the contents. I will also offer a money-back guarantee if you dislike the book. For competing ETL books, I would suggest you buy one (The Data Warehouse ETL Toolkit) only if the bookstore near you permits returns, so that you can compare for yourself and decide.

Please wait for link to purchase Book /Chapters… Coming soon… First few chapters before Nov 30 2006.  Dec 30 2006.