Every time there is movement of data the results have to be tested against the expected results. For every ETL process, test conditions for testing data are defined before/during design and development phase itself. Some that are missed can be added later on.
Various test conditions are used to validate data when the ETL process is migrated from DEV-to->QA-to->PRD. These test conditions are can exists in the developer’s/tester’s mind /documented in word or excel. With time the test conditions either lost ignored or scattered all around to be really useful.
In production if the ETL process runs successfully without error is a good thing. But it does not really mean anything. You still need rules to validate data processed by ETL. At this point you need data validation rules again!
A better ETL strategy is to store the ETL business rules in a RULES table by target table, source system. These rules can be in SQL text. This will create a repository of all the rules in a single location which can be called by any ETL process/ auditor at any phase of the project life cycle.
There is also no need to re-write /rethink rules. Any or all of these rules can be made optional, tolerances can be defined, called immediately after the process is run or data can be audited at leisure.
This Data validation /auditing system will basically contain
A table that contains the rules,
A process to call is dynamically and
A table to store the results from the execution of the rules
Benefits:
Rules can be added dynamically with no cange to code.
Rules are stored permanantly.
Tolerance level can be changed with ever changing the code
Biz rules can be added or validated by business experts without worring about the ETL code.
NOTE: This post is applicable to all etl tools or databases like Informatica, DataStage, Syncsort DMExpress, Sunopsis or Oracle, Sybase, SQL Server Integration Services (SSIS)/DTS, Ab Initio,  MS SQL Server, RDB, etc.
Hi,
I was trying to build something similar. Basically a common framework for creating measures for Dashboards. But I didn’t make much headway as to how to have a standard frame work as each measure involved different calculations and ended up hard coding the actual logic in the ETL though manage to have table driven approach for storing all measures and their definitions.
Any elaboration or tips on the rule based approach would help.
Thanks
Mani