Testing in ETL, Data-Centric Projects

Testing is an investigation process that is conducted to check the quality of the product. Product can either be an applications or data. The quality of the data can only be determined by checking data against some existing standards by following a set of processes. By doing so, you find out the symptoms in form of invalid/incorrect data that happened because of erroneous processes. So the data-centric testing results in achieving high quality data by getting the erroneous processes fixed.

This article highlights the types of data-centric testing approaches and discovers what each testing approach uncovers.

Types of Testing

Application Testing: The focus of this article is data-centric testing so weâ€™ll not discuss application testing here.
Data-Centric Testing: Data-centric testing revolves around testing quality of the data. The objective of the data-centric testing is to ensure valid and correct data is in the system. Following are the couple of reasons that cause the requirement of performing data-centric testing:
- ETL Processes/Data Movement: When you apply ETL processes on source database, and transform and load data in the target database
- System Migration/Upgrade: When you migrate your database from one database to another or you upgrade an existing system where the database is currently running.

Data-Centric Testing
The data-centric testing validates data using the following approaches:

Technical Testing: Technical testing ensures that the data is moved, copied, or loaded from the source system to target system correctly and completely. Technical testing is performed by comparing the target data against the source data. Following is a list of functions that can be performed under technical testing:
- Checksum Comparison: The data-centric testing makes use of checksum approach to discover errors. Checksum can be performed on source and target databases in n number of ways such as counting the number of rows, and adding the data of a column. Later the result of checksum calculated on source database is compared against the checksum calculated on the target database.
  For example, row count compares the number of rows in the target database with the number of corresponding rows in the source database. Or the target database may contain the summarized annual data for monthly salaries in the source database. So the target database should contain sum of the monthly salaries paid within a year for each year.
- Domain comparison: The domain list in the target database is compared against the corresponding domain list in the source database. For example, the source system has 100 employees and the target system also has 100 employees but it does not guarantee that the employees in both the source and target domain lists are same unless compared. In Domain comparison, employee names in the target domain list are compared against the employee names in the source domain list.
- Multi-value comparison: Similar to List comparison, multi-value comparison compares the whole record or the critical columns in a record against the corresponding values in the source system.
  For example, the domain comparison reports that all the employees are same and checksum comparison reports that the salaries on source and target match, but still it does not guarantee of a valid record because the data movement may result in assigning wrong employee name to the salary entry. Such issues are not discovered by Checksum comparison and Domain comparison. The multi-value comparison discovers such issues by comparing the key attributes of each entity in source and target databases.
Business Testing: Business testing is done to ensure that the data fulfills the requirements of the business. Data may have been moved, copied, or loaded completely and accurately, and technical testing does not report any issue still there are chances that the system still contains the invalid data. To ensure high quality data, the data is evaluated against the business rules.
For example, value of certain fields such as salary and commission cannot be less than zero. These types of errors can occur because of data manipulation between the source and target systems. Unless you test data for any such errors, the quality of the data is not guaranteed.
For every business, the list of the business requirements is different. An exhaustive list of business rules against which the data is compared, ensures high quality data.
Reconciliation: Reconciliation ensures that the data in the target system is in agreement with the overall system requirements. Following are the couple of examples of how the reconciliation helps in achieving high quality data:
- Internal reconciliation: In this type of reconciliation, the data is compared within the system against the corresponding data set. For example shipping would also always be less than or equal to the orders. If the shipping ever exceeds the orders then it means data is invalid.
- External reconciliation: In this type of reconciliation, data in system is compared against its counterpart in other systems. For example, in a module or in an application, number of employees can never be more than the number of employees in HR Employee database. Because HR Employee database is the master database that keeps record of all the employees. If such a situation occurs where the number of employees anywhere in the system is more than the HR Employee database, then the data is invalid.

My company developed the ICEÂ± product, which targets to accomplish all types of testing mentioned in this article. For more information, visit www.iCEdq.com

This entry was posted on Thursday, May 19th, 2011 at 8:42 pm and is filed under ETL Testing. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

14 Responses to “Testing in ETL, Data-Centric Projects”

samta says:

May 23, 2011 at 3:08 pm

very nice and informative
Joya says:

June 9, 2011 at 5:55 am

Thanks for the post

You can find some good interview Question Answers on Informatica in the below link

http://www.aired.in/2010/12/informatica-interview-question-answers.html

Thanks
Joya
Namita says:

September 1, 2011 at 2:37 pm

I am new to ETL tesing and trying for a job in ETL testing.can you please share ETL testing .ASAP.
I would appreciate your kindness and helping people by sharing your knowledge.
Please send ETL testing material to CAE_2010@yahoo.com
MLV Krishna Chaitanya says:

October 9, 2011 at 7:44 pm

FYI Please…

Blog/Article on ETL Testing.
Raghavendra says:

December 11, 2011 at 12:12 am

Hi, please forward me any material for still testing.
Raghavendra says:

December 11, 2011 at 12:13 am

ETL Testing.
siva says:

December 28, 2011 at 6:50 am

hi,
this is siva new etl tester.plz frw any etl material my mail
Sivasankar says:

March 17, 2012 at 12:42 pm

Plaese send this ETL Testing book to my email id (sivasankarreddy267@gmail.com).
sairam says:

April 17, 2012 at 3:58 pm

kindly send any data related to Data Centric Testing to my mail “sairammallarapu77@gmail.com”
santosh says:

April 28, 2012 at 9:52 am

hello,
This is santosh , i want to know the ETL Testing process, please send the material and process of ETL testing to my mail id.

Regards,
santosh
Hemant says:

May 2, 2012 at 11:04 am

I am looking for ETL testing as my carrer path.Could you please send the ETL testing book to my mail ID(hemant.raut33@gmail.com)
Ranganadhbabuduggi says:

May 16, 2012 at 10:45 am

Very Nice basic Information for d ETL Teting
satya says:

May 25, 2012 at 8:25 pm

Please share the ETL testing books with me..
my mail id :usatyait@gmail.com
Suddi says:

July 23, 2012 at 2:10 pm

Alternatively you can find one more blog on DWH/ETL concepts and testing below.
http://xtremthink.blogspot.in/2011/08/testing-data-warehouse.html

ETLGuru

Testing in ETL, Data-Centric Projects

14 Responses to “Testing in ETL, Data-Centric Projects”

Leave a Reply