Testing in ETL, Data-Centric Projects

Testing is an investigation process that is conducted to check the quality of the product. Product can either be an applications or data. The quality of the data can only be determined by checking data against some existing standards by following a set of processes. By doing so, you find out the symptoms in form of invalid/incorrect data that happened because of erroneous processes. So the data-centric testing results in achieving high quality data by getting the erroneous processes fixed.

This article highlights the types of data-centric testing approaches and discovers what each testing approach uncovers.

Types of Testing

  • Application Testing: The focus of this article is data-centric testing so we’ll not discuss application testing here.
  • Data-Centric Testing: Data-centric testing revolves around testing quality of the data. The objective of the data-centric testing is to ensure valid and correct data is in the system. Following are the couple of reasons that cause the requirement of performing data-centric testing:
    • ETL Processes/Data Movement: When you apply ETL processes on source database, and transform and load data in the target database
    • System Migration/Upgrade: When you migrate your database from one database to another or you upgrade an existing system where the database is currently running.

Data-Centric Testing
The data-centric testing validates data using the following approaches:

  • Technical Testing: Technical testing ensures that the data is moved, copied, or loaded from the source system to target system correctly and completely. Technical testing is performed by comparing the target data against the source data. Following is a list of functions that can be performed under technical testing:
    • Checksum Comparison: The data-centric testing makes use of checksum approach to discover errors. Checksum can be performed on source and target databases in n number of ways such as counting the number of rows, and adding the data of a column. Later the result of checksum calculated on source database is compared against the checksum calculated on the target database.
      For example, row count compares the number of rows in the target database with the number of corresponding rows in the source database. Or the target database may contain the summarized annual data for monthly salaries in the source database. So the target database should contain sum of the monthly salaries paid within a year for each year.
    • Domain comparison: The domain list in the target database is compared against the corresponding domain list in the source database. For example, the source system has 100 employees and the target system also has 100 employees but it does not guarantee that the employees in both the source and target domain lists are same unless compared. In Domain comparison, employee names in the target domain list are compared against the employee names in the source domain list.
    • Multi-value comparison: Similar to List comparison, multi-value comparison compares the whole record or the critical columns in a record against the corresponding values in the source system.
      For example, the domain comparison reports that all the employees are same and checksum comparison reports that the salaries on source and target match, but still it does not guarantee of a valid record because the data movement may result in assigning wrong employee name to the salary entry. Such issues are not discovered by Checksum comparison and Domain comparison. The multi-value comparison discovers such issues by comparing the key attributes of each entity in source and target databases.
  • Business Testing: Business testing is done to ensure that the data fulfills the requirements of the business. Data may have been moved, copied, or loaded completely and accurately, and technical testing does not report any issue still there are chances that the system still contains the invalid data. To ensure high quality data, the data is evaluated against the business rules.

    For example, value of certain fields such as salary and commission cannot be less than zero. These types of errors can occur because of data manipulation between the source and target systems. Unless you test data for any such errors, the quality of the data is not guaranteed.
    For every business, the list of the business requirements is different. An exhaustive list of business rules against which the data is compared, ensures high quality data.

  • Reconciliation: Reconciliation ensures that the data in the target system is in agreement with the overall system requirements. Following are the couple of examples of how the reconciliation helps in achieving high quality data:
    • Internal reconciliation: In this type of reconciliation, the data is compared within the system against the corresponding data set. For example shipping would also always be less than or equal to the orders. If the shipping ever exceeds the orders then it means data is invalid.
    • External reconciliation: In this type of reconciliation, data in system is compared against its counterpart in other systems. For example, in a module or in an application, number of employees can never be more than the number of employees in HR Employee database. Because HR Employee database is the master database that keeps record of all the employees. If such a situation occurs where the number of employees anywhere in the system is more than the HR Employee database, then the data is invalid.

My company developed the ICE± product, which targets to accomplish all types of testing mentioned in this article. For more information, visit www.iCEdq.com

14 Responses to “Testing in ETL, Data-Centric Projects”

  1. samta says:

    very nice and informative

  2. Joya says:

    Thanks for the post

    You can find some good interview Question Answers on Informatica in the below link

    http://www.aired.in/2010/12/informatica-interview-question-answers.html

    Thanks
    Joya

  3. Namita says:

    I am new to ETL tesing and trying for a job in ETL testing.can you please share ETL testing .ASAP.
    I would appreciate your kindness and helping people by sharing your knowledge.
    Please send ETL testing material to CAE_2010@yahoo.com

  4. MLV Krishna Chaitanya says:

    FYI Please…

    Blog/Article on ETL Testing.

  5. Raghavendra says:

    Hi, please forward me any material for still testing.

  6. Raghavendra says:

    ETL Testing.

  7. siva says:

    hi,
    this is siva new etl tester.plz frw any etl material my mail

  8. Sivasankar says:

    Plaese send this ETL Testing book to my email id (sivasankarreddy267@gmail.com).

  9. sairam says:

    kindly send any data related to Data Centric Testing to my mail “sairammallarapu77@gmail.com”

  10. santosh says:

    hello,
    This is santosh , i want to know the ETL Testing process, please send the material and process of ETL testing to my mail id.

    Regards,
    santosh

  11. Hemant says:

    I am looking for ETL testing as my carrer path.Could you please send the ETL testing book to my mail ID(hemant.raut33@gmail.com)

  12. Ranganadhbabuduggi says:

    Very Nice basic Information for d ETL Teting

  13. satya says:

    Please share the ETL testing books with me..
    my mail id :usatyait@gmail.com

  14. Suddi says:

    Alternatively you can find one more blog on DWH/ETL concepts and testing below.
    http://xtremthink.blogspot.in/2011/08/testing-data-warehouse.html

Leave a Reply