{"id":32,"date":"2007-05-13T18:00:34","date_gmt":"2007-05-13T18:00:34","guid":{"rendered":"http:\/\/etlguru.com\/blog\/2007\/05\/13\/for-every-rule-there-is-an-exception-for-each-exception-there-are-more-exceptions%e2%80%a6\/"},"modified":"2007-05-13T18:00:34","modified_gmt":"2007-05-13T18:00:34","slug":"for-every-rule-there-is-an-exception-for-each-exception-there-are-more-exceptions%e2%80%a6","status":"publish","type":"post","link":"https:\/\/etlguru.com\/?p=32","title":{"rendered":"For every rule there is an exception; for each exception there are more exceptions\u2026"},"content":{"rendered":"<p><font size=\"2\">Implementing an ETL process involves many steps. One such step is creating a mapping document. This document describes the data mapping between the source systems and the target, along with the rules of data transformation.<\/font><\/p>\n<p><font size=\"2\">Ex. the table \/ column map between source and target, rules to identify unique rows, not-null attributes, unique values, ranges of attributes, transformation rules, etc.<\/font><\/p>\n<p><font size=\"2\"><br \/>\n<\/font><font size=\"2\">Without going into further details of the document, let\u2019s analyze the very next step. It seems obvious and natural to start development of the ETL process. The ETL developer is all fired up, comes up with a design document, and starts developing; in a few days the code is ready for data loading.<\/font><br \/>\nBut unexpectedly (?) the code starts having issues every few days. Issues are found and fixed. And then it fails again. What\u2019s happening? The analysis was done properly; rules were chalked out &#038; implemented according to the mapping document. So why are issues popping up? Was something missed?<br \/>\nMaybe not! Isn\u2019t it normal to have more issues early in a process\u2019s life?<br \/>\n<br \/>\nMaybe yes! 
You have surely missed \u2018Source System Data Profiling\u2019. The business analyst has told you the rules for how the data is structured in the source system and how it is supposed to behave; but he\/she has not told you the \u2018ifs and buts\u2019, the EXCEPTIONS to those rules.<br \/>\n<br \/>\nTo be realistic, it is not possible for anyone to simply recite all the rules and exceptions to you like a parrot. You have to collaborate and dig out the truth. The choice is yours: do data profiling on the source system and try to break every rule the analyst told you, or wait for the process to go live and then wake up every night as the load fails. If you are lucky, you also get to deal with an unhappy user every morning you go to the office.<br \/>\n<br \/>\nMake the right choice; don\u2019t skip \u2018source system data profiling\u2019 before writing a single line of code. Question every rule. Try to find exceptions to the rules. Suppose there are at least 20 tables; each table has on average 30 columns, and each column holds on average 100k values. The matrix of tables * columns * data values gives you the number of ways your assumptions may be wrong. It\u2019s like unit testing the source data even before loading it. There is a reason why machines alone cannot do your job; there is a reason why IT jobs pay more.<br \/>\n<br \/>\nRemember, \u2018for every rule there is an exception; for each exception there are more exceptions\u2026\u2019<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Implementing an ETL process involves many steps. One such step is creating a mapping document. 
This mapping document describes the data mapping between the source systems and the target and the rules of data transformation. Ex. Table \/ column map between source and target, rules to identify unique rows, not [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[34,20,23,33,1],"tags":[],"class_list":["post-32","post","type-post","status-publish","format-standard","hentry","category-data-profiling","category-data-quality","category-etl-exception-error-handling","category-source-system-analysis","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts\/32","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=32"}],"version-history":[{"count":0,"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts\/32\/revisions"}],"wp:attachment":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=32"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=32"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=32"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}