{"id":14,"date":"2006-06-06T16:30:38","date_gmt":"2006-06-06T16:30:38","guid":{"rendered":"http:\/\/etlguru.com\/blog\/2006\/06\/06\/etl-delta-logic-de-normalization-of-data-model\/"},"modified":"2012-01-09T23:52:28","modified_gmt":"2012-01-09T23:52:28","slug":"etl-delta-logic-de-normalization-of-data-model","status":"publish","type":"post","link":"https:\/\/etlguru.com\/?p=14","title":{"rendered":"ETL delta logic &#038; de-normalization of data model."},"content":{"rendered":"<p>It is a normal practice in data warehouse to de normalizes (Or once auto corrected as <strong>demoralize<\/strong>) as the data model for performance. I am not going to discuss the benefits vs. issues with de-normalization.\u00c2\u00a0 As by the time it comes to the ETL guy the fate of the model is already decided.<\/p>\n<p>Let\u00e2\u20ac\u2122s look at the model in the source side, which is perfectly normalized.\u00c2\u00a0<br \/>\n<a href=\"http:\/\/etlguru.com\/blog\/?attachment_id=315\" rel=\"attachment wp-att-315\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/etlguru.com\/blog\/wp-content\/uploads\/NormalizedDM.jpg\" alt=\"\" title=\"NormalizedDM\" width=\"364\" height=\"314\" class=\"aligncenter size-full wp-image-315\" srcset=\"https:\/\/etlguru.com\/wp-content\/uploads\/NormalizedDM.jpg 364w, https:\/\/etlguru.com\/wp-content\/uploads\/NormalizedDM-300x258.jpg 300w\" sizes=\"auto, (max-width: 364px) 100vw, 364px\" \/><\/a><br \/>\n\u00c2\u00a0\u00c2\u00a0\u00c2\u00a0\u00c2\u00a0\u00c2\u00a0\u00c2\u00a0<br \/>\nNow let\u00e2\u20ac\u2122s look at the de normalized model on the target side.\u00c2\u00a0<br \/>\n<a href=\"http:\/\/etlguru.com\/blog\/?attachment_id=314\" rel=\"attachment wp-att-314\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/etlguru.com\/blog\/wp-content\/uploads\/DeNormalizedDM.jpg\" alt=\"\" title=\"DeNormalizedDM\" width=\"189\" height=\"234\" class=\"aligncenter size-full wp-image-314\" \/><\/a><br \/>\n\u00c2\u00a0<br \/>\nNext lets think of delta logic for loading of the dim_employee table. Ideally you would only check changes in the employee table. Then if there is any changes after the last load date time ; then get those rows from ref_employee and do the lookup to get the department &#038; the designation and load it into the target table.<br \/>\n\u00c2\u00a0<br \/>\nThe issue with this delta logic is that it has not considered the effect of de normalization of employee table on the target side. If you carefully look at the two de normalized attributes dept_name and emp_designation_desc, the ETL process will miss any changes in the parent tables, so only new employees or updated employee will get the new definition of department &#038; designation. And any employee that has not been updated in the source side will still have the same dept_name &#038; emp_designation_desc. This is wrong.<\/p>\n<p>The reason it is wrong is the ETL delta logic only picked the row from the employee table when it changed and ignored the changes in the dept &#038; designation tables. The truth of the matter is, \u00e2\u20ac\u009d For any de normalized target table data (affected rows) should be re-captured from the source, any time there is change in the driving\/core table as well as when there is change in any parent tables to which the driving table refers to.\u00e2\u20ac\u009d In this case, even if there is change in department or designation table, all the rows affected on the employee tables should be re-processed.<\/p>\n<p>It might seem very simple, but ETL developers\/designers\/modelers always miss this point. Also once developed it is very difficult to catch.<\/p>\n<p>The next question is how you would catch the affected rows. Well there are ways to write SQL that combine the three tables (in this case) and treat them as one single entity and the pull rows based on the any update_dttm greater than the last ETL run. Figure out the SQL\u00e2\u20ac\u00a6\u00c2\u00a0<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It is a normal practice in data warehouse to de normalizes (Or once auto corrected as demoralize) as the data model for performance. I am not going to discuss the benefits vs. issues with de-normalization.\u00c2\u00a0 As by the time it comes to the ETL guy the fate of the model is already decided. Let\u00e2\u20ac\u2122s look [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,19,1],"tags":[],"class_list":["post-14","post","type-post","status-publish","format-standard","hentry","category-data-extraction","category-etl-automation","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts\/14","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=14"}],"version-history":[{"count":5,"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts\/14\/revisions"}],"predecessor-version":[{"id":317,"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts\/14\/revisions\/317"}],"wp:attachment":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=14"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=14"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=14"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}