{"id":12,"date":"2006-05-25T16:11:24","date_gmt":"2006-05-25T16:11:24","guid":{"rendered":"http:\/\/etlguru.com\/blog\/2006\/05\/25\/data-integration-basics-dimension-confirmation\/"},"modified":"2008-05-19T23:14:35","modified_gmt":"2008-05-19T23:14:35","slug":"data-integration-basics-dimension-confirmation","status":"publish","type":"post","link":"https:\/\/etlguru.com\/?p=12","title":{"rendered":"Data integration basics-Dimension Confirmation."},"content":{"rendered":"<p>The purpose of this topic is to establish the basics for design of ETL processes. With out the understanding an ETL process for data integration cannot be designed or developed.\u00c2\u00a0<\/p>\n<p>A database can have multiple sources. Multiple sources may contain a data set of entirely different subject areas, but some data set will intersect. Example Sales data and salary data will have employee as the common set.\u00c2\u00a0<\/p>\n<p><img decoding=\"async\" id=\"image57\" height=\"300\" alt=\"Data Set Intersection.JPG\" src=\"http:\/\/etlguru.com\/tmp\/Data%20Set%20Intersection.JPG\" \/>\u00c2\u00a0<\/p>\n<p>Between two or more sources, the subjects, entities or even attributes can be common. So can we integrate the data easily? Mathematically it seems very easy, but the real world is not just about numbers or exact same string values. Every thing can be similar or same but may be not represented in the exact same manner in all the sources. The differences in representation of same information and facts between two or more sources can create some of the most interesting challenges in Data Integration.<\/p>\n<p><strong>Data integration<\/strong>: -The <strong>first step<\/strong> in data integration is identification of common elements<br \/>\n1. Identify the common entities.<br \/>\nExample, Employee dimension can come from Sales system, Payroll System, etc. Products can come from manufacturing, sales, purchase etc. Once the common entity is identified its definition should be standardized. Example, does employee include fulltime employees as well as temporary workers?<br \/>\n\u00c2\u00a0<br \/>\n2. Identify the common attributes.<br \/>\nWhat are the attributes that are common to employee, 1st name, 2nd name, last name, date of joining, etc? Each attribute should be defined.<\/p>\n<p>3. Identify the common values<br \/>\nSame information can be represented in different forms in multiple source system. Example, male sex, can be represented as \u00e2\u20ac\u02dcM\u00e2\u20ac\u2122 or \u00e2\u20ac\u02dc1\u00e2\u20ac\u00b2 or \u00e2\u20ac\u02dcmale\u00e2\u20ac\u2122 or some thing else by each source system. A common representation must be\u00c2\u00a0decided (example, \u00e2\u20ac\u02dcMale\u00e2\u20ac\u2122). Also if necessary a finite set of values should be established. Example the employee sex = (\u00e2\u20ac\u02dcMale\u00e2\u20ac\u2122,\u00e2\u20ac\u2122 Female\u00e2\u20ac\u2122) and it will not allow more than these two values.<\/p>\n<p>The <strong>second step<\/strong> is the identification of Data Steward who will own the responsibility and ownership for particular set data elements.<\/p>\n<p>The <strong>third step<\/strong> is to design an ETL process to integrate the data into the target.\u00c2\u00a0 This is the most important area in the implementation of an ETL process for data integration. This topic will be discussed in more detailed under its own heading.<\/p>\n<p>The <strong>final fourth step<\/strong> is to establish a process of maintenance, review &#038; reporting of such elements.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The purpose of this topic is to establish the basics for design of ETL processes. With out the understanding an ETL process for data integration cannot be designed or developed.\u00c2\u00a0 A database can have multiple sources. Multiple sources may contain a data set of entirely different subject areas, but some data set will intersect. Example [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,14,15,1],"tags":[],"class_list":["post-12","post","type-post","status-publish","format-standard","hentry","category-enterprise-dimension-confirmation","category-etl-interface-architecture-tm","category-reference-data","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts\/12","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=12"}],"version-history":[{"count":0,"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts\/12\/revisions"}],"wp:attachment":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=12"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=12"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=12"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}