{"id":4,"date":"2006-04-23T03:19:19","date_gmt":"2006-04-23T03:19:19","guid":{"rendered":"http:\/\/etlguru.com\/blog\/2007\/07\/22\/slow-running-etl-process-read-side-table-statistics-oracle\/"},"modified":"2007-07-22T05:38:00","modified_gmt":"2007-07-22T05:38:00","slug":"slow-running-etl-process-read-side-table-statistics-oracle","status":"publish","type":"post","link":"https:\/\/etlguru.com\/?p=4","title":{"rendered":"Slow Running ETL process (read side) &#038; Table Statistics (Oracle)."},"content":{"rendered":"<p>Sometimes an ETL process runs considerably slow speed. During test for the small result set it might fly but when a million rows are applied the performance takes a nosedive. There can be many reasons for slow ETL process.\u00c2\u00a0 The process can be slow because read, transformation, load. Lets eliminate the transformation and load for the sake of discussion.<\/p>\n<p>For ETL process to be slow on read side here are some reasons .1. No indexes on joins and\/or \u00e2\u20ac\u02dcwhere clause\u00e2\u20ac\u2122 2. Query badly written. 3. Source not analyzed.\u00c2\u00a0 Out of these three lets rule out 1 &#038; 2.<br \/>\nIn the past most of the databases had RULE based optimizer in the INIT.ORA file, but with new development and specially Data warehouses \u00e2\u20ac\u02dcCHOOSE\u00e2\u20ac\u2122 optimizer is preferred. With \u00e2\u20ac\u02dcCHOOSE\u00e2\u20ac\u2122 option the query uses COST based optimizer if the statistics are available for the tables in the query.<\/p>\n<p>There are two methods to gather statistics 1. DBMS_STATS package, 2. ANALYZE command. Oracle does not recommend ANALYZE command going forward for various reasons (its a command, cannot analyze external tables, it gathers stats that are not essential, inaccurate stats on partitioned tables and indexes, in future ANALYZE will not support cost-based optimizer, no Monitoring for stale statistics, etc.)<\/p>\n<p><em>DBMS_STATS.GATHER_SCHEMA_STATS (ownname =>\u00e2\u20ac\u2122DWH\u00e2\u20ac\u2122, options =>\u00e2\u20ac\u2122GATHER AUTO\u00e2\u20ac\u2122);<\/em><br \/>\nThis Package will gather all necessary statistics automatically (there is no need to write a process to check for tables that have stale or no stats). Oracle implicitly determines which objects need new statistics, and determines how to gather those statistics. So once it is put in the ETL flow you can sleep at home and everybody will be happy.<\/p>\n<p><strong>Bottom Line:<\/strong><br \/>\nAfter major data load the tables should be analyzed with an automated process. Also please do not use ANALYZE\u00c2\u00a0, as in future\u00c2\u00a0it\u00c2\u00a0will not collect statistics needed for COB. The code can be found here.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" height=\"1\" alt=\"1\" src=\"http:\/\/web.archive.org\/web\/20070523031241\/http:\/\/geo.yahoo.com\/serv?s=76001405&#038;t=1145901579&#038;f=p7w3\" width=\"1\" \/>This package should\u00c2\u00a0be used as the part of ETL workflow to make sure as batch processes are run the benefit of statistics are available to the next process in the que.<\/p>\n<p>Also ETL programmers can call this package within the ETL process\/procedure in the begening so that if the tables required for process are not analyzed they will be analyzed automatically. Also if the procedure is going to modify lot of data in the\u00c2\u00a0table then the package can be called just before the end of the procedure.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes an ETL process runs considerably slow speed. During test for the small result set it might fly but when a million rows are applied the performance takes a nosedive. There can be many reasons for slow ETL process.\u00c2\u00a0 The process can be slow because read, transformation, load. Lets eliminate the transformation and load for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,3,6,1],"tags":[],"class_list":["post-4","post","type-post","status-publish","format-standard","hentry","category-database-tuning","category-etl-performance-tuning","category-oracle","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts\/4","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4"}],"version-history":[{"count":0,"href":"https:\/\/etlguru.com\/index.php?rest_route=\/wp\/v2\/posts\/4\/revisions"}],"wp:attachment":[{"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/etlguru.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}