Slow Running ETL process (read side) & Table Statistics (Oracle).

Sometimes an ETL process runs considerably slow speed. During test for the small result set it might fly but when a million rows are applied the performance takes a nosedive. There can be many reasons for slow ETL process.  The process can be slow because read, transformation, load. Lets eliminate the transformation and load for the sake of discussion.

For ETL process to be slow on read side here are some reasons .1. No indexes on joins and/or ‘where clause’ 2. Query badly written. 3. Source not analyzed.  Out of these three lets rule out 1 & 2.
In the past most of the databases had RULE based optimizer in the INIT.ORA file, but with new development and specially Data warehouses ‘CHOOSE’ optimizer is preferred. With ‘CHOOSE’ option the query uses COST based optimizer if the statistics are available for the tables in the query.

There are two methods to gather statistics 1. DBMS_STATS package, 2. ANALYZE command. Oracle does not recommend ANALYZE command going forward for various reasons (its a command, cannot analyze external tables, it gathers stats that are not essential, inaccurate stats on partitioned tables and indexes, in future ANALYZE will not support cost-based optimizer, no Monitoring for stale statistics, etc.)

DBMS_STATS.GATHER_SCHEMA_STATS (ownname =>’DWH’, options =>’GATHER AUTO’);
This Package will gather all necessary statistics automatically (there is no need to write a process to check for tables that have stale or no stats). Oracle implicitly determines which objects need new statistics, and determines how to gather those statistics. So once it is put in the ETL flow you can sleep at home and everybody will be happy.

Bottom Line:
After major data load the tables should be analyzed with an automated process. Also please do not use ANALYZE , as in future it will not collect statistics needed for COB. The code can be found here.

1This package should be used as the part of ETL workflow to make sure as batch processes are run the benefit of statistics are available to the next process in the que.

Also ETL programmers can call this package within the ETL process/procedure in the begening so that if the tables required for process are not analyzed they will be analyzed automatically. Also if the procedure is going to modify lot of data in the table then the package can be called just before the end of the procedure.

Leave a Reply