Archive for November, 2007

ETL Strategies and Solutions for Data Warehouse - Contents of ETL Book

Monday, November 5th, 2007

Section A, The Beginning
01 ETL The Basics
02 ETL Strategy

Section B, Analysis
03 Target Systems Analysis
04 Source Systems Analysis
05 Source Target Mapping - Part I
06 Understanding Data Quality
07 Data Profiling

Section C, Develop Part I
08 Understanding Data Patterns for ETL
09 Simple ETL Development

Section D, ETL Architecture & Design
10 ETL & Data Integration
11 ETL-IA (Interface Architecture)
12 ETL-IA Implementation
13 Designing Standard ETL Templates

Section E, Develop Part II
14 File Management & Transportation
15 Extraction
16 Staging Data
17 ETL Transformation Development
18 Unit Testing ETL Processes
19 Coding Wrappers
20 Automation of ETL Processes

Section F, Migration
21 Migration ETL Processes

Section G, Post Production
22 Reference Data Management
23 Exception & Error Management
24 Production Support & Change Management For ETL Processes
25 ETL & Performance Tuning

Section H, Other
26 ETL Tools
27 ETL & Metadata Management

Slower Development Databases and Servers

Friday, November 2nd, 2007

It is a normal trend in IT to buy the most powerful machine for production use. For example, your production database box will have 16 CPUs, your QA box 8, and your development box 4 or 6. A similar ratio is maintained for ETL servers, application servers, and hard disk performance.

The logic is that the production environment is critical to the end-user experience.
Agreed, response time is critical for end users; however, that does not mean you should buy a slower machine for development. Imagine the time wasted while a developer sits in front of the machine, waiting for a query to return data. A developer usually spends all his time working in the development database. So imagine a development box that runs at half the speed of production. Every query takes twice as long, which means the developer is losing 50% of his time waiting for results. It is not just about his time but also about the focus and concentration he will lose because his experience with the system is bad. It is not an exaggeration to say that he will lose 4 hours of an 8-hour day.

In the US, the total cost of an ETL developer is about $100 an hour, so the company is losing $400 per developer per day. You do the math for 5 to 10 developers working for 1 year.
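
To make that math concrete, here is a rough sketch in Python using the figures above ($100 an hour, 4 lost hours a day). The number of working days per year (250) is my own assumption and is not a figure from this post, so adjust it to your situation.

```python
# Rough cost estimate for time lost on a slow development box.
HOURLY_COST_USD = 100         # from the post: total cost of an ETL developer per hour
HOURS_LOST_PER_DAY = 4        # from the post: 4 hours lost out of an 8-hour day
WORKING_DAYS_PER_YEAR = 250   # assumption, not from the post


def yearly_loss(developers: int, working_days: int = WORKING_DAYS_PER_YEAR) -> int:
    """Return the estimated yearly loss in USD for a team of the given size."""
    daily_loss_per_developer = HOURLY_COST_USD * HOURS_LOST_PER_DAY  # $400
    return daily_loss_per_developer * developers * working_days


for team_size in (5, 10):
    print(f"{team_size} developers: ${yearly_loss(team_size):,} per year")
```

Under those assumptions the estimate comes to roughly $500,000 a year for 5 developers and $1,000,000 a year for 10, which is worth comparing against the price of a faster development server.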

I am not saying that production and development systems must be the same, but I firmly believe the difference should be in the reliability and availability of the servers rather than in their performance.

I thought of this article while waiting for a query to return from a development box. It took exactly twice as long as on the production server, and I remembered that management is already discussing upgrading the development server to improve performance.