Archive for the ‘SQL Server’ Category

Why Informatica sequences & Sybase/SQL server identity columns, should not be used?

Sunday, May 7th, 2006

Informatica provides the sequence object to create surrogate keys during load. These object is also sharable within various mappings hence can be used in parallel. But I will recommend never using it. 
Here’s the Reason why….

1. MIGRATION ISSUE:  The sequence is stored in the Informatica repository. That means it is disconnected from the target database. So during migration of code between environments the sequence has to be reset manually. Bigger problem arise when the tables with data is brought from production environment to QA environment; the mapping cannot be immediately run because the sequences are out of sync.

2. Sequences belong to the target schema and the database and do not belong to the processes as the table object should define & hold values for the attributes and not something external to the system.

3. At first it might seem that Sybase/SQL server does not support sequences but checkout the post & that will solve the problem.

4. Logically it seems that Informatica calls to oracle procedure will slowdown the ETL process but in real world, I have found it not to be true. Additionally this process is only called when a new reference data/Dimension is added. New reference data is not as volatile as transactions; so the adverse effect is nullified.

5. If Identity columns are used instead, it causes more problems as you loose programmatic control on it. Example any type II dimension changes are become a nightmare to manage.

Simulating Oracle Sequences in Sybase & SQL Server

Wednesday, April 26th, 2006

Programmatic control is lost when identity columns are used in Sybase and SQL Server. I do not recommend using Identity columns to create surrogate keys during ETL process. There are many more reasons for that. Oracle has the sequence feature which is used extensively by Oracle programmers. I have no clue why other vendors are not providing the same. This custom code has been used extensively by me and thoroughly tested. I ran multiple processes simultaneously to check if there is deadlock and also made sure that the process returns different sequences to different client process.

Notes: –

1. The table should have ‘ROW LEVEL LOCKING’

2. The sequence generator process is stateless (See more details in Object Oriented Programming)

3. Create one row for each target table in the sequence master table. Do not try to use one sequence for multiple tables. It will work but probably is not a good idea.

Step 1: -Create a table with following structure.

CREATE TABLE sequence_master (
sequence_nm varchar (55) NOT NULL ,
sequence_num integer NOT NULL
)
GO

Step 2: -Create a stored procedure that will return the next sequence.

CREATE PROCEDURE p_get_next_sequence
@sequence_name varchar(100)
AS
BEGIN
DECLARE @sequence_num INTEGER
— Returns an error if sequence row is entered into the table.
SET @sequence_num = -1

UPDATE sequence_master
SET @sequence_num = sequence_num = sequence_num + 1
WHERE Sequence_name = @sequence_name

RETURN @sequence_num
END
GO