So you really are going to use SSIS for you ETL project, then?

Project approaches on how to make the most of this SSIS 2005 thing.

  • Think Testing!
  • Logging
  • Configuration Control
  • Version Control System
  • Formal deployment
  • Recovery from Screw-Up
  • Multiple environment
  • Manage your 100's of SSIS packages

Think Testing! 

The allure of a GUI tool that allows you to drag-and-drop components that need the merest amount of configuring may just make you a little complacent about testing your SSIS packages properly. Do not fall into this trap! Assume nothing works until tested - especially an immature product like SSIS 2005 (which, now in 2008, still does not seem to have matured). Create a set of generic test container packages through which you can test each ETL package individually (and remember to regenerate the GUID after having cloned a templated test package!)

 

Logging

SSIS has pretty poor default logging for watching progress and diagnosing errors. Add VB-script logging code to all packages - the best place is in the event handlers for OnPreExecute and OnPostExecute events. Log to the database as well as to a flat file - at least least you will have something to go on if the database goes down. 

 

Configuration Control

This may seem obvious but I had recently witnessed again the choas of where one pathetic server at a UK-based travel company (and the equally pathetic IT manager) was by day the development system and by night the production system. No-one was quite certain what was in the production system and incomplete developments of the day surreptitiously became production code at night... What fun we had!

 

Version Control System

Unless you like to indulge inthe complex licensing of VS2005, BIDS, SQLServer, W2K3 and Microsoft's latest attempt at a version control system, Team Foundation Server (TFS) and have lots of money to spend, go straight for the free Subversion version control system and the Ankh tool for VS2005. Install a Subversion client on every SSIS target machine - this way you do not need a TFS client license for each machine (or an expensive TFS server license for that matter) and use Subversion to progress your SSIS packages from development to production. If you apply good version control practises, you will always know what you have deployed on each target device.

 

Formal deployment

Very obvious stuff again, but the allure of pretty graphical tools and the wrongly-assumed simplicity has now been confirmed to be a major cause of self-applied frontal lobotomies. Make all your SSIS-projects deployable as a matter of course.

 

Recovery from Screw-Up

Frequently when a package fails, the only recourse is to run the entire offending package again, which can be very wasteful. It is possible to resume the execution of a package from where it broke. However, the state of data needs to be persisted and be read back in when required: Dump data checkpoints to Raw File Destinations and when recovering from Screw-up, read data in through Raw File Sources.

 

Multiple environments

If you are in an organisation that has the wherewithal of hosting multiple SSIS environments (Development, Testing, Production), then you need to control the environment-related operations through a central configuration file. Simply change the configuration file when you deploy to a different environment

 

Manage your 100's of SSIS packages

You can easily end up with hundreds of SSIS packages in a large-ish ETL project. Since the MSDB repository does not readily display the deployed packages in a sensible order, you would do well to assign a 3-digit number to the beginning of the package's name, e.g. '081 Stage Daily Exchange Rates.dtsx'. Try to get the numbering scheme to suggest functional business areas, such as say, 101-199 => Accounts, rather than functional SSIS areas like, say, Input data staging. Also indicate container packages (packages that call other packages) by making the last digit 0, e.g. package 110 calls packages 111, 112, etc., package 100 calls packages 110, 120, etc,. and top-level package 000 calls packages 100, 200, etc. If you need to go deeper than 3 calling levels, make your numbering scheme 4 digits. Obviously.