Top Tips for SSIS 2005
My collection of tips for developing SSIS 2005:
- Use flat files for performance on large data
- Use Pipe-delimiters in your flat file
- Give your Packages a sensible order in your project
Use flat files for performance on large data
A rule of thumb is that if an extract is more than 1,000,000 records, the ETL process is guaranteed to be faster if you extract to a flat file first (either on the source server and file-transfer it to your target environment, or on your target environment), and to then bulk-load the file into your staging area on your database in a separate data stream.
Use Pipe-delimiters in your flat file
SSIS 2005 does not handle embedded commas in text fields very well. Do not assume that because Excel can read your flat filethat SSIS can read it too. Even the old DTS2000 copes better with flat files than SSIS 2005 and since there is no real way to fix this, ask your data supplier for pipe-delimited flat files, or convert your flat file to a pipe-delimited file. Here is a batch script to do just this - edit it to suit you:
- :
- : Run this to a create a PIPE-Delimited file from a CSV
- : file to disambigate SSIS's behaviour.
- :
- : Parameters: 1 - Path of file to be cleaned.
- :
- : Trim lines and remove pipes
- : sed -e "s/[ \t]*$//g" -i %1
- : sed -e "s/|//g" -i %1
- : Replace "," with pipes and strip line-begin and -end inverted commas
- : sed -e "s/\",\"/|/g" -e "s/^\"//" -e "s/\"$//" -i %1
- : Unescape inverted commas
- : sed -e "s/\"\"/\"/g" -i %1
- sed -e "s/[ \t]*$//g" -e "s/|//g" -e "s/\",\"/|/g" -e "s/^\"//" -e "s/\"$//" -e "s/\"\"/\"/g" -i %1
You can get a copy of sed bundled with a collection of other useful Unix-like utilities here.
Order your packages in your project
Is it not obvious that SSIS packages are listed in alphabetical order in Visual Studio? It certainly is the case for C# etc,. but here it seems to be listed in an order based on creation time, or some perhaps GUID-ordered - I haven't figured it out yet. You can manually fix the order by editing the project XML-file, [ProjectName].dtproj, in which you move the <DtsPackage>..</DtsPackage> - items around to suit your preferred order:
- <DTSPackages>
- ...
- <DtsPackage>
- <Name>001 StageAccounts.dtsx</Name>
- <FullPath>001 StageAccounts.dtsx</FullPath>
- <References />
- </DtsPackage>
- ...
- </DTSPackages>