Friday, March 9, 2012

Performance of SSIS Vs Other ETL Tools

Hi All,

I'm working on a conversion project and I'm trying to compare performance of SSIS with Other ETL Tools, especially Informatica PowerCenter. Which one do you think is better ETL performer, when source and destination being SQL Server databases. Is there any benchmark available?

Thanks.

I'm sure that you have seen this, but for future reference, here is a link to jamie's article regarding this question....

http://blogs.conchango.com/jamiethomson/archive/2007/08/21/Informatica-_2D00_vs_2D00_-SSIS-pipeline-throughput.aspx

|||I have used Informatica, SSIS, ETI, and CloverETL.

All the above use Microsoft technologies to connect to SQLServer. Some use ODBC, others use ADO.NET and OLEDB. There are ways with a few of the technologies to configure a 'bulk load' process as an ETL target.

Since the ETL tools all use common Windows technologies for connections, there is very little difference in performance for simple data movement.

Data movement with transformations, sorting, aggregation, filtering, insert/update logic, and other 'intermediate actions' is where you see large differences. There is no overall winner on performance for these types of ETL actions in my opinion. The technology used to for each action dictates how fast things go. Also, if the ETL process uses a bulk read or load process, this can assist with performance increases, though it is only for the initial read and/or final load and not the intermediate actions.

Some examples:
ETI outputs source code you compile into a conversion. So, if you have a complex conversion with SQLServer you are going to compile and deploy a C/C++ program which will usually be much faster than any GUI/Engine-based ETL process. Its not really 'faster' than other tools, its that the approach of using a small C program to move the data is just inheriently faster on average.
CloverETL has a few screaming-fast transformation features, and they are probably fast due to the use of a RegEx engine. Having used JDBC though, I would estimate that DB reads and Writes are faster with ADO/OLEDB. This is not a knock, but a comparison of the technologies used.

It should be noted that I have used Informatica and ETI to create conversions so complex that they should not even be attempted in SSIS. This is where I believe that other tools can outshine SSIS - handling complexity with multiple databases for source and destination, lookups, and hundreds of intermediate actions in-between.

The term 'performance' also needs to be defined a bit here.
We test performance of SSIS based on the time for a Package to run.
If you run a package in the BIDS it usually will run a little slower than if you configure a job or execute DTEXEC.
If you run on 64-bit, it will tend to run a little faster than 32-bit.
Since there are a number of ways to execute a SSIS package, there are a number of ways one can perceive the performance.

No comments:

Post a Comment