Wednesday, March 7, 2012

Performance of Fuzzy Lookup?

Are there any ways to improve the performance of the Fuzzy Lookup transformation?

I've already "sharpened" my data flow as much as possible: I perform an exact lookup before the fuzzy one and pass only the rows that still need comparing, both in the input and in the reference table.
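For readers unfamiliar with the pattern, here is a minimal sketch of the "exact lookup first, fuzzy only on the leftovers" idea, written in Python outside of SSIS. It is an illustration under assumptions, not how the Fuzzy Lookup transformation works internally: the function names `normalize` and `two_stage_match` are hypothetical, the sample addresses are made up, and `difflib.SequenceMatcher` is swapped in as a stand-in similarity measure.

```python
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lowercase and collapse whitespace so trivial differences match exactly."""
    return " ".join(address.lower().split())


def two_stage_match(inputs, reference, threshold=0.8):
    """Exact match first; run the expensive fuzzy pass only on leftover rows.

    `threshold` is an assumed similarity cutoff in [0, 1], not an SSIS setting.
    """
    ref_by_key = {normalize(r): r for r in reference}
    exact, fuzzy, unmatched = {}, {}, []

    for row in inputs:
        key = normalize(row)

        # Stage 1: cheap dictionary lookup (the "exact lookup").
        if key in ref_by_key:
            exact[row] = ref_by_key[key]
            continue

        # Stage 2: brute-force similarity scan, only for rows that missed stage 1.
        best, best_score = None, 0.0
        for ref_key, ref_row in ref_by_key.items():
            score = SequenceMatcher(None, key, ref_key).ratio()
            if score > best_score:
                best, best_score = ref_row, score

        if best_score >= threshold:
            fuzzy[row] = (best, best_score)
        else:
            unmatched.append(row)

    return exact, fuzzy, unmatched


if __name__ == "__main__":
    reference = ["12 Main Street", "48 Oak Avenue"]
    inputs = ["12 main street", "48 Oak Avenu", "999 Nowhere Blvd"]
    exact, fuzzy, unmatched = two_stage_match(inputs, reference)
    print("exact:", exact)
    print("fuzzy:", fuzzy)
    print("unmatched:", unmatched)
```

The point of the split is that every row resolved in stage 1 never reaches the inner loop, so the cost of the fuzzy pass scales with the leftover rows rather than with the whole input.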

In one of my address-matching flows, around 2,000 input rows are matched against about 600,000 reference rows, and it's taking several HOURS to complete on a 2-processor, 4 GB, 32-bit Windows Server 2003 machine.

I've looked at perfmon and isolated the bottleneck to the actual lookups against tempdb, not the building of the fuzzy match indexes or the loading of the reference data. The buffers on tempdb show the same statement being executed every few seconds.

I'm using the June CTP because we have too much code to migrate and decided to wait until Nov. 7. Has this component been improved in releases after June?

Any help on this would be greatly appreciated.

There is a whitepaper on Fuzzy Lookup and Fuzzy Grouping that includes a section on performance analysis. You might find it useful: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/FzDTSSQL05.asp
