A SQL statement within an OLE DB Command component is extremely slow (hours, even days), while the same SQL statement executed in a query window in SQL Server Management Studio takes only a few seconds. I am using a fairly simple SQL UPDATE statement against a table with only 21,000 rows. Query:
UPDATE Pearson_Load
SET Process_Flag = 'E',
Error_Msg = 'Error: Missing address elements Address_Line_1, City, and/or State'
WHERE (Address_Line_1 = ' '
OR City = ' '
OR State = ' ')
AND Process_Flag = ' '
Any suggestions on how to improve the performance of this task or an alternate solution are appreciated. Thank you.
You should redirect those rows destined for update to a table and then use an Execute SQL task in the control flow to perform a set-based update. What you've got now is a new, distinct UPDATE command for every row on the update path. This is costly.
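For example, if the data flow redirected the failing rows into a staging table, the Execute SQL task could then do the whole update in one statement. A rough sketch - the staging table and the Pearson_Load_ID key column are illustrative assumptions, not names from your package:

-- The data flow inserts the keys of rows needing the error flag here
-- (Pearson_Load_Errors and Pearson_Load_ID are assumed names):
CREATE TABLE Pearson_Load_Errors (Pearson_Load_ID int NOT NULL)

-- The Execute SQL task then runs one set-based update:
UPDATE pl
SET    Process_Flag = 'E',
       Error_Msg = 'Error: Missing address elements Address_Line_1, City, and/or State'
FROM   Pearson_Load pl
       JOIN Pearson_Load_Errors e ON e.Pearson_Load_ID = pl.Pearson_Load_ID
|||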
Thank you Phil! I just moved the queries (I actually had 4 separate queries) that I was executing as separate OLE DB Command components in the data flow task into an Execute SQL task in the control flow, and the process ran in seconds. I don't think that is exactly what you meant, but I wasn't sure what you meant by the suggestion to "redirect those rows destined for update to a table and then use an Execute SQL task in the control flow to perform a set-based update".
If you have time to comment so that I understand the problem correctly: what I was doing wrong, by using a data flow task with a table as an OLE DB source, was executing the SQL statement in each OLE DB Command component I defined 21,000 times - once for each row in the table. So instead of executing 4 distinct queries, I was really executing 84,000 queries. If that is the case, when is it OK (if ever) to use such a scenario? Should the SQL command being executed be defined to only work on the current row? What would the syntax look like?
|||
My suggestion of moving the data to a table was assuming you were doing a parameter-based update query.
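A parameter-based update in an OLE DB Command looks something like the sketch below; the ? markers map to input columns of the data flow, and the Pearson_Load_ID key column is an assumption on my part, not a name from your package:

-- This runs once per row flowing through the component - 21,000 times
-- for your table - which is exactly why it is so slow:
UPDATE Pearson_Load
SET    Process_Flag = ?,
       Error_Msg = ?
WHERE  Pearson_Load_ID = ?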
Your understanding is correct. You were executing 84,000 updates, and generally there is never a good time to do that. If you need to perform an update in the data flow on all of those rows, it would be best to insert the changes into a separate table, to be used later in a set-based update.
|||
Thank you for your latest response and your help with this problem.
|||
Jeff,
According to your post you managed to achieve this with an Execute SQL Task. Am I correct?
If using an Execute SQL Task is an option for you, then I would go with that over a data flow every time. SSIS will almost never be able to perform quicker than an RDBMS engine.
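For instance, all four of your statements could go into a single Execute SQL Task as one batch (only the first statement below is from your post; the other three would stay exactly as you wrote them):

UPDATE Pearson_Load
SET    Process_Flag = 'E',
       Error_Msg = 'Error: Missing address elements Address_Line_1, City, and/or State'
WHERE  (Address_Line_1 = ' '
        OR City = ' '
        OR State = ' ')
  AND  Process_Flag = ' '

-- ...the other three UPDATE statements follow in the same batch,
-- each running once as a set-based operation.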
-Jamie
|||
Jamie,
Yes, I did solve this using an Execute SQL Task. It was a rather straightforward solution with this particular package because I wasn't using a parameterized query. I may have to use what Phil initially suggested above for another, similar package, but one that uses parameters in the query. One parameter needs to be referenced in a sub-query, which isn't allowed. That limitation is what led me to use a data flow task; I just wasn't aware of the inefficiency of that tack. Thanks.
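One workaround I've seen suggested for that limitation is to assign the parameter to a local variable first and reference the variable in the sub-query. A sketch only - @Batch_ID, Pearson_Load_Staging, and Batch_ID are hypothetical names, not ones from this thread:

-- The ? marker is mapped once, outside the sub-query:
DECLARE @Batch_ID int
SET @Batch_ID = ?

UPDATE Pearson_Load
SET    Process_Flag = 'E'
WHERE  Pearson_Load_ID IN (SELECT Pearson_Load_ID
                           FROM   Pearson_Load_Staging   -- hypothetical staging table
                           WHERE  Batch_ID = @Batch_ID)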
|||
Caveat that with the fact that SSIS is efficient in certain circumstances - unfortunately, doing updates isn't one of them. That's due to the very nature of updates.
-Jamie