Showing posts with label commands. Show all posts
Showing posts with label commands. Show all posts

Friday, March 23, 2012

Performance problems with SQL commands in data flow task

SQL statement within an OLE DB Command component is extremely slow (hours, days). Same SQL statement executed within a query window of SQL Server Management Studio takes only a few seconds. Using a fairly simple SQL UPDATE statement against a table with only 21,000 rows. Query:

UPDATE Pearson_Load
SET Process_Flag = 'E',
Error_Msg = 'Error: Missing address elements Address_Line_1, City, and/or State'
WHERE (Address_Line_1 = ' '
OR City = ' '
OR State = ' ')
AND Process_Flag = ' '

Any suggestions on how to improve the performance of this task or an alternate solution are appreciated. Thank you.

Jeff-B wrote:

SQL statement within an OLE DB Command component is extremely slow (hours, days). Same SQL statement executed within a query window of SQL Server Management Studio takes only a few seconds. Using a fairly simple SQL UPDATE statement against a table with only 21,000 rows. Query:

UPDATE Pearson_Load
SET Process_Flag = 'E',
Error_Msg = 'Error: Missing address elements Address_Line_1, City, and/or State'
WHERE (Address_Line_1 = ' '
OR City = ' '
OR State = ' ')
AND Process_Flag = ' '

Any suggestions on how to improve the performance of this task or an alternate solution are appreciated. Thank you.

You should redirect those rows destined for update to a table and then use an Execute SQL task in the control flow to perform a set-based update. What you've got now is a new, distinct update command for every row on the update path. This is costly.|||

Thank you Phil! I just moved the queries (I actually had 4 separate queries) that I was executing as separate OLE DB Command components in the data flow task into an Execute SQL task in the control flow and the process ran in seconds. I don't think that is exactly what you meant, but I wasn't sure what you meant by the suggestion to "redirect those rows destined for update to a table and then use an Execute SQL task in the control flow to perform a set-based update".

If you have time to comment so I understand the problem correctly, what I was doing wrong by using a data flow task with a table as an OLE DB source was executing the SQL statement in each OLE DB Command component I defined 21,000 times - once for each row in the table. So instead of executing 4 distinct queries, I was really executing 84,000 queries. If that is the case, when is it OK (if ever) to use such a scenario? Should the SQL command being executed be defined to only work on the current table entry? What would the syntax look like?

|||

Jeff-B wrote:

Thank you Phil! I just moved the queries (I actually had 4 separate queries) that I was executing as separate OLE DB Command components in the data flow task into an Execute SQL task in the control flow and the process ran in seconds. I don't think that is exactly what you meant, but I wasn't sure what you meant by the suggestion to "redirect those rows destined for update to a table and then use an Execute SQL task in the control flow to perform a set-based update".

If you have time to comment so I understand the problem correctly, what I was doing wrong by using a data flow task with a table as an OLE DB source was executing the SQL statement in each OLE DB Command component I defined 21,000 times - once for each row in the table. So instead of executing 4 distinct queries, I was really executing 84,000 queries. If that is the case, when is it OK (if ever) to use such a scenario? Should the SQL command being executed be defined to only work on the current table entry? What would the syntax look like?

My suggestion of moving the data to a table was assuming you were doing a parameter-based update query.

Your understanding is correct. You were executing 84,000 updates, and generally there is never a good time to do that. If you need to perform an update in the data flow on all of those rows, it would be best to insert the changes into a separate table, to be used later in a set-based update.|||Thank you for your latest response and your help with this problem.|||

Jeff-B wrote:

Thank you for your latest response and your help with this problem.

Jeff,

According to your post you managed to achieve this with an Execute SQL Task. Am I correct?

If using an Execute SQL Task is an option for you then I would go with that over a data-flow every time. SSIS will almost never be able to perform quicker than a RDBMS engine.

-Jamie

|||

Jamie,

Yes, I did solve this using an Execute SQL Task. It was a rather straightforward solution with this particular package because I wasn't using a parameterized query. I may have to use what Phil initially suggested above for another, similar package but one that one uses parameters in the query. One parameter needs to be referenced in a sub-query which it isn't allowed. That limitation is what led me to use a data flow task. I just wasn't aware of the inefficiency of that tack. Thanks.

|||

Jeff-B wrote:

Jamie,

Yes, I did solve this using an Execute SQL Task. It was a rather straightforward solution with this particular package because I wasn't using a parameterized query. I may have to use what Phil initially suggested above for another, similar package but one that one uses parameters in the query. One parameter needs to be referenced in a sub-query which it isn't allowed. That limitation is what led me to use a data flow task. I just wasn't aware of the inefficiency of that tack. Thanks.

Caveat that with the fact that its efficient in certain circumstances - unfortunetely doing updates is one of those scenarios. That's due to the vary nature of updates.

-Jamie

sql

Friday, March 9, 2012

Performance of UPDATE commands on individual records

Ok, I'm at a crossroads in my program.
I've got a program that needs to throw an SQL update command to update
some individual records.

>From an efficiency standpoint, does SQL handle the UPDATE command
differently if the field is the same as the old field?
IE
Is it worth doing a string comparision on old vs new data in the
program, or should I just code to update all fields regardless and then
will the server optimize based on whether or not the data actually
changed?
Thanks,
Josh McFarlaneLogically SQL Server doesn't consider the diff at all.
If you want to only update small part of a large volume rows, doing string
comparision mostly yields better performance since that consumes less log
space.
For single line, I still suggest you "update when necessary", since some
triggers may sit there to enforce biz logic.
James
"Josh McFarlane" wrote:

> Ok, I'm at a crossroads in my program.
> I've got a program that needs to throw an SQL update command to update
> some individual records.
>
> differently if the field is the same as the old field?
> IE
> Is it worth doing a string comparision on old vs new data in the
> program, or should I just code to update all fields regardless and then
> will the server optimize based on whether or not the data actually
> changed?
> Thanks,
> Josh McFarlane
>|||Josh McFarlane (darsant@.gmail.com) writes:
> Ok, I'm at a crossroads in my program.
> I've got a program that needs to throw an SQL update command to update
> some individual records.
>
> differently if the field is the same as the old field?
> IE
> Is it worth doing a string comparision on old vs new data in the
> program, or should I just code to update all fields regardless and then
> will the server optimize based on whether or not the data actually
> changed?
Since it's a bit of work, I'm not sure that it's worth the effort, but
there are at least two scenarios where you can gain some performance.
One case is if you use merge replication. I'm not into replication myself,
but I got a question from a guy who is very good at replication, and he
wanted to reduce an update, so that only columns that were actually
changed were to be updated. Apparently, this made replication more
effective.
The other case concerns indexed columns. Consider this:
SELECT * INTO Orders FROM Northwind..Orders
create unique clustered on Orders
create index ix On Orders(CustomerID)
go
BEGIN TRANSACTION
UPDATE Orders
SET EmployeeID = 18
WHERE OrderID = 11000
At this point runs query from another window:
select count(*) from Orders WHERE CustomerID = 'RATTC'
RATTC is the customer id for order 11000. This query returns the
value 18 instantly, was not blocked. Now in the first window do this:
UPDATE Orders
SET EmployeeID = 118,
CustomerID = 'RATTC'
WHERE OrderID = 11000
and now try the SELECT COUNT(*) again. This time it will block.
If you are generatnig the UPDATE statement dynamically, and in client
code, then filtering on columns that have actually changed is probably
manageable. In a stored procedure it is just painful.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx