Friday, March 30, 2012

Performance tuning for a stored procedure

I have an SP that is big, huge, 700-800 lines.

I am not an expert, but I need to figure out every possible way that
I can improve the performance of this SP.

In the next couple of weeks I will work on preparing SQL statements
that will create the tables, insert sample records and run the SP.
I would hope people will look at my SP and give me any hints on how
I can better write the SP.

In the meantime, after looking at the SP briefly, my first observations are:

1- use SET NOCOUNT ON
2- avoid using UNION statements
3- use WITH (NOLOCK) with each SELECT statement
4- avoid using NESTED Select statements
5- use #temp tables
6- avoid renaming tables in SELECT statements, for example SELECT * FROM
tblClients C

Am I correct to consider the above 6 points as valid causes of
performance problems?

I would appreciate any comments/help

Thank you very much

Hi Serge,

On Thu, 9 Sep 2004 00:13:31 -0400, serge wrote:

>I have an SP that is big, huge, 700-800 lines.
>I am not an expert, but I need to figure out every possible way that
>I can improve the performance of this SP.
>In the next couple of weeks I will work on preparing SQL statements
>that will create the tables, insert sample records and run the SP.
>I would hope people will look at my SP and give me any hints on how
>I can better write the SP.
>In the meantime, after looking at the SP briefly, my first observations are:
>1- use SET NOCOUNT ON

This will not affect performance, but it is good practice. Many clients
choke on the numerous "(n row(s) affected)" messages returned by an SP
without this setting.
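
For illustration, a minimal sketch of where the setting goes (the procedure
and column names here are made up):

CREATE PROCEDURE dbo.usp_GetClients
    @Country varchar(30)
AS
SET NOCOUNT ON  -- suppress "(n row(s) affected)" for the statements below

SELECT ClientID, ClientName
FROM dbo.tblClients
WHERE Country = @Country
GO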

>2- avoid using UNION statements

Depends. Sometimes, a UNION can be quicker than the alternative. I've seen
cases where a query with a complicated WHERE clause that was evaluated
with a table scan was rewritten as two (or more) queries with UNION; each
of those queries was resolved with an index and the UNION version ran a
lot quicker.

You can gain performance by using UNION ALL instead of UNION where
possible. Sometimes, this can be made possible by extending the WHERE
clause of one of the queries in the UNION (though you have to be careful
that the cost of this extension doesn't outweigh the gain of UNION ALL
over UNION!).

Run the following examples and check execution plans and statistics on I/O
and time used for execution:

USE pubs
GO
-- show I/O and elapsed/CPU time statistics for each statement
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO
SELECT au_id, au_lname, au_fname
FROM authors
WHERE au_id LIKE '2%'
OR au_lname = 'Ringer'
GO
SELECT au_id, au_lname, au_fname
FROM authors
WHERE au_id LIKE '2%'
UNION
SELECT au_id, au_lname, au_fname
FROM authors
WHERE au_lname = 'Ringer'
GO
SELECT au_id, au_lname, au_fname
FROM authors
WHERE au_id LIKE '2%'
UNION ALL
SELECT au_id, au_lname, au_fname
FROM authors
WHERE au_lname = 'Ringer'
AND au_id NOT LIKE '2%'
GO

>3- use WITH (NOLOCK) with each SELECT statement

This can gain you some performance (especially if concurrency is high on
your database), but you run the risk of getting dirty reads. If that risk
is acceptable, go ahead. If you don't know what a dirty read is, then
don't use this option.
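
For reference, the hint goes after each table in the FROM clause. A sketch,
using your tblClients example (column names made up):

SELECT ClientID, ClientName
FROM tblClients WITH (NOLOCK)
WHERE Country = 'USA'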

>4- avoid using NESTED Select statements

Again: depends. If you can safely and easily replace the nested select (or
subquery, as it's usually called) with other code, do so, then test both
versions to see if performance actually has improved (it often won't
improve, as SQL Server's optimizer already uses the same execution plan).

Similarly, if you can safely replace a correlated subquery with a
non-correlated one, do so and test both versions.
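
As a sketch (with hypothetical tables A and B), this correlated subquery is
logically re-evaluated for each row of A:

SELECT a
FROM A
WHERE EXISTS (SELECT * FROM B WHERE B.col = A.col)

and this is a non-correlated equivalent:

SELECT a
FROM A
WHERE col IN (SELECT col FROM B)

(though, as noted above, the optimizer will often produce the same plan for
both - which is exactly why you should test).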

But if removing the subquery means that you have to code lots more SQL, it
might hurt performance instead of improving it. And if you can gain some
performance by replacing an intuitive subquery with a contrived and hard
to understand query, then you might want to reconsider if you really value
performance higher than maintainability. One day, you will find yourself
staring at that query, wondering what the %$# that ^%#$&%# query is
supposed to do.

>5- use #temp tables

At the risk of repeating myself: depends. If you find the same subquery
used over and over in the procedure, it MIGHT help performance if you
execute that subquery into a #temp table and use that for the rest of the
execution. It MIGHT also help further to index the temp table. But, again,
it might also hurt performance - creating the temp table and storing the
data induces some overhead as well and if you're not careful, you might be
faced with numerous recompilations of the stored procedure that wouldn't
be needed without the temp table.
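
A sketch of the idea, assuming a hypothetical aggregate over an Orders table
that the procedure needs several times:

SELECT CustomerID, Total = SUM(Amount)
INTO #OrderTotals
FROM dbo.Orders
GROUP BY CustomerID

CREATE CLUSTERED INDEX ix_OrderTotals ON #OrderTotals (CustomerID)

-- later statements join to #OrderTotals instead of repeating the aggregate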

If you use a temp table to break a complicated query down into steps, you
have a good chance of degrading performance. In one complicated query, the
optimizer may choose an execution plan that you would never think of but
that's faster than the obvious way to execute it; if you dictate the steps
by executing them separately with a temp table for intermediate results,
you take a lot of options from the optimizer. Of course, there is also the
consideration of maintainability and readability of your code, so you
might choose to accept the performance degradation, just so that you will
understand your code when (not if!!) you (or someone else) have to get
back to it later.

>6- avoid renaming tables in SELECT statements, for example SELECT * FROM
>tblClients C

I've never heard that using a table alias (as this is called) would hurt
performance. If you have any evidence of this, please point me to it. I
would be highly surprised.

In fact, using an alias is absolutely needed when you use the same table
more than once in a query and when you use derived tables; in all other
cases (except for single-table queries or very short table names) I'd also
heartily recommend using an alias. Do choose a mnemonic alias, not just a
random one or two letter combination!
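
For example, aliases are mandatory in a self-join, as in this hypothetical
employee/manager query:

SELECT emp.EmpName, mgr.EmpName AS ManagerName
FROM tblEmployees AS emp
JOIN tblEmployees AS mgr ON emp.ManagerID = mgr.EmployeeID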

>Am I correct to consider the above 6 points as valid causes of
>performance problems?

See above. And you might also want to take a look at this site:
http://www.sql-server-performance.com/

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

As always, Hugo has already given an excellent response. Just some
additions here...

> 3- use WITH (NOLOCK) with each SELECT statement

Don't use this, unless there is a specific performance problem with
respect to the statement in question. And even then you should only use
it after you have assessed whether the risks are acceptable.

> 6- avoid renaming tables in SELECT statements, for example SELECT * FROM
> tblClients C

Nonsense.

> 2- avoid using UNION statements

This is not a goal. Your goal should be to write statements that are
easy to read and understand and are effective. It starts there. In
general, you should not use GROUP BY, DISTINCT or UNION (without ALL) if
you don't need it. But if you do need it, then go for it; SQL Server is
optimized for their use.

> 4- avoid using NESTED Select statements

Start with a set orientation in mind. That way you will automatically
avoid most unnecessary nested selects. The ones that remain are probably
your best choice.

> 5- use #temp tables

Don't use intermediate tables unless you have to. If you have to then
#temp tables are usually better than creating a permanent table and
dropping it at the end.

Gert-Jan
--
(Please reply only to the newsgroup)

serge (sergea@.nospam.ehmail.com) writes:
> I have an SP that is big, huge, 700-800 lines.

Hey, you should see some of our procedures. We have one that is 3000 lines
long!

But, OK, 700 lines is no small size for a stored procedure.

Hugo and Gert-Jan have already pointed out weaknesses in your observations,
but I'd like to make some supplemental comments.

> 2- avoid using UNION statements

This is a bad rule. Sometimes UNION may be the wrong solution. Sometimes
it is the right one. Even if we are talking from a performance perspective.

> 3- use WITH (NOLOCK) with each SELECT statement

Only do this if there are unavoidable table scans and you really are
experiencing contention problems - *and* if you can accept that the
results are not consistent.

> 4- avoid using NESTED Select statements

Again, not a very good rule. But it depends a little on what you mean.
Say that you have:

SELECT a, b, c = (SELECT SUM(c) FROM B WHERE B.col = A.col)
FROM A
WHERE col2 BETWEEN 12 AND 19

My observation is that a rewrite using a derived table often gives better
performance:

SELECT A.a, A.b, B.c
FROM A
JOIN (SELECT c = SUM(c), col FROM B GROUP BY col) AS B ON A.col = B.col
WHERE col2 BETWEEN 12 AND 19

This may look expensive if B is large and there are only a handful of
values between 12 and 19 in A. But the above is only a logical description
of the query. The optimizer may recast computation order, as long as the
result is the same, and often does with a very good result.

The same thing applies to update queries:

UPDATE A
SET c = (SELECT SUM(c) FROM B WHERE A.col = B.col)
WHERE col2 BETWEEN 12 AND 19

While the above is ANSI-compliant, this is usually more effective:

UPDATE A
SET c = B.c
FROM A
JOIN (SELECT c = SUM(c), col FROM B GROUP BY col) AS B ON A.col = B.col
WHERE col2 BETWEEN 12 AND 19

You must benchmark all such changes. It may not always be the best thing
to do.

> 5- use #temp tables

This is a very complex topic. Yes, it can sometimes be a good thing to
save intermediate results in a temp table. But temp tables can also
cause performance problems, since if you fill up a temp table, SQL Server
may opt to recompile the procedure. And recompiling a 700 line stored
procedure can easily take a few seconds. This can be avoided by using
table variables instead. Table variables never cause recompilations.
But then again, you may want those recompilations, because it can slash
the execution time of the procedure from three hours to two minutes.
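
As a sketch of the table-variable alternative (reusing the hypothetical B
table from the examples above):

DECLARE @totals TABLE (col int PRIMARY KEY, c int)

INSERT @totals (col, c)
SELECT col, SUM(c)
FROM B
GROUP BY col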

Here is a real-life story about a procedure optimization that I did
some time ago. The code originated from a stored procedure that I had
written in 1997 for MS SQL 6.0, and used a couple of temp tables on
which a bunch of operations were performed. At a customer site, this
procedure took too long. I tried a lot of tricks in the book,
but few gave any effect.

Eventually, I replaced most of the temp-table manipulation with a
50+ line SELECT statement which performed a FULL JOIN of three table
sources, of which at least one was a derived table. (And a three-way
full join requires at least one derived table in itself, because you
have to full-join two by two.)
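
To illustrate the shape of such a query, here is a sketch with made-up
tables T1, T2 and T3, joined on a column k:

SELECT COALESCE(d.k, t3.k) AS k, d.a, d.b, t3.c
FROM (SELECT COALESCE(t1.k, t2.k) AS k, t1.a, t2.b
      FROM T1 t1
      FULL JOIN T2 t2 ON t1.k = t2.k) AS d
FULL JOIN T3 t3 ON d.k = t3.k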

So why did I not write it this way in 1997? Well, at that time SQL Server
did not have FULL JOIN or derived tables. Also, the requirements of the
original procedure were different, and more complex than the current one.

> 6- avoid renaming tables in SELECT statements, for example SELECT * FROM
> tblClients C

No, use aliases. The impact on execution performance is negligible, but
the impact on developer performance should not be ignored.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

>>1- use SET NOCOUNT ON
> This will not affect performance, but it is good practice. Many clients
> choke on the numerous "(n row(s) affected)" messages returned by an SP
> without this setting.

From the SQL 2000 BOL:

<Excerpt href="http://links.10026.com/?link=tsqlref.chm::/ts_set-set_3ed0.htm">
For stored procedures that contain several statements that do not return
much actual data, this can provide a significant performance boost because
network traffic is greatly reduced.
</Excerpt>
--
Hope this helps.

Dan Guzman
SQL Server MVP

On Fri, 10 Sep 2004 11:30:32 GMT, Dan Guzman wrote:

>>>1- use SET NOCOUNT ON
>>
>> This will not affect performance, but it is good practice. Many clients
>> choke on the numerous "(n row(s) affected)" messages returned by an SP
>> without this setting.
>
>From the SQL 2000 BOL:
><Excerpt href="http://links.10026.com/?link=tsqlref.chm::/ts_set-set_3ed0.htm">
>For stored procedures that contain several statements that do not return
>much actual data, this can provide a significant performance boost because
>network traffic is greatly reduced.
></Excerpt>
Hi Dan,

Yes, you're right. My bad. Thanks for pointing out my error!

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

Thank you all for your responses.

A lot of new things for me in your answers; I'll have to read them many
times and take the time to fully understand them.

Thanks again.

> 3- use WITH (NOLOCK) with each SELECT statement

This post is for everyone who gave me answers about the WITH (NOLOCK).

I've been using WITH (NOLOCK) in my own applications for a few years,
and I do know about dirty reads. Basically, the SELECT statement will not
wait for a transaction holding a lock to finish before getting the data.
For example, if a transaction is at the stage of modifying a column from
'USA' to 'Canada', the SELECT statement will fetch me 'USA' even though
the record is locked and in the process of having the column's value
changed to 'Canada'.

Am I correct with my explanation?

Now, I have an SP with 300 lines, with a lot of tables, joins, left joins
and one or a few unions.

If I run this SP with parameters in the SQL Query Analyzer, it takes 1
second to return 11 rows. However, when the front-end application, which
is ASP based, calls this SP, it takes 23-25 seconds for the ASP page to
load!

If I use WITH (NOLOCK) on all the tables, the ASP page loads in 4-5
seconds!

I am planning on understanding this SP and all the other SPs and finding
all the possible ways to write them better, in order to improve the code
and get better performance results. This will be a slow process.

Until then, if my explanation of WITH (NOLOCK) is valid and there is
only one side effect, then maybe we can temporarily use WITH (NOLOCK)
for the short term.

As always, I appreciate any feedback.

Thank you very much

Hi Serge,

On Tue, 14 Sep 2004 00:23:04 -0400, serge wrote:

>I've been using WITH (NOLOCK) in my own applications for a few years,
>and I do know about dirty reads. Basically, the SELECT statement will not
>wait for a transaction holding a lock to finish before getting the data.
>For example, if a transaction is at the stage of modifying a column from
>'USA' to 'Canada', the SELECT statement will fetch me 'USA' even though
>the record is locked and in the process of having the column's value
>changed to 'Canada'.
>Am I correct with my explanation?

Only partially. It depends on how "far" the processing of the other
transaction has gone. If the update has already been performed but the
transaction is not yet finished, the SELECT statement with the nolock hint
will return the value 'Canada'.

This can cause quite unexpected side effects when the other transaction
later has to be rolled back - you'll have read a value that logically
never even existed in the database. It might even be that the rollback
occurred BECAUSE the value 'Canada' violates a business rule - it's
inserted in the table first, then the trigger starts that checks the
business rules and a rollback is initiated if the business rules were
violated. Normal locking behaviour ensures that nobody ever sees the
"illegal" value 'Canada', as this row is locked until the transaction
finishes; performing a dirty read means that you run the risk of returning
this "illegal" value.
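
You can see this for yourself with two Query Analyzer windows (table and
column names made up):

-- Connection 1: update inside a transaction that is deliberately left open
BEGIN TRAN
UPDATE tblClients SET Country = 'Canada' WHERE ClientID = 1

-- Connection 2: the dirty read sees the uncommitted value
SELECT Country FROM tblClients WITH (NOLOCK) WHERE ClientID = 1
-- returns 'Canada'

-- Connection 1: undo the change
ROLLBACK TRAN
-- connection 2 has now read a value that never officially existed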

>If I run this SP with parameters in the SQL Query Analyzer, it takes 1
>second to return 11 rows. However, when the front-end application, which
>is ASP based, calls this SP, it takes 23-25 seconds for the ASP page to
>load!
>If I use WITH (NOLOCK) on all the tables, the ASP page loads in 4-5
>seconds!

If the SP executes in 1 second from QA, then you should be able to see
comparable performance from any other client. I suspect that this
difference is caused by something in the ASP code. As I have never used
ASP myself, I'll leave it to others to comment on this.

Your solution to use dirty reads looks more like a workaround than a
fix. I hope someone more ASP-savvy than me can help you find the real
cause of the delay.

And I suggest you think very long and very hard about what harm might be
caused if your SP reads and uses data changed by an unfinished transaction
that might even be rolled back.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hi Hugo,

Your reply was very informative for me.

I liked your explanation of the "illegal" value. It's true, I didn't think
about dirty reads the way you explained them, where the value is actually
'Canada' but hasn't been finalized and the record hasn't been committed.

As far as ASP having a problem, yes, the ASP page has a serious problem.
When I run the Profiler, a bunch of SPs get executed THREE straight times!!!
This I find very bad and I'll have to investigate why it repeats (I guess
the page is being refreshed 3 times, thus triggering all SPs to be executed
three times).

And yes I am considering WITH (NOLOCK) as a workaround for now and
not a fix.

I'll keep investigating.

Thank you very much Hugo.

> >I've been using WITH (NOLOCK) in my own applications for a few years,
> >and I do know about dirty reads. Basically, the SELECT statement will not
> >wait for a transaction holding a lock to finish before getting the data.
> >For example, if a transaction is at the stage of modifying a column from
> >'USA' to 'Canada', the SELECT statement will fetch me 'USA' even though
> >the record is locked and in the process of having the column's value
> >changed to 'Canada'.
> >Am I correct with my explanation?
> Only partially. It depends on how "far" the processing of the other
> transaction has gone. If the update has already been performed but the
> transaction is not yet finished, the SELECT statement with the nolock hint
> will return the value 'Canada'.
> This can cause quite unexpected side effects when the other transaction
> later has to be rolled back - you'll have read a value that logically
> never even existed in the database. It might even be that the rollback
> occurred BECAUSE the value 'Canada' violates a business rule - it's
> inserted in the table first, then the trigger starts that checks the
> business rules and a rollback is initiated if the business rules were
> violated. Normal locking behaviour ensures that nobody ever sees the
> "illegal" value 'Canada', as this row is locked until the transaction
> finishes; performing a dirty read means that you run the risk of returning
> this "illegal" value.
>
> >If I run this SP with parameters in the SQL Query Analyzer, it takes 1
> >second to return 11 rows. However, when the front-end application, which
> >is ASP based, calls this SP, it takes 23-25 seconds for the ASP page to
> >load!
> >If I use WITH (NOLOCK) on all the tables, the ASP page loads in 4-5
> >seconds!
> If the SP executes in 1 second from QA, then you should be able to see
> comparable performance from any other client. I suspect that this
> difference is caused by something in the ASP code. As I have never used
> ASP myself, I'll leave it to others to comment on this.
> Your solution to use dirty reads looks more like a workaround than a
> fix. I hope someone more ASP-savvy than me can help you find the real
> cause of the delay.
> And I suggest you think very long and very hard about what harm might be
> caused if your SP reads and uses data changed by an unfinished transaction
> that might even be rolled back.

On Tue, 14 Sep 2004 08:02:05 -0400, serge wrote:

>As far as ASP having a problem, yes, the ASP page has a serious problem.
(snip)
>I'll keep investigating.

Hi Serge,

You might consider looking in the microsoft.public hierarchy of
newsgroups. There are lots of them with "asp" somewhere in the name, so
there's a good chance you'll find an answer to this in one of those
groups. I know too little (read: nothing <g>) about asp to recognise which
group might be the best suited.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

serge (sergea@.nospam.ehmail.com) writes:
> If I run this SP with parameters in the SQL Query Analyzer, it takes 1
> second to return 11 rows. However, when the front-end application, which
> is ASP based, calls this SP, it takes 23-25 seconds for the ASP page
> to load!

If the ASP code calls it three times, there is something to be fixed.

But exactly how was it run? Was any of the calls preceded by
SET FMTONLY ON? FMTONLY is a sort of NOEXEC thing: SQL Server only
returns information about result sets.
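
You can see the effect in Query Analyzer with something like this (the
procedure name is made up):

SET FMTONLY ON
EXEC dbo.MySlowProc @param = 1  -- returns result set metadata only, no rows
SET FMTONLY OFF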

Assuming that your ASP code uses ADO, make sure that you use
CommandType = adCmdStoredProc. Do not use adCmdText with an
EXEC statement or an ODBC call syntax.

If there is an indexed view or indexed computed column involved
somewhere, this can explain the difference between ASP and QA. In QA,
the setting ARITHABORT is ON by default, but it's off when you use
ADO. Thus, issuing SET ARITHABORT ON (or setting it as the default for
the database with ALTER DATABASE, or for the server with sp_configure)
could give some effect.
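
To test whether this is the culprit, you can mimic the ADO environment from
QA; a sketch (again with a made-up procedure name):

SET ARITHABORT OFF  -- the ADO/ODBC default
EXEC dbo.MySlowProc @param = 1
GO
SET ARITHABORT ON   -- the QA default
EXEC dbo.MySlowProc @param = 1
GO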

Also, SET NOCOUNT ON is good.

> If I use WITH (NOLOCK) on all the tables, the ASP page loads in 4-5
> seconds!

This could indicate that there is some blocking. Are you alone on the
server when it takes 25 seconds to run? Unless there are concurrency
issues, I find it difficult to believe that (NOLOCK) has that drastic
an effect. There is some cost for locks, but hardly a cost that huge.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

A lot of good information you've provided me here. I will have to look at
the ASP code and investigate all your points.

To answer your question: yes, I am alone on the server when it takes 25
seconds to run. Checking SQL Profiler, I see 3 sets of the same SPs being
called, and the main SP that is slow takes about 6.5 seconds to run;
multiply that by 3, that's 20 seconds. The ASP page is also calling other
SPs, totaling 23-25 seconds.

I'll have to investigate much further, go through your points and other
people's points I've received, and find out what I can fix/improve.

Thanks again!

> If the ASP code calls it three times, there is something to be fixed.
> But exactly how was it run? Was any of the calls preceded by
> SET FMTONLY ON? FMTONLY is a sort of NOEXEC thing: SQL Server only
> returns information about result sets.
> Assuming that your ASP code uses ADO, make sure that you use
> CommandType = adCmdStoredProc. Do not use adCmdText with an
> EXEC statement or an ODBC call syntax.
> If there is an indexed view or indexed computed column involved
> somewhere, this can explain the difference between ASP and QA. In QA,
> the setting ARITHABORT is ON by default, but it's off when you use
> ADO. Thus, issuing SET ARITHABORT ON (or setting it as the default for
> the database with ALTER DATABASE, or for the server with sp_configure)
> could give some effect.
> Also, SET NOCOUNT ON is good.
> > If I use WITH (NOLOCK) on all the tables, the ASP page loads in 4-5
> > seconds!
> This could indicate that there is some blocking. Are you alone on the
> server when it takes 25 seconds to run? Unless there are concurrency
> issues, I find it difficult to believe that (NOLOCK) has that drastic
> an effect. There is some cost for locks, but hardly a cost that huge.
