performance monitoring: Performance mystery -- SP vs script

I have encountred situations like this before, but this one
has me stumped.

I have a pretty simple SP that collects information about
residential properties from a large database. First step is to
query on the basis of address or location, and collect a temp
table of property IDs. Second step is to populate a composite
table of property information by joining the IDs to a table of
characterisitics. Third step is to update some fields by finding
a single value from multiple candidates in large tables -- one
has 275 million, another 325 million rows -- e.g., the price of
the most recent sale for a property.

As an SP, this takes absolutely forever, and it seems it's doing
endless scans of the large tables. So to analyze it, I took the
code and ran it as a script -- turned the parameter definition at
the top into a DECLARE statement, set values for the variables
that are the input parameters, no other changes, and go. Presto!
It runs in no time flat, and the query plans reveal it's using
the indexes just like it's supposed to. But the SP might take
an hour to do the same thing.

Any suggestions about what to look for? I believe both versions
have fresh query plans -- I have recompiled (and dropped and
recreated) the SP, and the plain script should have a fresh plan.
Maybe it's because the parameter values are known when the script
runs, but not when the SP is complied? I would really appreciate
any pointers, and can provide more information as needed.

Thanks,
Jim GeissmanPS -- it's the update statements that are the problem.
Even if the temp table has only one record, four parallel
threads are spawned to search through a large table
hunting for the ten or so records for that property -- even
though thereare indexes aplenty allowing it to go directly
to the appropriate records.|||You might want to try recompiling the stored procedure. It's possible
that when the stored procedure was first run and a query plan was
created, there were different indexes or the tables did not hold the
same data. The stored procedure might still be trying to use that same
query plan even though the data and/or indexes have changed.

HTH,
-Tom.|||That sure sounds reasonable, Tom, however nothing has changed
with the tables for a week or more, and the SP has only existed
for two days. One change I did make that seems to have led to
this was:

I had approximately this formerly:

update t set amt=e.amt, date=e.date,name=n.name
from #temp t join event e on e.propid=t.propid
join name n on n.propid=e.propid and n.eventid=e.eventid
where e.eventtype=... and e.date=(select max(date)
from event e1 where e1.propid=t.propid and...)

I split it into two separate update queries, one for the
date & amount, another for the name, because the two
should each be the most recent (with some conditions),
but don't have to be from the same event. The joins
dropped from three tables to two. That change broke it.|||There's an index that has everything needed to locate the
amount -- propid, date, eventtype, and amount, in that order.
However, it is being scanned on the basis of date and event
type (millions of rows) instead of going directly to the
correct handful of rows by adding propid. Is there a way to
change this behavior?|||On 9 Mar 2005 15:38:30 -0800, jim_geissman@.countrywide.com wrote:

>PS -- it's the update statements that are the problem.
>Even if the temp table has only one record, four parallel
>threads are spawned to search through a large table
>hunting for the ten or so records for that property -- even
>though thereare indexes aplenty allowing it to go directly
>to the appropriate records.

Which version of SQL Server are you using? I've had problems with SQL Server
7 grossly misestimating the time to execute query plans, thinking it has
exceeded the parallelism threshold, and choosing to use parallelism when it
actually hurts performance significantly. Try changing your server's
parallelism threshold to 5x or 10x the default value.|||(jim_geissman@.countrywide.com) writes:
> I have encountred situations like this before, but this one
> has me stumped.
> I have a pretty simple SP that collects information about
> residential properties from a large database. First step is to
> query on the basis of address or location, and collect a temp
> table of property IDs. Second step is to populate a composite
> table of property information by joining the IDs to a table of
> characterisitics. Third step is to update some fields by finding
> a single value from multiple candidates in large tables -- one
> has 275 million, another 325 million rows -- e.g., the price of
> the most recent sale for a property.
> As an SP, this takes absolutely forever, and it seems it's doing
> endless scans of the large tables. So to analyze it, I took the
> code and ran it as a script -- turned the parameter definition at
> the top into a DECLARE statement, set values for the variables
> that are the input parameters, no other changes, and go. Presto!
> It runs in no time flat, and the query plans reveal it's using
> the indexes just like it's supposed to. But the SP might take
> an hour to do the same thing.

Without seeing the code, and not having information about the table etc,
the best I can offer is wild guesses and standard recommendations.

Since replacing the parameters with variables appeared to give effect,
it seems you have a workaround. (Copy the parameters to local variables.)
Then again, there are a couple of more possibilities.

If the issue is indeed parameters vs. variables, then it could be a
case of psrameter sniffing when you don't want it. "Parameter sniffing"
is when SQL Server uses the values of the input parameters on the
first invocation to build the query plan for the stored procedure.
When it comes to variables, SQL Server does not know their values
before and builds the plan from standard assumptions. Usually parameter
sniffing is good, but say that you have:

CREATE PROCEDURE sniff_sp @.a datetime ... AS
...
IF @.a IS NULL SELECT @.a = convert(char(8), @.a, 112)

And on first invocation, you call the procedure with @.a = NULL. That's
a value you don't use, so it will not be good for the plan.

Another alternative is that the stored procedure was created with
any of ANSI_NULLS or QUOTED_IDENTIFIER off, and the procedure involves
access to an indexed view or an indexed computed columns. The two
mentioned settings must be ON for such indexes to be used, and these
two settings are saved with the stored procedure. You can use
the Objectproperty() function to investigate this.

There may be other reasons as well, as the optimizer being to optimistic
about the benefits of parallelism, or too pessimistic about the usefulness
of a non-clustered index. You can use OPTION (MAXDOP 1) to turn of
parallelism. You can also try to force the use of a certain index, but
you should be careful, because a using a non-clustered index when
there are too many hits and be real disaster performancewise.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Saturday, February 25, 2012

Performance mystery -- SP vs script

No comments:

Post a Comment

performance monitoring

Blog Archive

About Me