Wednesday, March 28, 2012

Performance recommendation

Please give me some advice. In my application I calculate a list of identifiers (Guids) that are primary keys in my table and I have to retrieve those rows from the database. So my first approach is like

Code Snippet

SELECT id, c2 FROM t1 WHERE id IN (@.id1, @.id2, @.id3,....)

where @.idn are the calculated identifies as parameters. This approach does not scale well since there is a limit of parameters that can be used. So one possibility might be to use several SELECT statements, each with the maximum number of parameters. I can't believe that this is a good solution. A temporary table may be a better solution - I don't know. Are there any better ways to retrieve performantly - any recommandations?

Thanks a lot

Hans-Peter


Code Snippet

SELECT id, c2
FROM
t1 a
Join
(Select @.Id1 Index#
union all
Select @.Id2
union all
select @.Id3
...
...
union all
select @.Idn) as b
On
a.Id = b.Index#


|||

First, GUIDs as primary keys is not usually a good idea. See these articles for more information.

GUID -Identity and Primary Keys
http://sqlteam.com/item.asp?ItemID=2599

GUID -Is not Always GOOD
http://bloggingabout.net/blogs/wellink/archive/2004/03/15/598.aspx

GUID -The Cost of GUIDs as Primary Keys
http://www.informit.com/articles/article.asp?p=25862&rl=1

GUID -Uniqueidentifier vs. IDENTITY
http://sqlteam.com/item.asp?ItemID=283

Secondly, using a table variable could be useful.

|||

Arnie is correct that using GUID's for a Primary Key is not a good idea from a performance and maintenance point of view.

It is also a bad idea to use big IN clauses in the WHERE clause of a SELECT statement, since that will give you a large ad-hoc plan with a use count of 1 that will bloat your procedure cache. One work-around to avoid this is to add OPTION (RECOMPILE) to the end of your query, so SQL Server does not cache the plan (which won't be re-used anyway).

You can run this DMV to see if you are experiencing this problem:

Code Snippet

-- Find the ad-hoc queries that are bloating the plan cache

SELECT TOP(1000) *

FROM sys.dm_Exec_cached_plans

CROSS APPLY sys.dm_exec_sql_text(plan_handle)

WHERE cacheobjtype = 'Compiled Plan'

AND objtype = 'Adhoc' AND usecounts = 1

--AND size_in_bytes < 200000

ORDER BY size_in_bytes DESC

I recommend that you build a stored procedure that has say 20 or 50 input parameters, then have a SELECT statement that uses an OR for each parameter. Then call the SP as many times as you need to. You can just use duplicates if you have less than 20 or 50 values for a call.

Code Snippet

SELECT id, c2

FROM t1

WHERE id = @.id1

OR id = @.id2

OR id = @.id3....

|||

Thanks a lot for your recommendations! Let me explain my insight:

There are different ways to avoid the IN with many arguments. One way is that Bushan shows (select and union all parameters internally) and

one way to use ORs. I will try both ways to check performance improvement.

But I still have some questions:

Limits of parameters / command length|||

The best way to compare two alternative ways of doing it is to run the queries back to back in SSMS, with SET STATISTICS IO enabled, and the graphical execution plan turned on. Then you can compare the cost the batches, and you will be able see the percentage cost of each batch, and compare the IO cost also.

If you want to take caching out of it, you can run DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE before each testing run, (but don't do that on a Production system).

Using a very big IN clause will really bloat your Procedure cache with single-use, ad-hoc query plans, which can really eat up some memory.

|||

Thanks for your advice, Glenn!

Hans-Peter

|||

Hi Hans-Peter,

are you building a DW or an OLTP system....

The usual case in DWs is to use generated integer keys all the time....

In OLTP using generated integer keys is often a good idea but it is by no means a hard and fast rule...

if you use integers like this then most queries are where clauses on attributes that link back to the integer...when you want to group things together and shorten where clauses you add attributes to group things.....

Best Regards

No comments:

Post a Comment