performance monitoring: primary

Wednesday, March 28, 2012

Performance recommendation

Please give me some advice. In my application I calculate a list of identifiers (Guids) that are primary keys in my table and I have to retrieve those rows from the database. So my first approach is like

Code Snippet

SELECT id, c2 FROM t1 WHERE id IN (@.id1, @.id2, @.id3,....)

where @.idn are the calculated identifies as parameters. This approach does not scale well since there is a limit of parameters that can be used. So one possibility might be to use several SELECT statements, each with the maximum number of parameters. I can't believe that this is a good solution. A temporary table may be a better solution - I don't know. Are there any better ways to retrieve performantly - any recommandations?

Thanks a lot

Hans-Peter

Code Snippet

SELECT id, c2
FROM
t1 a
Join
(Select @.Id1 Index#
union all
Select @.Id2
union all
select @.Id3
...
...
union all
select @.Idn) as b
On
a.Id = b.Index#

|||

First, GUIDs as primary keys is not usually a good idea. See these articles for more information.

GUID -Identity and Primary Keys
http://sqlteam.com/item.asp?ItemID=2599

GUID -Is not Always GOOD
http://bloggingabout.net/blogs/wellink/archive/2004/03/15/598.aspx

GUID -The Cost of GUIDs as Primary Keys
http://www.informit.com/articles/article.asp?p=25862&rl=1

GUID -Uniqueidentifier vs. IDENTITY
http://sqlteam.com/item.asp?ItemID=283

Secondly, using a table variable could be useful.

|||

Arnie is correct that using GUID's for a Primary Key is not a good idea from a performance and maintenance point of view.

It is also a bad idea to use big IN clauses in the WHERE clause of a SELECT statement, since that will give you a large ad-hoc plan with a use count of 1 that will bloat your procedure cache. One work-around to avoid this is to add OPTION (RECOMPILE) to the end of your query, so SQL Server does not cache the plan (which won't be re-used anyway).

You can run this DMV to see if you are experiencing this problem:

Code Snippet

-- Find the ad-hoc queries that are bloating the plan cache

SELECT TOP(1000) *

FROM sys.dm_Exec_cached_plans

CROSS APPLY sys.dm_exec_sql_text(plan_handle)

WHERE cacheobjtype = 'Compiled Plan'

AND objtype = 'Adhoc' AND usecounts = 1

--AND size_in_bytes < 200000

ORDER BY size_in_bytes DESC

I recommend that you build a stored procedure that has say 20 or 50 input parameters, then have a SELECT statement that uses an OR for each parameter. Then call the SP as many times as you need to. You can just use duplicates if you have less than 20 or 50 values for a call.

Code Snippet

SELECT id, c2

FROM t1

WHERE id = @.id1

OR id = @.id2

OR id = @.id3....

|||

Thanks a lot for your recommendations! Let me explain my insight:

There are different ways to avoid the IN with many arguments. One way is that Bushan shows (select and union all parameters internally) and

one way to use ORs. I will try both ways to check performance improvement.

But I still have some questions:

Limits of parameters / command length|||

The best way to compare two alternative ways of doing it is to run the queries back to back in SSMS, with SET STATISTICS IO enabled, and the graphical execution plan turned on. Then you can compare the cost the batches, and you will be able see the percentage cost of each batch, and compare the IO cost also.

If you want to take caching out of it, you can run DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE before each testing run, (but don't do that on a Production system).

Using a very big IN clause will really bloat your Procedure cache with single-use, ad-hoc query plans, which can really eat up some memory.

|||

Thanks for your advice, Glenn!

Hans-Peter

|||

Hi Hans-Peter,

are you building a DW or an OLTP system....

The usual case in DWs is to use generated integer keys all the time....

In OLTP using generated integer keys is often a good idea but it is by no means a hard and fast rule...

if you use integers like this then most queries are where clauses on attributes that link back to the integer...when you want to group things together and shorten where clauses you add attributes to group things.....

Best Regards

Monday, March 26, 2012

Performance Question

SQL Server 2000, MSDE

I have a table containing simple financial transactions. The 3 primary fields are:

TranSysID INT(PK)
TranDate DATETIME
TranAmount DECIMAL

TranAmount contains both positive (Credit) and negative (Debit) values.

I need to perform a query that returns the beginning balance and ending balance for a particular day.

BegBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND DAY(TranDate) < 1 AND YEAR(TranDate) < 2004

EndBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND DAY(TranDate) < 2 AND YEAR(TranDate) < 2004

SHOULD give me the beggining and ending balance for Feb 1, 2004.

My question is what indexes should I create to give me the best possible performance? Right now I have an index on the TranDate field. I do not on the TranAmount field.

TIA

--
Tim Morrison

------------------------

Vehicle Web Studio - The easiest way to create and maintain your vehicle related website.
http://www.vehiclewebstudio.comTim Morrison (sales@.kjmsoftware.com) writes:
> I have a table containing simple financial transactions. The 3 primary
> fields are:
> TranSysID INT(PK)
> TranDate DATETIME
> TranAmount DECIMAL
> TranAmount contains both positive (Credit) and negative (Debit) values.
> I need to perform a query that returns the beginning balance and ending
> balance for a particular day.
> BegBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND
> DAY(TranDate) < 1 AND YEAR(TranDate) < 2004
> EndBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND
> DAY(TranDate) < 2 AND YEAR(TranDate) < 2004
> SHOULD give me the beggining and ending balance for Feb 1, 2004.
> My question is what indexes should I create to give me the best possible
> performance? Right now I have an index on the TranDate field. I do not
> on the TranAmount field.

And that index on TranDate is not likely to be used, the way you have
written the query. This is because the column figures in expressions,
SQL Server cannot seek the index. I can possibly scan.

So you should write the query as:

SELECT @.date = '20020204'
SELECT BegBal = SUM(CASE WHEN TranDate < dateadd(DAY, -1, @.date)
THEN TranAmount
ELSE 0
END),
EndBal = SUM(Tranamount)
FROM tbl
WHERE TranDate < @.date

And you should have a non-clustered index on (TranDate, TranAmount)

The reason that for the somewhat complicated expression on BegBal, is
that I would expect most queries to be on recent dates, so in fact,
you will have to traverse most rows. Better then to only do it once.

Since you are traversing allmost all rows, you are in fact scanning the
data. But, by having both the date and the amount in the index, SQL Server
does not have to read the data pages, but only the narrower non-
clustered index. If there are more columns than you are showing, for
instance an account number, you need to add that column to the index
as well.

In the end you may find that you get better performance, by having this
data computed in advance. This requires more work to maintain the data.
The system I work is about financial transactions (securities trading),
and we have tables with all balances pre-computed.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Thank you. It may be possible that I am worrying for nothing. I expect about
10-15 rows added per day. It may be 3 years before I start to notice a
slowdown...

Tim

"Erland Sommarskog" <sommar@.algonet.se> wrote in message
news:Xns947A6DF7DA17Yazorman@.127.0.0.1...
> Tim Morrison (sales@.kjmsoftware.com) writes:
> > I have a table containing simple financial transactions. The 3 primary
> > fields are:
> > TranSysID INT(PK)
> > TranDate DATETIME
> > TranAmount DECIMAL
> > TranAmount contains both positive (Credit) and negative (Debit) values.
> > I need to perform a query that returns the beginning balance and ending
> > balance for a particular day.
> > BegBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND
> > DAY(TranDate) < 1 AND YEAR(TranDate) < 2004
> > EndBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND
> > DAY(TranDate) < 2 AND YEAR(TranDate) < 2004
> > SHOULD give me the beggining and ending balance for Feb 1, 2004.
> > My question is what indexes should I create to give me the best possible
> > performance? Right now I have an index on the TranDate field. I do not
> > on the TranAmount field.
> And that index on TranDate is not likely to be used, the way you have
> written the query. This is because the column figures in expressions,
> SQL Server cannot seek the index. I can possibly scan.
> So you should write the query as:
> SELECT @.date = '20020204'
> SELECT BegBal = SUM(CASE WHEN TranDate < dateadd(DAY, -1, @.date)
> THEN TranAmount
> ELSE 0
> END),
> EndBal = SUM(Tranamount)
> FROM tbl
> WHERE TranDate < @.date
> And you should have a non-clustered index on (TranDate, TranAmount)
> The reason that for the somewhat complicated expression on BegBal, is
> that I would expect most queries to be on recent dates, so in fact,
> you will have to traverse most rows. Better then to only do it once.
> Since you are traversing allmost all rows, you are in fact scanning the
> data. But, by having both the date and the amount in the index, SQL Server
> does not have to read the data pages, but only the narrower non-
> clustered index. If there are more columns than you are showing, for
> instance an account number, you need to add that column to the index
> as well.
> In the end you may find that you get better performance, by having this
> data computed in advance. This requires more work to maintain the data.
> The system I work is about financial transactions (securities trading),
> and we have tables with all balances pre-computed.
> --
> Erland Sommarskog, SQL Server MVP, sommar@.algonet.se
> Books Online for SQL Server SP3 at
> http://www.microsoft.com/sql/techin.../2000/books.asp

Wednesday, March 28, 2012

Performance recommendation

Monday, March 26, 2012

Performance Question

performance monitoring

Blog Archive

About Me