Monday, March 26, 2012
Performance question.
I have a table which has arround 30 millions rows.
Table structure is as following..
CREATE TABLE TestTable
(Id INT, --which is PK,
EmpId INT, --There is a non cluster index on it.
DeptName VARCHAR(50),
Hours NUMERIC(5,2),
Tdate DATETIME,
ProjectNumber smallint,
.and few more columns
.
.
)
And i have following query, which is taking arround 1 minute 10 sec to run.
SELECT
DeptName,
EmpId,
SUM(CASE WHEN ProjectNumber = 11 THEN Hours ELSE 0
END) AS FinHours,
SUM(CASE WHEN ProjectId = 12 THEN Hours ELSE 0
END) AS HrHours,
SUM(CASE WHEN ProjectId = 13 THEN Hours ELSE 0
END) AS TaxHours,
FROM TestTable WHERE Tdate between @.Date1 and @.Date2
GROUP BY
DeptName,
EmpId
I do not have index on ProjectNumber column because this column will have
only
200 distinct values.
If i create index on Group by Columns, would it improve performance ?
Pls let me know, how can i imporve performance ?
Thanks.I don't know about the rest of your queries or your usage patterns, but the
most obvious choice in this case is to make the PK nonclustered and create a
clustered index on the Tdate column to support your WHERE clause.
Adam Machanic
Pro SQL Server 2005, available now
http://www.apress.com/book/bookDisplay.html?bID=457
--
"mvp" <mvp@.discussions.microsoft.com> wrote in message
news:569B7FD3-452C-4B8F-9056-06798A32B3EA@.microsoft.com...
> Hello Everybody,
> I have a table which has arround 30 millions rows.
> Table structure is as following..
> CREATE TABLE TestTable
> (Id INT, --which is PK,
> EmpId INT, --There is a non cluster index on it.
> DeptName VARCHAR(50),
> Hours NUMERIC(5,2),
> Tdate DATETIME,
> ProjectNumber smallint,
> .and few more columns
> .
> .
> )
> And i have following query, which is taking arround 1 minute 10 sec to
> run.
>
> SELECT
> DeptName,
> EmpId,
> SUM(CASE WHEN ProjectNumber = 11 THEN Hours ELSE 0
> END) AS FinHours,
> SUM(CASE WHEN ProjectId = 12 THEN Hours ELSE 0
> END) AS HrHours,
> SUM(CASE WHEN ProjectId = 13 THEN Hours ELSE 0
> END) AS TaxHours,
> FROM TestTable WHERE Tdate between @.Date1 and @.Date2
> GROUP BY
> DeptName,
> EmpId
> I do not have index on ProjectNumber column because this column will have
> only
> 200 distinct values.
> If i create index on Group by Columns, would it improve performance ?
> Pls let me know, how can i imporve performance ?
>
> Thanks.|||also you may try an index on all the columns involved in the query,
Tdate first if the interval is narrow, DeptName, EmpId first if the
interval is wide|||Your query indicates that you are using the following columns: DeptId,
EmpId, ProjectNumber, ProjectId, TDate. This means that SQL Server will have
to look at all of the rows being returned regardless of whether an index
exists on your grouped columns. This will be the case unless you were to
create a covering index for all of the columns being returned. In your case,
that's a lot of columns so I don't recommend it.
According to your DDL, I don't see an index on Tdate. I would actually start
with that. However, depending on the number of rows that are being returned
from your query, the optimizer may or may not even choose to use that index
(due to the expense of bookmark lookup). However, I would attempt a
non-clustered index on Tdate first.
Assuming you're not using the data for anything else (or much else), Adam's
method could be the best route. However, this would result in larger indexes
for all of the nonclustered indexes on this table.
Since all nonclustered indexes also include the clustered index key, and
your key is going from a 4-byte data type to an 8-byte data type and add up
114MB to each index on your table. Combined with the fact that a
"uniquifier" is applied to all non-unique clustered indexes, could add
another 4 bytes to your rows and bring each nonclustered index up to 228 MB.
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:%237v%239uXBGHA.892@.TK2MSFTNGP12.phx.gbl...
>I don't know about the rest of your queries or your usage patterns, but the
>most obvious choice in this case is to make the PK nonclustered and create
>a clustered index on the Tdate column to support your WHERE clause.
>
> --
> Adam Machanic
> Pro SQL Server 2005, available now
> http://www.apress.com/book/bookDisplay.html?bID=457
> --
>
> "mvp" <mvp@.discussions.microsoft.com> wrote in message
> news:569B7FD3-452C-4B8F-9056-06798A32B3EA@.microsoft.com...
>|||"Jared Ko" <JaredKo05@.sanbeiji.removethispart.com> wrote in message
news:uPOE8AZBGHA.736@.TK2MSFTNGP10.phx.gbl...
> Since all nonclustered indexes also include the clustered index key, and
> your key is going from a 4-byte data type to an 8-byte data type and add
> up 114MB to each index on your table. Combined with the fact that a
> "uniquifier" is applied to all non-unique clustered indexes, could add
> another 4 bytes to your rows and bring each nonclustered index up to 228
> MB.
Slight correction: The uniquifier is only added to non-unique rows, not
every row. So if the majority are unique (which we might expect from a
DATETIME column), the uniquifier will add very little overhead.
Adam Machanic
Pro SQL Server 2005, available now
http://www.apress.com/book/bookDisplay.html?bID=457
--|||Thanks Adam. I was frantically searching for that information while I was
writing my response. I was thinking that was the case but a couple of web
sites I hit suggested otherwise.
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:OybGaZZBGHA.2356@.tk2msftngp13.phx.gbl...
> "Jared Ko" <JaredKo05@.sanbeiji.removethispart.com> wrote in message
> news:uPOE8AZBGHA.736@.TK2MSFTNGP10.phx.gbl...
> Slight correction: The uniquifier is only added to non-unique rows, not
> every row. So if the majority are unique (which we might expect from a
> DATETIME column), the uniquifier will add very little overhead.
>
> --
> Adam Machanic
> Pro SQL Server 2005, available now
> http://www.apress.com/book/bookDisplay.html?bID=457
> --
>
>|||Best is to test for yourself. Have to similar tables, populate them with bun
ch of rows. One unique,
the other all with same value. Check size of the index. that is how I conclu
ded that uniqifier is
only added for the duplicates.
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
Blog: http://solidqualitylearning.com/blogs/tibor/
"Jared Ko" <JaredKo05@.sanbeiji.removethispart.com> wrote in message
news:uyOPGjZBGHA.3984@.TK2MSFTNGP14.phx.gbl...
> Thanks Adam. I was frantically searching for that information while I was
writing my response. I
> was thinking that was the case but a couple of web sites I hit suggested o
therwise.
> "Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
> news:OybGaZZBGHA.2356@.tk2msftngp13.phx.gbl...
>|||"Tibor Karaszi" <tibor_please.no.email_karaszi@.hotmail.nomail.com> wrote in
message news:OK1JSoZBGHA.3472@.TK2MSFTNGP09.phx.gbl...
> Best is to test for yourself. Have to similar tables, populate them with
> bunch of rows. One unique, the other all with same value. Check size of
> the index. that is how I concluded that uniqifier is only added for the
> duplicates.
I took the lazy way out. _Inside SQL Server 2000_, page 412:
"If your clustered index was not created with the UNIQUE property, SQL
Server adds a 4-byte field when necessary to make each key unique."
Adam Machanic
Pro SQL Server 2005, available now
http://www.apress.com/book/bookDisplay.html?bID=457
--|||Just a question - wouldn't horizontal partitioning be useful in this case?
of course apart from index on the Tdate field.
Peter
Performance Question
I have a table containing simple financial transactions. The 3 primary fields are:
TranSysID INT(PK)
TranDate DATETIME
TranAmount DECIMAL
TranAmount contains both positive (Credit) and negative (Debit) values.
I need to perform a query that returns the beginning balance and ending balance for a particular day.
BegBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND DAY(TranDate) < 1 AND YEAR(TranDate) < 2004
EndBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND DAY(TranDate) < 2 AND YEAR(TranDate) < 2004
SHOULD give me the beggining and ending balance for Feb 1, 2004.
My question is what indexes should I create to give me the best possible performance? Right now I have an index on the TranDate field. I do not on the TranAmount field.
TIA
--
Tim Morrison
------------------------
Vehicle Web Studio - The easiest way to create and maintain your vehicle related website.
http://www.vehiclewebstudio.comTim Morrison (sales@.kjmsoftware.com) writes:
> I have a table containing simple financial transactions. The 3 primary
> fields are:
> TranSysID INT(PK)
> TranDate DATETIME
> TranAmount DECIMAL
> TranAmount contains both positive (Credit) and negative (Debit) values.
> I need to perform a query that returns the beginning balance and ending
> balance for a particular day.
> BegBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND
> DAY(TranDate) < 1 AND YEAR(TranDate) < 2004
> EndBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND
> DAY(TranDate) < 2 AND YEAR(TranDate) < 2004
> SHOULD give me the beggining and ending balance for Feb 1, 2004.
> My question is what indexes should I create to give me the best possible
> performance? Right now I have an index on the TranDate field. I do not
> on the TranAmount field.
And that index on TranDate is not likely to be used, the way you have
written the query. This is because the column figures in expressions,
SQL Server cannot seek the index. I can possibly scan.
So you should write the query as:
SELECT @.date = '20020204'
SELECT BegBal = SUM(CASE WHEN TranDate < dateadd(DAY, -1, @.date)
THEN TranAmount
ELSE 0
END),
EndBal = SUM(Tranamount)
FROM tbl
WHERE TranDate < @.date
And you should have a non-clustered index on (TranDate, TranAmount)
The reason that for the somewhat complicated expression on BegBal, is
that I would expect most queries to be on recent dates, so in fact,
you will have to traverse most rows. Better then to only do it once.
Since you are traversing allmost all rows, you are in fact scanning the
data. But, by having both the date and the amount in the index, SQL Server
does not have to read the data pages, but only the narrower non-
clustered index. If there are more columns than you are showing, for
instance an account number, you need to add that column to the index
as well.
In the end you may find that you get better performance, by having this
data computed in advance. This requires more work to maintain the data.
The system I work is about financial transactions (securities trading),
and we have tables with all balances pre-computed.
--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se
Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||Thank you. It may be possible that I am worrying for nothing. I expect about
10-15 rows added per day. It may be 3 years before I start to notice a
slowdown...
Tim
"Erland Sommarskog" <sommar@.algonet.se> wrote in message
news:Xns947A6DF7DA17Yazorman@.127.0.0.1...
> Tim Morrison (sales@.kjmsoftware.com) writes:
> > I have a table containing simple financial transactions. The 3 primary
> > fields are:
> > TranSysID INT(PK)
> > TranDate DATETIME
> > TranAmount DECIMAL
> > TranAmount contains both positive (Credit) and negative (Debit) values.
> > I need to perform a query that returns the beginning balance and ending
> > balance for a particular day.
> > BegBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND
> > DAY(TranDate) < 1 AND YEAR(TranDate) < 2004
> > EndBal = SELECT SUM(TranAmount) WHERE MONTH(TranDate) < 2 AND
> > DAY(TranDate) < 2 AND YEAR(TranDate) < 2004
> > SHOULD give me the beggining and ending balance for Feb 1, 2004.
> > My question is what indexes should I create to give me the best possible
> > performance? Right now I have an index on the TranDate field. I do not
> > on the TranAmount field.
> And that index on TranDate is not likely to be used, the way you have
> written the query. This is because the column figures in expressions,
> SQL Server cannot seek the index. I can possibly scan.
> So you should write the query as:
> SELECT @.date = '20020204'
> SELECT BegBal = SUM(CASE WHEN TranDate < dateadd(DAY, -1, @.date)
> THEN TranAmount
> ELSE 0
> END),
> EndBal = SUM(Tranamount)
> FROM tbl
> WHERE TranDate < @.date
> And you should have a non-clustered index on (TranDate, TranAmount)
> The reason that for the somewhat complicated expression on BegBal, is
> that I would expect most queries to be on recent dates, so in fact,
> you will have to traverse most rows. Better then to only do it once.
> Since you are traversing allmost all rows, you are in fact scanning the
> data. But, by having both the date and the amount in the index, SQL Server
> does not have to read the data pages, but only the narrower non-
> clustered index. If there are more columns than you are showing, for
> instance an account number, you need to add that column to the index
> as well.
> In the end you may find that you get better performance, by having this
> data computed in advance. This requires more work to maintain the data.
> The system I work is about financial transactions (securities trading),
> and we have tables with all balances pre-computed.
> --
> Erland Sommarskog, SQL Server MVP, sommar@.algonet.se
> Books Online for SQL Server SP3 at
> http://www.microsoft.com/sql/techin.../2000/books.asp
Friday, March 9, 2012
performance of select
Table a
sri int, PK, Clustered
num varchar(7) Nonclustered
.
.
.
p uniqueidentifier Nonclustered
Table s
num varchar(7), PK,Clustered
ssi tinyint --can have either 1 or 2
IF EXISTS ( SELECT sri FROM a WITH (NOLOCK)
INNER JOIN s WITH (NOLOCK)
ON a.num = s.num
WHERE a.p IS NULL AND s.ssi = 1 )
PRINT 'YES'
--uses index s
IF EXISTS ( SELECT sri FROM a WITH (NOLOCK)
INNER JOIN s WITH (NOLOCK)
ON a.num = s.num
WHERE a.p IS NULL AND s.ssi = 2 )
PRINT 'YES'
--uses index scan on num, takes around 8-25 seconds. Have any thoughts why
it takes so long, how to make this use an index s
GROUP BY ssi
How many rows have an ssi value of 2, compared to an ssi value of 1? If
there are a lot more rows that have ssi = 2 than 1, then the optimizer will
choose an index scan, because it would be the better method to find matching
rows through the select statement.
"S" wrote:
> Here is DDL in my way. Other information is masked.
> Table a
> sri int, PK, Clustered
> num varchar(7) Nonclustered
> .
> .
> .
> p uniqueidentifier Nonclustered
> Table s
> num varchar(7), PK,Clustered
> ssi tinyint --can have either 1 or 2
>
> IF EXISTS ( SELECT sri FROM a WITH (NOLOCK)
> INNER JOIN s WITH (NOLOCK)
> ON a.num = s.num
> WHERE a.p IS NULL AND s.ssi = 1 )
> PRINT 'YES'
> --uses index s
>
> IF EXISTS ( SELECT sri FROM a WITH (NOLOCK)
> INNER JOIN s WITH (NOLOCK)
> ON a.num = s.num
> WHERE a.p IS NULL AND s.ssi = 2 )
> PRINT 'YES'
> --uses index scan on num, takes around 8-25 seconds. Have any thoughts why
> it takes so long, how to make this use an index s
2 - few hundreds
1 - millions. Is that what causing the daly to find the records.
"Mark Williams" wrote:
> SELECT ssi, COUNT(*) FROM s
> GROUP BY ssi
> How many rows have an ssi value of 2, compared to an ssi value of 1? If
> there are a lot more rows that have ssi = 2 than 1, then the optimizer wil
l
> choose an index scan, because it would be the better method to find matchi
ng
> rows through the select statement.
>
> --
> "S" wrote:
>|||Could be...
You can update the statistics of table "a", preferably WITH FULL_SCAN,
and see if that makes any difference.
Gert-Jan
S wrote:
> Ratio is few hundreds to millions
> 2 - few hundreds
> 1 - millions. Is that what causing the daly to find the records.
> "Mark Williams" wrote:
>|||Try modifying your queries:
IF EXISTS ( SELECT 1 FROM a WITH (NOLOCK)
INNER JOIN s WITH (NOLOCK)
ON a.num = s.num
WHERE a.p IS NULL AND s.ssi = 1 )
PRINT 'YES'
uses index s
IF EXISTS ( SELECT 1 FROM a WITH (NOLOCK)
INNER JOIN s WITH (NOLOCK)
ON a.num = s.num
WHERE a.p IS NULL AND s.ssi = 2 )
PRINT 'YES'
Since your are using the query in the context of EXISTS, you can use SELECT
1 instead of SELECT sri (credit to Jim Underwood, as seen in a posting
earlier today). Specifying sri in the select list may be affecting which
indexes are chosen.
Other than that, run the statements below and post the text execution plans
here.
SET SHOWNPLAN_TEXT ON
GO
SELECT sri FROM a WITH (NOLOCK)
INNER JOIN s WITH (NOLOCK)
ON a.num = s.num
WHERE a.p IS NULL AND s.ssi = 1
SELECT sri FROM a WITH (NOLOCK)
INNER JOIN s WITH (NOLOCK)
ON a.num = s.num
WHERE a.p IS NULL AND s.ssi = 2 )
-
"S" wrote:
> Ratio is few hundreds to millions
> 2 - few hundreds
> 1 - millions. Is that what causing the daly to find the records.
>|||Table s has 8 million records but only a few hundred have ssi=2, remaining
records
have ssi= 1
|--Compute Scalar(DEFINE:([Expr1003]=If [Expr1004] then 1 else 0))
|--Nested Loops(Left Semi Join, DEFINE:([Expr1004] = [PROBE VALUE]))
|--Constant Scan
|--Filter(WHERE:([a].[p]=NULL))
|--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([DB_NAME].[dbo].[a] ))
|--Nested Loops(Inner Join, OUTER
REFERENCES:([s].[num]))
|--Clustered Index
Scan(OBJECT:([DB_NAME].[dbo].[s].[PK_s]), WHERE:([s].[ssi]=2))
|--Index
S
ORDERED FORWARD)
for ssi= 2
|--Compute Scalar(DEFINE:([Expr1003]=If [Expr1004] then 1 else 0))
|--Nested Loops(Left Semi Join, DEFINE:([Expr1004] = [PROBE VALUE]))
|--Constant Scan
|--Nested Loops(Inner Join, OUTER REFERENCES:([a].[num]) WITH
PREFETCH)
|--Bookmark Lookup(BOOKMARK:([Bmk1000]),
OBJECT:([DB_NAME].[dbo].[a] ))
| |--Index S
SEEK:([a].[p]=NULL) ORDERED FORWARD)
|--Clustered Index
S
WHERE:([s].[ssi]=1) ORDERED FORWARD)
"Mark Williams" wrote:
> Try modifying your queries:
> IF EXISTS ( SELECT 1 FROM a WITH (NOLOCK)
> INNER JOIN s WITH (NOLOCK)
> ON a.num = s.num
> WHERE a.p IS NULL AND s.ssi = 1 )
> PRINT 'YES'
> uses index s
>
> IF EXISTS ( SELECT 1 FROM a WITH (NOLOCK)
> INNER JOIN s WITH (NOLOCK)
> ON a.num = s.num
> WHERE a.p IS NULL AND s.ssi = 2 )
> PRINT 'YES'
> Since your are using the query in the context of EXISTS, you can use SELEC
T
> 1 instead of SELECT sri (credit to Jim Underwood, as seen in a posting
> earlier today). Specifying sri in the select list may be affecting which
> indexes are chosen.
> Other than that, run the statements below and post the text execution plan
s
> here.
> SET SHOWNPLAN_TEXT ON
> GO
> SELECT sri FROM a WITH (NOLOCK)
> INNER JOIN s WITH (NOLOCK)
> ON a.num = s.num
> WHERE a.p IS NULL AND s.ssi = 1
> SELECT sri FROM a WITH (NOLOCK)
> INNER JOIN s WITH (NOLOCK)
> ON a.num = s.num
> WHERE a.p IS NULL AND s.ssi = 2 )
>
> -
>
> "S" wrote:
>
>|||Which order are the plans in? Is the ssi = 1 query first?
"S" wrote:
> Table s has 8 million records but only a few hundred have ssi=2, remaining
> records
> have ssi= 1
> |--Compute Scalar(DEFINE:([Expr1003]=If [Expr1004] then 1 else 0))
> |--Nested Loops(Left Semi Join, DEFINE:([Expr1004] = [PROBE VALUE])
)
> |--Constant Scan
> |--Filter(WHERE:([a].[p]=NULL))
> |--Bookmark Lookup(BOOKMARK:([Bmk1000]),
> OBJECT:([DB_NAME].[dbo].[a] ))
> |--Nested Loops(Inner Join, OUTER
> REFERENCES:([s].[num]))
> |--Clustered Index
> Scan(OBJECT:([DB_NAME].[dbo].[s].[PK_s]), WHERE:([s].[ssi]=2))
> |--Index
> S
> ORDERED FORWARD)
>
> for ssi= 2
> |--Compute Scalar(DEFINE:([Expr1003]=If [Expr1004] then 1 else 0))
> |--Nested Loops(Left Semi Join, DEFINE:([Expr1004] = [PROBE VALUE])
)
> |--Constant Scan
> |--Nested Loops(Inner Join, OUTER REFERENCES:([a].[num]) WITH
> PREFETCH)
> |--Bookmark Lookup(BOOKMARK:([Bmk1000]),
> OBJECT:([DB_NAME].[dbo].[a] ))
> | |--Index S
> SEEK:([a].[p]=NULL) ORDERED FORWARD)
> |--Clustered Index
> S
> WHERE:([s].[ssi]=1) ORDERED FORWARD)
>
> "Mark Williams" wrote:
>|||Yes.
"Mark Williams" wrote:
> Which order are the plans in? Is the ssi = 1 query first?
> --
> "S" wrote:
>|||I am sorry. I think I gave the incorrect order of execution plans.
ssi =1 uses index s
ssi=2 uses index s
"S" wrote:
> Yes.
> "Mark Williams" wrote:
>|||I tried replicating your data set on a much smaller scale. I created and
populated table s with about 100,000 rows with the same distribution of ssi
=
1 and ssi = 2 rows. Created and populated table a with an IDENTITY column fo
r
the PK, foreign key relationship to s on the num column, and nonclustered
indexex on p and num.
Ran both the queries and they had identical execution plans. Don't know why
you are getting different ones.
"S" wrote:
> I am sorry. I think I gave the incorrect order of execution plans.
> ssi =1 uses index s
> ssi=2 uses index s
> "S" wrote:
>