Tuesday, March 20, 2012

Performance problem with Partition tables in SQL Server 2005

I create partition table and regular table, both with the exact structure and
data.
The aim was to check whether there is an improvement in the performance or
not.
I execute the following queries (myPartTable is the partition table):
select * from myTable where instance_id = 3
select * from myPartTable where instance_id = 3
select * from myPartTable where instance_id = 3 and $partition.IntRangePF(3)
= 3
The result:
- Significant difference in the performance of the second and the third query.
- No major difference between the first and second query. In most of the
cases, the first even show better performance.
My question:
1) Why there is a significant improvement between the second and the third
query?
Isnâ't SQL Server supposed to recognize and access the partition without
specify its number?
2) Shouldn't I get some improvement in the performance when using partition
table?
---
Attached a sample code to create the test.
Create Partition Function:
CREATE PARTITION FUNCTION IntRangePF(int)
AS RANGE left FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Create Partition Scheme:
CREATE PARTITION SCHEME IntRangePScheme AS PARTITION IntRangePF
ALL TO ([PRIMARY])
Create the tables:
CREATE TABLE myPartTable (
id int NOT NULL,
instance_id int NOT NULL,
description varchar(32) NOT NULL,
temp1 float NULL,
temp2 float NULL,
temp3 float NULL,
constraint table1_P_PK primary key nonclustered(instance_id, id) on
[PRIMARY] ,
) ON IntRangePScheme (instance_id)
CREATE CLUSTERED INDEX index_myPartTable ON myPartTable(
[instance_id] ASC,
[id] ASC )
CREATE TABLE myTable (
id int NOT NULL,
instance_id int NOT NULL,
description varchar(32) NOT NULL,
temp1 float NULL,
temp2 float NULL,
temp3 float NULL,
constraint myTable_PK primary key nonclustered(instance_id, id) on
[PRIMARY] ,
)
CREATE CLUSTERED INDEX index_myTable ON myTable(
[instance_id] ASC,
[id] ASC )
Insert Data
DECLARE @.i int
SET @.i = 1;
WHILE @.i < 100
BEGIN
SET @.i = @.i + 1;
insert into myTable values (@.i, 1, 'ddddd', 1, 1, 1);
insert into myTable values (@.i, 2, 'ddddd', 1, 1, 1);
insert into myTable values (@.i, 3, 'ddddd', 1, 1, 1);
insert into myTable values (@.i, 4, 'ddddd', 1, 1, 1);
insert into myTable values (@.i, 5, 'ddddd', 1, 1, 1);
insert into myTable values (@.i, 6, 'ddddd', 1, 1, 1);
insert into myTable values (@.i, 7, 'ddddd', 1, 1, 1);
insert into myTable values (@.i, 8, 'ddddd', 1, 1, 1);
insert into myTable values (@.i, 9, 'ddddd', 1, 1, 1);
insert into myTable values (@.i, 10, 'ddddd', 1, 1, 1);
END
insert into myPartTable select * from myTable"?" <@.discussions.microsoft.com> wrote in message
news:B2FD061B-378D-4E4D-BC2A-04D5E3B9371A@.microsoft.com...
>I create partition table and regular table, both with the exact structure
>and
> data.
> The aim was to check whether there is an improvement in the performance or
> not.
> I execute the following queries (myPartTable is the partition table):
> select * from myTable where instance_id = 3
> select * from myPartTable where instance_id = 3
> select * from myPartTable where instance_id = 3 and
> $partition.IntRangePF(3)
> = 3
> The result:
> - Significant difference in the performance of the second and the third
> query.
> - No major difference between the first and second query. In most of the
> cases, the first even show better performance.
> My question:
> 1) Why there is a significant improvement between the second and the third
> query?
> Isn't SQL Server supposed to recognize and access the partition without
> specify its number?
> 2) Shouldn't I get some improvement in the performance when using
> partition
> table?
> ---
> Attached a sample code to create the test.
> Create Partition Function:
> CREATE PARTITION FUNCTION IntRangePF(int)
> AS RANGE left FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> Create Partition Scheme:
> CREATE PARTITION SCHEME IntRangePScheme AS PARTITION IntRangePF
> ALL TO ([PRIMARY])
> Create the tables:
> CREATE TABLE myPartTable (
> id int NOT NULL,
> instance_id int NOT NULL,
> description varchar(32) NOT NULL,
> temp1 float NULL,
> temp2 float NULL,
> temp3 float NULL,
> constraint table1_P_PK primary key nonclustered(instance_id, id) on
> [PRIMARY] ,
> ) ON IntRangePScheme (instance_id)
> CREATE CLUSTERED INDEX index_myPartTable ON myPartTable(
> [instance_id] ASC,
> [id] ASC )
>
When you create a clustered index on a table you move the table into the
filegroup or partition scheme for the clustered index. So when you created
a clustered index index_myPartTable on the primary filegroup instead of on
your partition scheme you "unpartitioned" the table.
Change like this:
CREATE TABLE myPartTable (
id int NOT NULL,
instance_id int NOT NULL,
description varchar(32) NOT NULL,
temp1 float NULL,
temp2 float NULL,
temp3 float NULL,
constraint table1_P_PK primary key nonclustered(instance_id, id) on
[PRIMARY] ,
) ON IntRangePScheme (instance_id)
CREATE CLUSTERED INDEX index_myPartTable ON myPartTable(
[instance_id] ASC,
[id] ASC ) ON IntRangePScheme (instance_id)
Or, all in one statement
CREATE TABLE myPartTable (
id int NOT NULL,
instance_id int NOT NULL,
description varchar(32) NOT NULL,
temp1 float NULL,
temp2 float NULL,
temp3 float NULL,
constraint table1_P_PK
primary key clustered(instance_id, id)
ON IntRangePScheme (instance_id) ,
)
Then you will see that the partition elimination reduces the IO's required
for your queries.
David|||I change the clustered index (the other one automatically created on the
partition scheme), but the performance are still the same.
When running the 3 query together, I am getting 43% query cost (in the
execution plan panel) for the first query, 42% for the second and 15% for the
last.
The main question is why do I get difference between the 2nd and 3rd query.
"David Browne" wrote:
> "?" <@.discussions.microsoft.com> wrote in message
> news:B2FD061B-378D-4E4D-BC2A-04D5E3B9371A@.microsoft.com...
> >I create partition table and regular table, both with the exact structure
> >and
> > data.
> > The aim was to check whether there is an improvement in the performance or
> > not.
> >
> > I execute the following queries (myPartTable is the partition table):
> > select * from myTable where instance_id = 3
> > select * from myPartTable where instance_id = 3
> > select * from myPartTable where instance_id = 3 and
> > $partition.IntRangePF(3)
> > = 3
> >
> > The result:
> > - Significant difference in the performance of the second and the third
> > query.
> > - No major difference between the first and second query. In most of the
> > cases, the first even show better performance.
> >
> > My question:
> > 1) Why there is a significant improvement between the second and the third
> > query?
> > Isn't SQL Server supposed to recognize and access the partition without
> > specify its number?
> > 2) Shouldn't I get some improvement in the performance when using
> > partition
> > table?
> >
> > ---
> > Attached a sample code to create the test.
> >
> > Create Partition Function:
> > CREATE PARTITION FUNCTION IntRangePF(int)
> > AS RANGE left FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> >
> > Create Partition Scheme:
> > CREATE PARTITION SCHEME IntRangePScheme AS PARTITION IntRangePF
> > ALL TO ([PRIMARY])
> >
> > Create the tables:
> > CREATE TABLE myPartTable (
> > id int NOT NULL,
> > instance_id int NOT NULL,
> > description varchar(32) NOT NULL,
> > temp1 float NULL,
> > temp2 float NULL,
> > temp3 float NULL,
> > constraint table1_P_PK primary key nonclustered(instance_id, id) on
> > [PRIMARY] ,
> > ) ON IntRangePScheme (instance_id)
> >
> > CREATE CLUSTERED INDEX index_myPartTable ON myPartTable(
> > [instance_id] ASC,
> > [id] ASC )
> >
> When you create a clustered index on a table you move the table into the
> filegroup or partition scheme for the clustered index. So when you created
> a clustered index index_myPartTable on the primary filegroup instead of on
> your partition scheme you "unpartitioned" the table.
>
> Change like this:
> CREATE TABLE myPartTable (
> id int NOT NULL,
> instance_id int NOT NULL,
> description varchar(32) NOT NULL,
> temp1 float NULL,
> temp2 float NULL,
> temp3 float NULL,
> constraint table1_P_PK primary key nonclustered(instance_id, id) on
> [PRIMARY] ,
> ) ON IntRangePScheme (instance_id)
> CREATE CLUSTERED INDEX index_myPartTable ON myPartTable(
> [instance_id] ASC,
> [id] ASC ) ON IntRangePScheme (instance_id)
>
> Or, all in one statement
> CREATE TABLE myPartTable (
> id int NOT NULL,
> instance_id int NOT NULL,
> description varchar(32) NOT NULL,
> temp1 float NULL,
> temp2 float NULL,
> temp3 float NULL,
> constraint table1_P_PK
> primary key clustered(instance_id, id)
> ON IntRangePScheme (instance_id) ,
> )
>
> Then you will see that the partition elimination reduces the IO's required
> for your queries.
> David
>
>|||does you % are real execution or estimated plan?
when you explicitly reference a partition SQL know which partition to read
directly.
when you go through the entire table, SQL doesn't know which partition is
really used until you run it.
but you are right, sometimes I found a query like your slow. SQL scan all
the partitions instead-of scanning only the required partitions.
This completly slow down the query!
"?" <@.discussions.microsoft.com> wrote in message
news:E515D9C3-B0A6-4E37-BCB6-9128EC4D9BCC@.microsoft.com...
>I change the clustered index (the other one automatically created on the
> partition scheme), but the performance are still the same.
> When running the 3 query together, I am getting 43% query cost (in the
> execution plan panel) for the first query, 42% for the second and 15% for
> the
> last.
> The main question is why do I get difference between the 2nd and 3rd
> query.
> "David Browne" wrote:
>> "?" <@.discussions.microsoft.com> wrote in message
>> news:B2FD061B-378D-4E4D-BC2A-04D5E3B9371A@.microsoft.com...
>> >I create partition table and regular table, both with the exact
>> >structure
>> >and
>> > data.
>> > The aim was to check whether there is an improvement in the performance
>> > or
>> > not.
>> >
>> > I execute the following queries (myPartTable is the partition table):
>> > select * from myTable where instance_id = 3
>> > select * from myPartTable where instance_id = 3
>> > select * from myPartTable where instance_id = 3 and
>> > $partition.IntRangePF(3)
>> > = 3
>> >
>> > The result:
>> > - Significant difference in the performance of the second and the third
>> > query.
>> > - No major difference between the first and second query. In most of
>> > the
>> > cases, the first even show better performance.
>> >
>> > My question:
>> > 1) Why there is a significant improvement between the second and the
>> > third
>> > query?
>> > Isn't SQL Server supposed to recognize and access the partition without
>> > specify its number?
>> > 2) Shouldn't I get some improvement in the performance when using
>> > partition
>> > table?
>> >
>> > ---
>> > Attached a sample code to create the test.
>> >
>> > Create Partition Function:
>> > CREATE PARTITION FUNCTION IntRangePF(int)
>> > AS RANGE left FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
>> >
>> > Create Partition Scheme:
>> > CREATE PARTITION SCHEME IntRangePScheme AS PARTITION IntRangePF
>> > ALL TO ([PRIMARY])
>> >
>> > Create the tables:
>> > CREATE TABLE myPartTable (
>> > id int NOT NULL,
>> > instance_id int NOT NULL,
>> > description varchar(32) NOT NULL,
>> > temp1 float NULL,
>> > temp2 float NULL,
>> > temp3 float NULL,
>> > constraint table1_P_PK primary key nonclustered(instance_id, id) on
>> > [PRIMARY] ,
>> > ) ON IntRangePScheme (instance_id)
>> >
>> > CREATE CLUSTERED INDEX index_myPartTable ON myPartTable(
>> > [instance_id] ASC,
>> > [id] ASC )
>> >
>> When you create a clustered index on a table you move the table into the
>> filegroup or partition scheme for the clustered index. So when you
>> created
>> a clustered index index_myPartTable on the primary filegroup instead of
>> on
>> your partition scheme you "unpartitioned" the table.
>>
>> Change like this:
>> CREATE TABLE myPartTable (
>> id int NOT NULL,
>> instance_id int NOT NULL,
>> description varchar(32) NOT NULL,
>> temp1 float NULL,
>> temp2 float NULL,
>> temp3 float NULL,
>> constraint table1_P_PK primary key nonclustered(instance_id, id) on
>> [PRIMARY] ,
>> ) ON IntRangePScheme (instance_id)
>> CREATE CLUSTERED INDEX index_myPartTable ON myPartTable(
>> [instance_id] ASC,
>> [id] ASC ) ON IntRangePScheme (instance_id)
>>
>> Or, all in one statement
>> CREATE TABLE myPartTable (
>> id int NOT NULL,
>> instance_id int NOT NULL,
>> description varchar(32) NOT NULL,
>> temp1 float NULL,
>> temp2 float NULL,
>> temp3 float NULL,
>> constraint table1_P_PK
>> primary key clustered(instance_id, id)
>> ON IntRangePScheme (instance_id) ,
>> )
>>
>> Then you will see that the partition elimination reduces the IO's
>> required
>> for your queries.
>> David
>>|||"?" <@.discussions.microsoft.com> wrote in message
news:E515D9C3-B0A6-4E37-BCB6-9128EC4D9BCC@.microsoft.com...
>I change the clustered index (the other one automatically created on the
> partition scheme), but the performance are still the same.
> When running the 3 query together, I am getting 43% query cost (in the
> execution plan panel) for the first query, 42% for the second and 15% for
> the
> last.
> The main question is why do I get difference between the 2nd and 3rd
> query.
>
The second and third both hit a single partition for me. Look at the actual
execution plan. Hover the index scan and you should see a PartitionID
or save the plan and view it with a text editor. You should see the
PartitionId uder the IndexScan, something like
<IndexScan Ordered="true" ScanDirection="FORWARD"
ForcedIndex="false" NoExpandHint="false">
<DefinedValues> . . . </DefinedValues>
<Object Database="[Demo]" Schema="[dbo]"
Table="[myPartTable]" Index="[table1_P_PK]" />
<SeekPredicates>
<SeekPredicate>
<Prefix ScanType="EQ">
<RangeColumns>
<ColumnReference Database="[Demo]" Schema="[dbo]"
Table="[myPartTable]" Column="instance_id" />
</RangeColumns>
<RangeExpressions>
<ScalarOperator ScalarString="(3)">
<Const ConstValue="(3)" />
</ScalarOperator>
</RangeExpressions>
</Prefix>
</SeekPredicate>
</SeekPredicates>
<PartitionId>
<ColumnReference Column="ConstExpr1007">
<ScalarOperator ScalarString="(3)">
<Const ConstValue="(3)" />
</ScalarOperator>
</ColumnReference>
</PartitionId>
</IndexScan>
David|||I can see that it scan the partition, but i'm not getting the same performance.
"David Browne" wrote:
> "?" <@.discussions.microsoft.com> wrote in message
> news:E515D9C3-B0A6-4E37-BCB6-9128EC4D9BCC@.microsoft.com...
> >I change the clustered index (the other one automatically created on the
> > partition scheme), but the performance are still the same.
> >
> > When running the 3 query together, I am getting 43% query cost (in the
> > execution plan panel) for the first query, 42% for the second and 15% for
> > the
> > last.
> >
> > The main question is why do I get difference between the 2nd and 3rd
> > query.
> >
> The second and third both hit a single partition for me. Look at the actual
> execution plan. Hover the index scan and you should see a PartitionID
> or save the plan and view it with a text editor. You should see the
> PartitionId uder the IndexScan, something like
> <IndexScan Ordered="true" ScanDirection="FORWARD"
> ForcedIndex="false" NoExpandHint="false">
> <DefinedValues> . . . </DefinedValues>
> <Object Database="[Demo]" Schema="[dbo]"
> Table="[myPartTable]" Index="[table1_P_PK]" />
> <SeekPredicates>
> <SeekPredicate>
> <Prefix ScanType="EQ">
> <RangeColumns>
> <ColumnReference Database="[Demo]" Schema="[dbo]"
> Table="[myPartTable]" Column="instance_id" />
> </RangeColumns>
> <RangeExpressions>
> <ScalarOperator ScalarString="(3)">
> <Const ConstValue="(3)" />
> </ScalarOperator>
> </RangeExpressions>
> </Prefix>
> </SeekPredicate>
> </SeekPredicates>
> <PartitionId>
> <ColumnReference Column="ConstExpr1007">
> <ScalarOperator ScalarString="(3)">
> <Const ConstValue="(3)" />
> </ScalarOperator>
> </ColumnReference>
> </PartitionId>
> </IndexScan>
> David
>
>|||"?" <@.discussions.microsoft.com> wrote in message
news:987784C1-C165-40A9-B76E-B7C1F4E99F07@.microsoft.com...
>I can see that it scan the partition, but i'm not getting the same
>performance.
>
What performance difference are you expecting? And how are you measuring it?
Partitioning is fundamentally a performance feature, but it's designed for
tables a lot of rows. Partitioning makes it practical to have have tables
with billions of rows. Small databases you won't always see a large
difference with partitioning.
David|||When I run the queries, I include Actual Execution plan, and then I examine
the performance in the Execution Plan tab.
Iâ'm expecting to get better performance when accessing the partition table
as oppose to the regular table.
I run the same test on tables with 1,000,000 records per instance, and the
results are the same â' the performance are the usually the same.
The biggest question is why I get difference between these 2 queries (when
accessing the partition table):
(Query 1) select * from myPartTable where instance_id = 3
(Query 2) select * from myPartTable where instance_id = 3 and
$partition.IntRangePF(3) = 3
When running these queries on partition table with 7251452 records, I get
the following performance:
Query 1: Query Cost = 76%, Subtree Cost = 4.26687
Plan:
Clustered Index Seek, Estimate I/O Cost = 3.46905
Query 2: Query Cost = 24%, Subtree Cost = 1.34275
Plan:
Clustered Index Seek, Estimate I/O Cost = 1.04239; Subtree Cost = 1.28184
Filter, Subtree Cost = 1.34275
*** In both cases the plan show access to partition 3.
Why adding the statement â'$partition.IntRangePF(3) = 3â' had such an effect
on the performance, while this statement is not suppose to effect filter in
the index. I will get the same performance if I will add
â'$partition.IntRangePF(1) = 1â' on the same query.
Another improvement I get with the following query:
(Query 3)select * from myPartTable where instance_id = 3 and
$partition.IntRangePF(instance_id) = $partition.IntRangePF(3)
Running this query with the other show the following cost:
Query 1: Query Cost = 70%, Subtree Cost = 4.26687
Plan:
Clustered Index Seek, Estimate I/O Cost = 3.46905
Query 2: Query Cost = 22%, Subtree Cost = 1.34275
Plan:
Clustered Index Seek, Estimate I/O Cost = 1.04239; Subtree Cost = 1.28184
Filter (Cost 5%), Subtree Cost = 1.34275
Query 3: Query Cost = 8%, Subtree Cost = 0.501189
Plan:
Clustered Index Seek, Estimate I/O Cost = 0.317536; Subtree Cost = 0.390208
Filter (Cost 22%), Subtree Cost = 0.500957
"David Browne" wrote:
> "?" <@.discussions.microsoft.com> wrote in message
> news:987784C1-C165-40A9-B76E-B7C1F4E99F07@.microsoft.com...
> >I can see that it scan the partition, but i'm not getting the same
> >performance.
> >
>
> What performance difference are you expecting? And how are you measuring it?
> Partitioning is fundamentally a performance feature, but it's designed for
> tables a lot of rows. Partitioning makes it practical to have have tables
> with billions of rows. Small databases you won't always see a large
> difference with partitioning.
> David
>
>|||"?" <@.discussions.microsoft.com> wrote in message
news:34C036B2-A6C7-48C4-BF41-C9BA15C94E82@.microsoft.com...
> When I run the queries, I include Actual Execution plan, and then I
> examine
> the performance in the Execution Plan tab.
> I'm expecting to get better performance when accessing the partition table
> as oppose to the regular table.
> I run the same test on tables with 1,000,000 records per instance, and the
> results are the same - the performance are the usually the same.
> The biggest question is why I get difference between these 2 queries (when
> accessing the partition table):
> (Query 1) select * from myPartTable where instance_id = 3
> (Query 2) select * from myPartTable where instance_id = 3 and
> $partition.IntRangePF(3) = 3
> When running these queries on partition table with 7251452 records, I get
> the following performance:
> Query 1: Query Cost = 76%, Subtree Cost = 4.26687
> Plan:
> Clustered Index Seek, Estimate I/O Cost = 3.46905
> Query 2: Query Cost = 24%, Subtree Cost = 1.34275
> Plan:
> Clustered Index Seek, Estimate I/O Cost = 1.04239; Subtree Cost => 1.28184
> Filter, Subtree Cost = 1.34275
>
Ok I see what you mean.
The estimated cost of the second query is incorrect, leading to an apparent
performance difference. The difference is not real. It is only in the
estimated cost of the queries. In particular the "Estimated Number of Rows"
for the Clustered Index Seek operator in the second query is not very
accurate. The actual cost of the queries is almost identical.
Run this to see that the actual cost of the queries is similar:
set statistics IO on
set statistics time on
select * from myPartTable where instance_id = 3
select * from myPartTable where instance_id = 3 and $partition.IntRangePF(3)
= 3
David

No comments:

Post a Comment