Wednesday, March 7, 2012

Performance of a billion-row table

Hello All!
I am building an application in VB.Net along with an underlying db in
SQL 2000. I am not too experienced with SQL performance issues and am
looking for some advice. I am about done designing the database (about
150 tables) and am looking at one table--the mother of all tables--that
will hold approximately 500 million rows, more than any other table in
my db.
ISSUE:
I am wondering whether I need to break this table down into several
smaller tables or look into using a partitioned table. A lot of VB code
and SQL procedure code depends on how the data intended for this table
is ultimately stored.
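(By "partitioned table" I mean SQL 2000's partitioned views--SQL 2000
has no native partitioned tables; those arrived in SQL 2005--so the
option I'm weighing would look something like this sketch, with
hypothetical table and column names:

-- Member tables, one per range of the partitioning column (YearKey)
CREATE TABLE BigTable_2003 (
    ParentA int NOT NULL,
    ParentB int NOT NULL,
    YearKey int NOT NULL CHECK (YearKey = 2003),
    Val     decimal(9,2) NOT NULL,
    CONSTRAINT PK_BigTable_2003 PRIMARY KEY (ParentA, ParentB, YearKey)
)
CREATE TABLE BigTable_2004 (
    ParentA int NOT NULL,
    ParentB int NOT NULL,
    YearKey int NOT NULL CHECK (YearKey = 2004),
    Val     decimal(9,2) NOT NULL,
    CONSTRAINT PK_BigTable_2004 PRIMARY KEY (ParentA, ParentB, YearKey)
)
GO
-- The view unions the members; the CHECK constraints let the optimizer
-- skip member tables when YearKey appears in the WHERE clause
CREATE VIEW BigTable AS
SELECT ParentA, ParentB, YearKey, Val FROM BigTable_2003
UNION ALL
SELECT ParentA, ParentB, YearKey, Val FROM BigTable_2004
)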
BACKGROUND:
The table has four columns and is indexed (clustered) on all four
columns. (Can't reduce the # of indices since this table is on the
"many" side of a one-to-many relationship with two other tables.) All
four columns hold numeric datatypes.
In the first year the related application is rolled out, this "main"
table will quickly grow to about 500 million rows. Over the next 9
years, the table will likely grow to about 1 billion rows.
Users will have on their laptops a "filtered" local/subscription copy of
the db which will replicate with the main publisher db (using an
anonymous pull subscription approach for replication). This "main"
table will hold about 200,000 to 500,000 rows of data in the "filtered"
subscription db.
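In case it matters, the row filter would be defined on the published
article itself. A rough sketch of what I have in mind, assuming merge
replication and hypothetical publication and filter-column names:

-- Hypothetical names; the subset filter is what keeps each laptop's
-- copy down to the 200,000-500,000 rows that belong to that user
EXEC sp_addmergearticle
    @publication = N'BigTablePub',
    @article = N'BigTable',
    @source_object = N'BigTable',
    @subset_filterclause = N'SalesRepID = 42'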
Also, assume users' laptops have roughly 1.8GHz processors and 512MB of
RAM. I don't
know about the server hardware specs other than this firm spends a lot
of cash on maintaining/building their computer infrastructure.
FOLLOW-UP QUESTIONS
Based on this structure, will the execution of SELECT, UPDATE, INSERT,
and DELETE queries against the "main" table be noticeably slower (3
seconds or more) than the execution of *similar* queries against other
tables that hold only 20% to 30% as many rows?
When users are working directly off the publisher db on the server, will
queries against the "main" table perform just as fast as (if not faster
than) when they are run against the filtered "main" table in the
subscription db?
Thanks in advance for any advice!
Charles
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
|||
Charles,
That depends on how many rows you are going to return and what your WHERE
clause will look like. By the way, I don't get your comment:
> The table has four columns and is indexed (clustered) on all four
> columns. (Can't reduce the # of indicies since this table is on the
> "many" side of a one-to-many relationship with two other tables.) All
You can only have 1 clustered index. Are you saying you have the clustered
index as a compound index of all 4 columns and then you have an individual
non-clustered index on each column? What is the PK?
Can you post the DDL for this table?
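For illustration, the two readings look quite different in DDL
(hypothetical names throughout):

-- Reading 1: a single compound clustered index (here, the PK) covering
-- all four columns
CREATE TABLE BigTable (
    ParentA int NOT NULL,
    ParentB int NOT NULL,
    YearKey int NOT NULL,
    Val     decimal(9,2) NOT NULL,
    CONSTRAINT PK_BigTable
        PRIMARY KEY CLUSTERED (ParentA, ParentB, YearKey, Val)
)
-- Reading 2: the same PK plus separate nonclustered indexes, e.g. to
-- support the joins to the two parent tables
CREATE NONCLUSTERED INDEX IX_BigTable_ParentA ON BigTable (ParentA)
CREATE NONCLUSTERED INDEX IX_BigTable_ParentB ON BigTable (ParentB)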
--
Andrew J. Kelly
SQL Server MVP
"Charles Wolfersberger" <cwolfersberger@.prodigy.net> wrote in message
news:u%23SMldlRDHA.2332@.TK2MSFTNGP10.phx.gbl...
> Hello All!
> I am building an application in VB.Net along with an underlying db in
> SQL 2000. I am not too experienced with SQL performance issues and am
> looking for some advice. I am about done designing the database (about
> 150 tables) and am looking at one table--the mother of all tables--that
> will hold approximately 500 million rows more than any other table in my
> db.
> ISSUE:
> I am wondering whether I need to break this table down into several
> smaller tables or look into using a partitioned table. A lot of VB code
> and SQL procedure code depends on how the data intended for this table
> is ultimately stored.
>
> BACKGROUND:
> The table has four columns and is indexed (clustered) on all four
> columns. (Can't reduce the # of indicies since this table is on the
> "many" side of a one-to-many relationship with two other tables.) All
> four columns hold numeric datatypes.
> In the first year the related application is rolled out, this "main"
> table will quickly grow to about 500 million rows. Over the next 9
> years, the table will likely grow to about 1 billion rows.
> Users will have on their laptops a "filtered" local/subscription copy of
> the db which will replicate with the main publisher db (using an
> anonymous pull subscription approach for replication). This "main"
> table will hold about 200,000 to 500,000 rows of data in the "filtered"
> subscription db.
> Also, assume users' laptops have @.1.8ghz processors 512MB RAM. I don't
> know about the server hardware specs other than this firm spends a lot
> of cash on maintaining/building their computer infrastructure.
> FOLLOW-UP QUESTIONS
> Based on this structure, will the execution of SELECT, UPDATE, INSERT
> and DELETE queries against the "main" table be noticebly slower (3
> seconds or more) than the execution of *similar* queries against other
> tables that hold only 20% to 30% as many rows?
> When users are working directly off the publisher db on the server, will
> queries against the "main" table perform just as fast as (if not faster
> than) when they perform against the filtered "main" table in the
> subscription db?
> Thanks in advance for any advice!
> Charles
> *** Sent via Developersdex http://www.developersdex.com ***
|||
Let's assume that your table has four numeric types of 9 bytes each,
for a total of 36 bytes per row. The overhead per row is 10-12 bytes,
so the net row size for 4 x 9-byte columns with a clustered primary key
is ~45 bytes per row.
This means 179 rows fit per page (99% fill), so your index depth is:

depth  rows
1      179
2      32,041
3      5,735,339
4      1,026,625,681
So your 500M-1B row table will have an index depth of 4. You could
increase the row capacity per index depth if you did not make all 4
columns part of the primary key, but I wouldn't worry about that right
now.
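As a sanity check, those figures are just the per-page fan-out raised
to the power of the depth (~8096 usable bytes per page / 45 bytes per
row):

-- Rough check of the fan-out arithmetic above (45-byte rows assumed)
SELECT
    8096 / 45                             AS rows_per_page, -- 179
    179 * 179                             AS depth2_rows,   -- 32,041
    179 * 179 * 179                       AS depth3_rows,   -- 5,735,339
    CAST(179 AS bigint) * 179 * 179 * 179 AS depth4_rows    -- 1,026,625,681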
You are looking at ~22GB for the table (at the initial 500 million
rows), plus ~20GB per index.
I don't see the point of partitioning for the purpose of reducing query
cost. The size is not excessively large, but you may consider using
file groups.
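Something along these lines (file names, paths, and sizes are purely
illustrative):

-- Put the big table on its own filegroup so its I/O and growth are
-- isolated from the rest of the database
ALTER DATABASE MainDB ADD FILEGROUP FG_BigTable
ALTER DATABASE MainDB ADD FILE (
    NAME = 'BigTable_Data',
    FILENAME = 'E:\Data\BigTable_Data.ndf',
    SIZE = 25GB
) TO FILEGROUP FG_BigTable
GO
-- then: CREATE TABLE BigTable ( ... ) ON FG_BigTable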
I would recommend creating the indexes with sort in tempdb, so you
don't burn space in the main db while creating the indexes, meaning you
will be more likely to get contiguous space for each index.
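In SQL 2000 that is the SORT_IN_TEMPDB option on CREATE INDEX, e.g.
against the hypothetical table sketched earlier:

-- Intermediate sort runs go to tempdb instead of the user database
CREATE NONCLUSTERED INDEX IX_BigTable_ParentB
ON BigTable (ParentB)
WITH SORT_IN_TEMPDB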
My inclination is to speculate that there is not a substantial
difference in the cost of a single-row index seek from a table with
index depth 4 versus depth 2. When selecting more than 1 row, the cost
of the subsequent rows is the same regardless of the index depth.
You should notice that inserts/updates/deletes will take ~2X longer
with 3-4 indexes compared with just the primary key.
If I know the number of rows queried, what types of joins are involved,
and the platform, I can probably give you estimates of query times.
