Monday, March 26, 2012

Performance question: Separate database or additional table?

We have a database that has about 50 tables, each with approximately 800K
rows. These tables are imports of data from another system that are updated
daily. All the tables have the same column, Id, for a primary key. The tables
are accessed via views that serve mainly to assign meaningful names to the
columns and to insulate the applications from changes to the underlying
tables. In other words, the views are not restricting the users to a subset
of the columns in the tables, nor are they doing multi-table joins, etc. This
database is accessed by many users using many different applications across
the enterprise.
We have an application that queries a 12-view subset of the views, allowing
the user to do ad-hoc what-if queries with an application that generates SQL
queries from the user requests. The queries are often very time consuming. We
are attempting to improve the performance of this application. One thing we
know is that due to the nature of this application, 25% of the rows in the
views will never be selected, so the application has to include SQL to
explicitly exclude that 25% of the rows.
One thought is to eliminate the scanning of the 25% of the rows that the
application never uses. There appear to be two ways of doing this:
1) Create another database containing only the 10-table/view subset and
eliminate the unwanted rows at import/update time.
2) Create another table/view in the existing database with the Ids of the
75% of the rows that are used by this application, have the application do
a join on this table for all the queries, and eliminate the SQL designed to
weed out the 25% of the rows that are not used by the application.
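Option 2 might be sketched roughly like this (the table, view, and column names here are hypothetical, not from the actual schema):

```sql
-- Hypothetical sketch of option 2: a table holding only the Ids of the
-- "current" 75% of rows, refreshed during the nightly load.
CREATE TABLE dbo.ActiveIds (Id int NOT NULL PRIMARY KEY);

-- Generated queries would then join on it instead of carrying the
-- history-exclusion predicate:
SELECT v.Id, v.SomeColumn
FROM dbo.ItemView AS v
INNER JOIN dbo.ActiveIds AS a ON a.Id = v.Id
WHERE v.SomeColumn = 42;  -- user's ad-hoc criteria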
Whatever we choose to do, we have to live within the constraint of our
current hardware, so if we choose to create a separate database it will have
to share the current hardware with the other applications.
Given that the processing required to either load the alternate database or
maintain the 75% table is not an issue, which choice is likely to give us the
best increase in performance for the application?
Thanks,
Bob|||Hi
If you are currently seeing conflict between the two types of usage then it
is probably a good idea to create a second database for this to work on.
There will be latency in the data, depending on how you implement the
propagation of data.
I think you should look at the index usage on the database first, before you
implement any complex task to reduce the data.
John
|||John,
"John Bell" wrote:

> Hi
> If you are currently seeing conflict between the two types of usage then it
> is probably a good idea to create a second database for this to work on.
> There will be latency in the data, depending on how you implement the
> propagation of data.
>
It's not really a conflict. The 75% part of the data represents currently
active items while the other 25% is effectively history. Items move from
current to history and new items are added, on a daily basis, as the result of
batch processing on another system. This also eliminates data latency as an
issue.

> I think you should look at the index usage on the database first, before you
> implement any complex task to reduce the data.
>
The column that determines whether an item is current or history is already
indexed. In addition, all the other columns used by the queries are also
indexed as appropriate.
Bob|||Assuming all other factors (indexing, joins, etc.) remain the same, I would
not expect removing 25% of the rows to impact the total runtime of the
queries that much. If 75% of the rows could be archived elsewhere, then that
would be significant. Rather than guessing, there are methods to know for
sure where the bottleneck is:
Investigate to what extent your indexes may be fragmented and defragment if
needed.
http://www.microsoft.com/technet/pr...n/ss2kidbp.mspx
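On SQL Server 2000 that check-and-fix might look something like this (table and index names are placeholders):

```sql
-- Report fragmentation statistics for a table (SQL Server 2000)
DBCC SHOWCONTIG ('dbo.SomeTable');

-- Defragment an index online, without holding long-term locks
DBCC INDEXDEFRAG (0, 'dbo.SomeTable', 'IX_SomeTable_SomeColumn');

-- Or rebuild the index outright (takes locks, but fully rebuilds it)
DBCC DBREINDEX ('dbo.SomeTable', 'IX_SomeTable_SomeColumn', 90);
```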
Use the Show Execution Plan feature in Query Analyzer to see exactly how the
query optimizer is using your indexes.
http://support.microsoft.com/defaul...;243589&sd=tech
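Besides the graphical plan in Query Analyzer, you can get a text plan for one of the captured queries; the query below is just a stand-in for one of the generated ones:

```sql
SET SHOWPLAN_TEXT ON;
GO
-- Paste a generated query here; it is compiled but not executed, and the
-- estimated plan (index seeks vs. scans, etc.) is returned instead.
SELECT Id FROM dbo.SomeView WHERE SomeColumn = 42;
GO
SET SHOWPLAN_TEXT OFF;
GO
```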
SQL Server hardware configuration and monitoring memory usage, IO
performance, etc:
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx#EEAA
"Bob" <notrainsley@.worldsavings.com> wrote in message
news:B41801C0-B505-4A2C-9E3B-CF4D5E7C5C9A@.microsoft.com...
|||Hi
JT gave you a link regarding the query plans. Just because your archive flag
is indexed does not necessarily mean it is used, or that there isn't a better
index configuration. With ad-hoc queries you may not get brilliant query plans
all of the time, but you may be able to produce an indexing strategy that gives
very good responses most of the time. It could be that with changed indexes
there is no need to separate the system. If the system is only updated by a
batch job, then you can use a strategy where you drop the indexes before
inserting the new data and re-create them after. If you don't drop the
indexes, they should be rebuilt after the bulk load anyhow.
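The drop-and-recreate pattern around the batch load might be sketched as follows (object names and the file path are hypothetical):

```sql
-- Nightly load: drop nonclustered indexes, bulk load, then rebuild.
DROP INDEX dbo.SomeTable.IX_SomeTable_SomeColumn;

BULK INSERT dbo.SomeTable
FROM '\\importserver\nightly\sometable.dat'
WITH (TABLOCK);  -- allows a minimally logged load where the recovery model permits

CREATE INDEX IX_SomeTable_SomeColumn ON dbo.SomeTable (SomeColumn);
```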
You may want to read some of the articles at
http://www.sql-server-performance.c...performance.asp regarding how
to improve performance.
One thing that I don't think has been mentioned is that you may gain some
benefit from having indexed views.
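An indexed view materializes the view's result set on disk; a minimal sketch, with placeholder names and a hypothetical current/history flag column:

```sql
-- The view must be created WITH SCHEMABINDING and use two-part object
-- names before it can be indexed.
CREATE VIEW dbo.ActiveItems
WITH SCHEMABINDING
AS
SELECT Id, SomeColumn
FROM dbo.SomeTable
WHERE StatusFlag = 'A';  -- hypothetical "current" flag
GO
-- A unique clustered index materializes the view's rows.
CREATE UNIQUE CLUSTERED INDEX IX_ActiveItems_Id ON dbo.ActiveItems (Id);
```

Note that indexed views also require specific SET options (ANSI_NULLS, QUOTED_IDENTIFIER, etc.) to be on when the view and index are created.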
John
|||JT and John,
Thanks for your replies. One of the problems I had was trying to actually
grab some of the generated queries to check out the execution plan in Query
Analyzer.
I got one of the DBAs to do a trace on the processing and we grabbed a few
of the generated queries. We discovered that there was a view someone had
written that wasn't part of the standard load and was doing massive amounts
of calculations.
I'm going to write a procedure that will use that view to create another
table loaded with the results of the calculations, which will run as part of
the nightly load. The view will then simply be returning data instead of
calculating the values over and over. This will change a series of repetitive
compute-intensive calculations into a simple inner join on a primary key.
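That nightly precompute step could be roughly this simple (all names here are hypothetical):

```sql
-- Nightly: materialize the expensive view's calculations into a real table.
IF OBJECT_ID('dbo.CalcResults') IS NOT NULL
    DROP TABLE dbo.CalcResults;

SELECT Id, CalcValue1, CalcValue2
INTO dbo.CalcResults
FROM dbo.ExpensiveCalcView;

-- Index the snapshot so the join on the primary key is cheap.
CREATE UNIQUE CLUSTERED INDEX IX_CalcResults_Id ON dbo.CalcResults (Id);
```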
Thanks again for everyone's help.
Bob
