Showing posts with label full-text.

Monday, March 26, 2012

performance question/multiple keys

Hi All,
The table I'm using has full-text columns and also a date column that's
indexed. If I query on both a date range and the full-text column, does SQL
Server return the full-text rows first and then subset them by date, or does
it subset by date first and pass that set of rows to MSSearch?
I'm concerned about performance on tables with many rows (3 million or so).
Should I break the data up into one table per day, so that I'm not running
full-text searches over the whole table when I know a date range will give
me only a small subset?
Any insight on this issue?
thanks,
John
Rows are first returned from MSSearch and then trimmed.
Partitioning is a good idea. However, how large are your result sets? If
they are small (e.g. under 500 rows), this should not be a problem.
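To make the one-table-per-day idea concrete, here is a minimal Python sketch. It is a toy model, not SQL Server: PARTITIONS, search_partitions, and the sample rows are all hypothetical, and a plain word match stands in for the full-text index. The point is that a date-bounded search only consults the day tables inside the range, so the full-text step never returns rows from other days. In SQL Server 2000 terms this would presumably mean running the CONTAINS/CONTAINSTABLE query against each day's table in the range and combining the results, for example with UNION ALL.

```python
from datetime import date, timedelta

# Hypothetical per-day tables: each day's rows live in their own partition,
# as (key, text) pairs.
PARTITIONS = {
    date(2005, 9, 1): [(1, "sql server full text")],
    date(2005, 9, 2): [(2, "full text search tips")],
    date(2005, 9, 3): [(3, "replication notes"), (4, "full text catalog population")],
}

def search_partitions(partitions, term, start, end):
    """Run the simulated full-text search only against partitions whose
    day falls within [start, end]; return the matching keys in day order."""
    hits = []
    day = start
    while day <= end:
        # Partitions outside the range are never touched.
        for key, text in partitions.get(day, []):
            if term in text.split():
                hits.append(key)
        day += timedelta(days=1)
    return hits

print(search_partitions(PARTITIONS, "full", date(2005, 9, 3), date(2005, 9, 3)))  # [4]
```

Only the 2005-09-03 partition is searched here, so the match from the other days never has to be produced and then discarded.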
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
"John Mott" <johnmott59@.hotmail.com> wrote in message
news:eBaJxILvFHA.3080@.tk2msftngp13.phx.gbl...
John,
First of all, it is always a good idea to include the SQL Server and OS
version information. Could you post the full output of SELECT @@VERSION?
Q. If I do a query on a date range and the full-text column, does SQL Server
return the full-text rows first and then subset by date, or subset by date and
pass that set of rows to MSSearch?
A. SQL Server first queries the MSSearch service for all rows that match the
FTS query, then applies the WHERE clause filter after ALL results are
returned from the FT Catalog.
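The order of operations described above can be modeled roughly as follows. This is a hypothetical Python sketch, not SQL Server internals: full_text_matches stands in for MSSearch, and ROWS is made-up data. It shows that the number of rows handed back by the full-text step depends only on the search term, not on how narrow the date range is.

```python
from datetime import date

# Hypothetical rows: (key, posted_on, text).
ROWS = [
    (1, date(2005, 9, 1), "sql server full text"),
    (2, date(2005, 9, 2), "full text search tips"),
    (3, date(2005, 9, 3), "replication notes"),
    (4, date(2005, 9, 3), "full text catalog population"),
]

def full_text_matches(rows, term):
    """Stand-in for MSSearch: return the key of EVERY row containing term."""
    return [key for key, _, text in rows if term in text.split()]

def query(rows, term, start, end):
    """Model of the described order: 1) fetch all full-text matches,
    2) then apply the date-range WHERE filter to those matches.
    Returns (filtered keys, number of rows returned by step 1)."""
    posted_by_key = {key: posted for key, posted, _ in rows}
    ft_keys = full_text_matches(rows, term)  # step 1: all matches
    result = [k for k in ft_keys if start <= posted_by_key[k] <= end]  # step 2
    return result, len(ft_keys)

result, returned_by_ft = query(ROWS, "full", date(2005, 9, 3), date(2005, 9, 3))
print(result, returned_by_ft)  # [4] 3
```

Even though only one row survives the one-day filter, all three full-text matches are produced first.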
Yes, there can be performance issues with SQL Server 2000, but which side
of the equation concerns you: FT indexing (running a Full Population), FT
search (running a CONTAINS query), or both? If the former, see the resources
detailed in the blog entry below, and also review the SQL Server 2000 BOL
topic "Full-text Search Recommendations". If the latter, you should review KB
article 240833 (Q240833) "FIX: Full-Text Search Performance Improved via
Support for TOP" and consider using the Top_N_by_Rank parameter with either
CONTAINSTABLE or FREETEXTTABLE. If possible, partitioning the table into
smaller tables can also help.
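As a rough illustration of what the Top_N_by_Rank argument does (in SQL Server 2000 it is the optional fourth argument of CONTAINSTABLE and FREETEXTTABLE), here is a toy Python model, assuming the engine simply keeps the n highest-ranked matches. MATCHES and containstable_top_n are hypothetical names; real RANK values come from the full-text engine.

```python
import heapq

# Hypothetical (key, rank) pairs, as CONTAINSTABLE would return them
# in its [KEY] and RANK columns.
MATCHES = [(101, 12), (102, 87), (103, 45), (104, 60), (105, 3)]

def containstable_top_n(matches, top_n=None):
    """Toy model of top_n_by_rank: keep only the top_n matches with the
    highest rank, or every match when top_n is None."""
    if top_n is None:
        return list(matches)
    return heapq.nlargest(top_n, matches, key=lambda m: m[1])

print(containstable_top_n(MATCHES, 2))  # [(102, 87), (104, 60)]
```

The saving comes from handing fewer rows back to SQL Server; the cost is that anything below the cutoff is never seen by the rest of the query.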
SQL Server 2000 Full-Text Search Resources and Links
http://spaces.msn.com/members/jtkane/Blog/cns!1pWDBCiDX1uvH5ATJmNCVLPQ!305.entry
Regards,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
"John Mott" <johnmott59@.hotmail.com> wrote in message
news:eBaJxILvFHA.3080@.tk2msftngp13.phx.gbl...

Performance question

Hello,
Our main product catalog is approx. 3.1 million rows, with a full-text index
on 3 (varchar) columns. Over the past year, as our catalog has grown, we have
experienced continuing performance degradation, to the point that we are
looking at biting the bullet and migrating this application to Oracle Text,
which in our initial testing is several orders of magnitude faster (we're
going to stick with SQL Server for everything else). Obviously we'd like to
avoid that due to cost. However, this application is very important, and if
we've reached the limit of SQL Server, then so be it. Our problem is that
when a customer searches our catalog, we sort the search results by sales
rank, which prevents us from using the "top_n" parameter of CONTAINSTABLE or
FREETEXTTABLE. For example, say a customer searches our catalog for a
relatively common word that returns around 72,000 results (this takes
approx. 11 sec on subsequent runs and over 1 min on the first run, which is
the most important statistic). If we were to use (say) n=2000 for the top_n
parameter, our best-selling products would not be guaranteed to come back
from the FTS engine, because top_n keeps the highest full-text ranks, not the
best sales ranks. We need to return all the results so we can sort them by
sales rank and display them to customers.
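The conflict described above can be shown with a small hypothetical Python example: applying a top_n cutoff by full-text rank before sorting by sales rank can drop the best sellers whenever their full-text ranks happen to be low. The data and function names are invented for illustration, and sales_rank 1 is assumed to be the best seller.

```python
import heapq

# Hypothetical matches: (product_key, full_text_rank, sales_rank).
MATCHES = [
    ("A", 90, 500),
    ("B", 80, 1),
    ("C", 70, 250),
    ("D", 10, 2),
    ("E", 5, 3),
]

def top_n_then_sort(matches, n):
    """Cut off at the n highest full-text ranks first, then sort by sales rank."""
    kept = heapq.nlargest(n, matches, key=lambda m: m[1])
    return [m[0] for m in sorted(kept, key=lambda m: m[2])]

def sort_all(matches):
    """Return every match ordered by sales rank (what the catalog needs)."""
    return [m[0] for m in sorted(matches, key=lambda m: m[2])]

print(top_n_then_sort(MATCHES, 3))  # ['B', 'C', 'A']
print(sort_all(MATCHES))            # ['B', 'D', 'E', 'C', 'A']
```

Products D and E, the second and third best sellers, are lost by the cutoff because their full-text ranks are low.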
The server is running Windows 2003, with 6GB RAM and 16 x 15,000 RPM SCSI
drives in a RAID 10, in a dual Opteron configuration with the transaction log
on a separate RAID volume. SQL is SQL Server 2000, SP4 (I've included the
output of @@VERSION below). Perfmon shows that the server isn't sweating at
all during these queries from a disk, memory, or CPU standpoint, which leaves
SQL Server itself as the bottleneck. Our most recent population was around a
month ago, so the catalog is relatively up to date.
We've experimented with increasing the memory available to FTS, but that
did not seem to make a difference. Perhaps we did not do it right:
mssearch.exe still shows only about 49,000K of memory, although since we
are using AWE this figure could be distorted.
We're going to make one last gasp at improving the performance here before
dumping SQL Server and moving to Oracle. Help!
John
@@VERSION:
Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002 14:22:05
Copyright (c) 1988-2003 Microsoft Corporation Enterprise Edition on Windows NT 5.2 (Build 3790: )
Sample query:
-- The tables are 100% read-only except during monthly updates, hence the NOLOCK hints.
SELECT DISTINCT <field list>
FROM CONTAINSTABLE(<ft-table>, <ft-field>, '"<common term>"') AS ct
JOIN <ft-table> t WITH (NOLOCK) ON t.<PK> = ct.[KEY]
JOIN salesRank sr WITH (NOLOCK) ON sr.<PK> = ct.[KEY]
ORDER BY sr.SalesRank
The approach to take with problems like this is to partition your tables,
perhaps in your case by sales rank. For instance, you might break your table
into 10 subtables: one for sales ranks 1-10, another for 11-20, and so on.
Then limit each result set to 100 rows and UNION the results. Note that this
might end up more expensive than what you are currently experiencing.
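Here is a minimal sketch of that suggestion, modeled in Python rather than T-SQL. It assumes each partition holds a disjoint sales-rank range (so UNION and UNION ALL give the same rows) and that the per-partition limit behaves like top_n_by_rank, keeping the highest full-text ranks. banded_search and BANDS are hypothetical names and data.

```python
import heapq

def banded_search(bands, limit):
    """Toy model of partitioning by sales rank.

    bands: list of partitions, each a list of (key, ft_rank, sales_rank)
           full-text matches; partitions cover disjoint sales-rank ranges.
    limit: per-partition cap, standing in for top_n_by_rank.
    Each partition's matches are capped by full-text rank, the capped
    results are combined (the UNION step), and the combined set is
    ordered by sales rank.
    """
    combined = []
    for band in bands:
        combined.extend(heapq.nlargest(limit, band, key=lambda m: m[1]))
    return [m[0] for m in sorted(combined, key=lambda m: m[2])]

# Hypothetical: two partitions (sales ranks 1-10 and 11-20), cap of 2 each.
BANDS = [
    [("B", 80, 1), ("D", 10, 2), ("E", 5, 3)],
    [("A", 90, 15), ("C", 70, 12), ("F", 60, 20)],
]
print(banded_search(BANDS, 2))  # ['B', 'D', 'C', 'A']
```

Because every partition contributes up to `limit` rows, best sellers are no longer crowded out by matches from other bands, although a cap that is too small can still drop matches within a band (E in this example). The trade-off is one full-text query per partition.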
Hilary Cotter
"John" <john36356@.community.nospam> wrote in message
news:ebYiMcaWFHA.2572@.TK2MSFTNGP14.phx.gbl...