Hi All
I have a database of students which contains a table StoredCvs. This
contains all of their CVs (résumés) and has a full-text index stored in a
catalog named "StoredFilesCatalog". It is unlikely that this table will grow
to more than 1 million rows, although it is possible.
I now want to start storing files (for searching) for other types of records
in our system, i.e. company files, project files, other candidate files, etc.
I am now faced with a few choices and wondered which would be the best one.
I have decided not to create another catalog at this stage, as I feel this
could be done later if performance is really bad. However, I was wondering
what the difference would be between:
a) Creating another table (StoredFiles), which uses the "StoredFilesCatalog"
but would have a separate full-text index.
b) Adding a FileTypeID column to my StoredCvs table (which I would rename to
StoredFiles) and storing all files in the one table (with one full-text index).
The main function of the system is to search candidate CVs (résumés), so I
was wondering if there is any performance advantage to having two tables, or
if the searches would perform the same either way, since both tables would
use the same "StoredFilesCatalog".
Hope this makes sense; I'd really appreciate any advice.
Cheers
Joe
You will get better performance with two catalogs, one for each table. You
could add a separate column for FileType, but this could be problematic if
you are using top_n_by_rank.
For instance, suppose you run a query like this:
SELECT *
FROM StoredCvs
JOIN CONTAINSTABLE(StoredCvs, *, 'microsoft', 200) AS T
    ON T.[KEY] = StoredCvs.PK
WHERE FileType = 'resume'
ORDER BY T.[RANK] DESC
If none of the first 200 hits returned were of FileType 'resume', you would
get no rows back, even though there could be matches that simply did not fall
within those first 200 hits.
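One possible mitigation if a single table is kept (a sketch, not from the thread; the StoredFiles table, FileType column, and PK key column are illustrative names): over-fetch from CONTAINSTABLE so that the FileType filter applied afterwards is less likely to exhaust the candidate set.

```sql
-- Sketch: ask CONTAINSTABLE for 2000 candidates instead of 200, then
-- apply the FileType filter and keep the top 200 survivors.
-- The rank cutoff still happens BEFORE the WHERE clause is applied,
-- so this only reduces the problem; it does not eliminate it.
SELECT TOP 200 s.*
FROM StoredFiles AS s
JOIN CONTAINSTABLE(StoredFiles, *, 'microsoft', 2000) AS t
    ON t.[KEY] = s.PK
WHERE s.FileType = 'resume'
ORDER BY t.[RANK] DESC
```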
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
|||Joe,
So I can be sure of your environment, could you also post the full output of
SELECT @@version, as this is most helpful in understanding your environment.
As I understand it, you have one table (StoredCvs, with fewer than 1 million
rows) and one FT catalog (StoredFilesCatalog), and you've decided not to
create another FT catalog. Note that you can only have one FT catalog defined
per table, but that one FT catalog can support multiple columns per table as
well as multiple tables.
Your decision is between the two options (a or b) below. Correct?
a) Creating another table (StoredFiles), which uses the "StoredFilesCatalog"
but would have a separate full-text index.
b) Adding a FileTypeID column to the StoredCVs table (renaming it to
StoredFiles) and storing all files in the one table (with one full-text index).
If I have your environment correct, I'd recommend option a: create another
table (StoredFiles). Primarily because SQL Server 2000 FT catalogs start to
have performance issues with tables at approximately 1 million rows (still
functional, but they need performance tuning), and adding additional file
types and larger files to your existing table (StoredCvs) would push it above
the 1 million row threshold. See the SQL Server 2000 BOL topic "Full-text
Search Recommendations" for more information on performance tuning FT
catalogs on tables with more than 1 million rows.
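As a hedged sketch of what option (a) could look like using SQL Server 2000's system stored procedures (the table name, key index name, column name, and second catalog name here are all illustrative, not from the thread):

```sql
-- Sketch: create a second catalog and enable full-text indexing on the
-- new StoredFiles table (SQL Server 2000 sp_fulltext_* procedures).
EXEC sp_fulltext_catalog 'StoredFilesCatalog2', 'create'
EXEC sp_fulltext_table   'StoredFiles', 'create', 'StoredFilesCatalog2', 'PK_StoredFiles'
EXEC sp_fulltext_column  'StoredFiles', 'FileContent', 'add'
EXEC sp_fulltext_table   'StoredFiles', 'start_change_tracking'
```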
Another issue/question that you did not mention is whether these two tables
will often (always, sometimes, never) be joined together in common queries or
in common FTS queries. If they are seldom or never joined in frequently used
queries, then it makes more sense for them to be separate tables.
Hope that helps!
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
|||Good Point!
I was thinking along the two-table line anyway, so thanks for your help.
Friday, March 23, 2012
Performance problems using CONTAINS clause
Hello,
I have an FT-enabled table, TAB1. The FT-enabled columns (from TAB1) are
Col1, Col2, and Col3.
Searches on this table may include one or more columns, i.e. users may want
to filter by only Col1 and/or Col2 and/or Col3.
I've decided to build a dynamic SQL statement, executed within a stored
procedure, to accomplish this.
Depending on the values passed into the stored procedure, the filters are
added to the statement (see the example below; please note that this is not
the complete SP, as many validations are missing, but it's enough to get the
idea):
create procedure Proc1
    @filter1 varchar(200) = NULL,
    @filter2 varchar(200) = NULL,
    @filter3 varchar(200) = NULL
as
declare @strSQL nvarchar(4000)
-- "1=1" keeps the statement valid when the first filter is NULL
set @strSQL = 'select * from TAB1 where 1=1 '
if @filter1 is not NULL
    set @strSQL = @strSQL + 'and contains(Col1, ''' + @filter1 + ''') '
if @filter2 is not NULL
    set @strSQL = @strSQL + 'and contains(Col2, ''' + @filter2 + ''') '
if @filter3 is not NULL
    set @strSQL = @strSQL + 'and contains(Col3, ''' + @filter3 + ''') '
exec (@strSQL)
As you can see, the resulting statement may include more than one CONTAINS
clause, and that's the point!
When the statement includes only one clause (i.e., only one filter is used),
the performance is very good (a few seconds to get the result). However, if
two or all three filters are used, the performance is very bad. It seems that
a full search is being made for each of the FT columns (I think I've read
something to that effect somewhere, but I'm not sure).
Do you know what is in fact wrong with this approach?
The problem happens whenever I use more than one CONTAINS clause. Is there
any other way to do it?
Tip: I cannot use CONTAINS(*, 'word'), because users may want to filter by
one or two columns only, and not by ALL (*) of the FT columns.
Tip: I'm already using a "TOP X" in the statement to make sure that no more
than X rows are returned to the client application. (I use a table variable
to hold the "TOP X" FT-search results. If X rows have been copied to the
table variable, then I return a specific error like "too many rows, please
re-define your search". If fewer than X rows are returned from the FT search,
I can then join the table variable with other auxiliary tables in order to
return the expected result data.)
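The "TOP X" pattern described above could be sketched roughly as follows (a sketch only, with illustrative names: a cap of 500, a search on Col1, and the assumption that TAB1's full-text key column is PK):

```sql
-- Sketch: capture at most 500 FT hits in a table variable; if the cap
-- is reached, ask the user to narrow the search; otherwise join back
-- to the base table for the result data.
DECLARE @hits TABLE ([Key] int, [Rank] int)

INSERT INTO @hits ([Key], [Rank])
SELECT TOP 500 [KEY], [RANK]
FROM CONTAINSTABLE(TAB1, Col1, 'microsoft')
ORDER BY [RANK] DESC

IF (SELECT COUNT(*) FROM @hits) = 500
    RAISERROR('Too many rows, please re-define your search', 16, 1)
ELSE
    SELECT t.*
    FROM TAB1 AS t
    JOIN @hits AS h ON h.[Key] = t.PK
    ORDER BY h.[Rank] DESC
```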
The table has roughly 3 million rows.
I'm using SQL Server 2000.
Could you please help me find a workaround for this situation?
Thanks in advance.
BR,
Hugo
Yukon may work better for you.
However, if there is only a limited number of possible combinations of
columns you are going to search:
col1, col2, col3
col1, col2
col1, col3
col2, col3
col1
col2
col3
then, knowing this, you could create four child tables which each have a
composite column:
Table1: a column comprised of col1, col2, and col3
Table2: a column comprised of col1 and col2
Table3: a column comprised of col1 and col3
Table4: a column comprised of col2 and col3
Your base table could field queries where the user is querying on only a
single column at a time; the other tables would handle each of the other
combinations.
You will get improved query performance, as there are no AND/OR clauses in
your queries, and more threads will be available to handle querying and
indexing if each of these tables is in a different catalog.
Your performance problems will be solved, but you will now have
synchronization problems.
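A hedged sketch of one of the composite-column child tables (names are illustrative, and the INSERT assumes TAB1's key column is PK; keeping this table in sync with TAB1, e.g. via triggers, is the synchronization burden mentioned above):

```sql
-- Sketch: Table1 holds a single FT-indexable column that concatenates
-- Col1, Col2, and Col3, so one CONTAINS predicate covers all three.
CREATE TABLE Table1 (
    PK  int NOT NULL PRIMARY KEY,
    Col varchar(3000)
)

INSERT INTO Table1 (PK, Col)
SELECT PK, Col1 + ' ' + Col2 + ' ' + Col3
FROM TAB1
```

A query against it then needs only a single predicate, e.g. CONTAINS(Col, 'a AND b AND c').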
|||Hilary,
Thanks for your response, but I think I didn't understand your point, because
it seems that this approach doesn't return the same results as the very first
one I wrote in my previous post. Let's see if I'm wrong:
Suppose we have the base table with one row:
Col1='b'
Col2='a'
Col3='c'
Now, the table Table1 (in your approach) would have one row with a single
column:
Col='b a c'
Now we want to perform the following search:
Filter1='a'
Filter2='b'
Filter3='c'
If I used my approach, it wouldn't return any rows, right? Because the query
would be:
contains(Col1, 'a') and contains(Col2, 'b') and contains(Col3, 'c')
which wouldn't return any rows at all.
On the other hand, if I used your approach, I'd perform a search over Table1
like this:
contains(Col, 'a and b and c')
which would in fact return ONE row!
Am I right? Or did I really misunderstand your approach?!
Is this the synchronization problem you were talking about?
Could you please enlighten me?
Thanks once again
BR,
Hugo