Friday, March 23, 2012

Performance problems using CONTAINS clause

Hello,
I have a FT-enabled table "TAB1". The FT-enabled columns (from TAB1) are:
"Col1", "Col2" and "Col3".
Searchs on this table may inlcude one or more columns, ie users may want to
filter only by Col1 and/or Col2 and/or Col3.
I've decided to build a dynamic SQL statement to be executed within a Stored
Procedure, to accomplish this.
Depending on the values passed into the stored procedure, the filters are
added to the statement (look at the below example. Please note that this is
not a complete SP, as many validations are missing, however it's enough to
catch the idea):
create procedure Proc1
@.filter1 varchar,
@.filter2 varchar,
@.filter3 varchar
as
declare @.strSQL nvarchar(4000)
set @.strSQL = 'select * from TAB1 where '
if @.filter1 is not NULL
set @.strSQL = @.strSQL + 'contains(Col1, ''' + @.filter1 + ''') '
if @.filter2 is not NULL
set @.strSQL = @.strSQL + 'and contains(Col2, ''' + @.filter2 + ''') '
if @.filter3 is not NULL
set @.strSQL = @.strSQL + 'and contains(Col3, ''' + @.filter3 + ''') '
exec (@.strSQL)
As you noticed, the result statement may include more than one CONTAINS
clause, and that's the point!
When the statement includes only one clause (ie, only one filter is to be
used), the performance is very good (few seconds to get the result). However,
if 2 or all the three filters are used, the performance is very bad. It seams
that a full search is being made for each of the FT-Columns (besides I think
I've already read it somewhere...I'm not sure!)
Do you know what is in fact wrong in this approach?
The problem happens whenever I use more than one CONTAINS clause. Is there
any other way to do do it?
Tip: I can not use the CONTAINS(*, 'word'), because users may want to filter
by 1 or 2 columns only, and not by ALL (*) of the FT-Columns.
Tip: I'm already using a "TOP X" in the statement to make sure that no more
than X rows are returned to the client application (I use a table variable to
hold the "TOP X" FT-Search results. If X rows have been coppied to the table
variable, then I return a specific error like "too many rows, please
re-define your search"...something like that. If less than X rows are
returned from FT-Search I can then join the table variable with other aux
tables in order to return the expected results data).
The table has +- 3Million rows.
I'm using SQL Server 2000.
Could you please help me to find a workarround to this situation?
Thanks in advance.
BR,
Hugo
Yukon may work better for you.
However, is there are only a limited number of possibilities of combinations
of columns you are going to search.
col1, col2, col3
col1, col2
col1, col3
col2, col3
col1,
col2
col3
Knowing this you could create 4 child tables which have composite columns
Table1 a column comprised of col1, col2, and col3
Table2 a column comprised of col1 and col2
Table3 a column comprised of col1 and col3
Table 4 a column comprised of col2 and col3
Your base table could field queries where the user is only querying on a
single column at a time. The other tables would handle each of the other
combinations.
You will get improved querying performance as there are no and/or clauses in
your queries and more threads will be available to handle querying and
indexing if each of these tables were in different catalogs.
Your performance problems will be solved but now you will have problems with
synchronization.
"Hugo Venancio" <HugoVenancio@.discussions.microsoft.com> wrote in message
news:A737C5B6-56C0-424A-8824-E119B878D861@.microsoft.com...
> Hello,
> I have a FT-enabled table "TAB1". The FT-enabled columns (from TAB1) are:
> "Col1", "Col2" and "Col3".
> Searchs on this table may inlcude one or more columns, ie users may want
> to
> filter only by Col1 and/or Col2 and/or Col3.
> I've decided to build a dynamic SQL statement to be executed within a
> Stored
> Procedure, to accomplish this.
> Depending on the values passed into the stored procedure, the filters are
> added to the statement (look at the below example. Please note that this
> is
> not a complete SP, as many validations are missing, however it's enough to
> catch the idea):
> create procedure Proc1
> @.filter1 varchar,
> @.filter2 varchar,
> @.filter3 varchar
> as
> declare @.strSQL nvarchar(4000)
> set @.strSQL = 'select * from TAB1 where '
> if @.filter1 is not NULL
> set @.strSQL = @.strSQL + 'contains(Col1, ''' + @.filter1 + ''') '
> if @.filter2 is not NULL
> set @.strSQL = @.strSQL + 'and contains(Col2, ''' + @.filter2 + ''') '
> if @.filter3 is not NULL
> set @.strSQL = @.strSQL + 'and contains(Col3, ''' + @.filter3 + ''') '
> exec (@.strSQL)
> As you noticed, the result statement may include more than one CONTAINS
> clause, and that's the point!
> When the statement includes only one clause (ie, only one filter is to be
> used), the performance is very good (few seconds to get the result).
> However,
> if 2 or all the three filters are used, the performance is very bad. It
> seams
> that a full search is being made for each of the FT-Columns (besides I
> think
> I've already read it somewhere...I'm not sure!)
> Do you know what is in fact wrong in this approach?
> The problem happens whenever I use more than one CONTAINS clause. Is there
> any other way to do do it?
> Tip: I can not use the CONTAINS(*, 'word'), because users may want to
> filter
> by 1 or 2 columns only, and not by ALL (*) of the FT-Columns.
> Tip: I'm already using a "TOP X" in the statement to make sure that no
> more
> than X rows are returned to the client application (I use a table variable
> to
> hold the "TOP X" FT-Search results. If X rows have been coppied to the
> table
> variable, then I return a specific error like "too many rows, please
> re-define your search"...something like that. If less than X rows are
> returned from FT-Search I can then join the table variable with other aux
> tables in order to return the expected results data).
> The table has +- 3Million rows.
> I'm using SQL Server 2000.
> Could you please help me to find a workarround to this situation?
> Thanks in advance.
> BR,
> Hugo
|||Hilary,
Thanks for your response, but I think I didn't understand your point,
because it seams that this approach doesn't return the same results as the
very first one I wrote in my previous post. Let's see if I'm wrong:
Supposing we have the BaseTable with one row:
Col1='b'
Col2='a'
Col3='c'
Now, the table Table1 (in your approach) would have one row with a single
column:
Col='b a c'
Now we want to perform the following search:
Filter1='a'
Filter2='b'
Filter3='c'
If I'd use my approach, it wouldn't return any row, right? Because the query
would be:
contains(Col1, 'a') and contains(Col2, 'b') and contains(Col3, 'c'), that
wouldn't return any row at all.
On the other hand, if I'd use your approach, I'd perform a search over the
Table1 like this:
contains(Col, 'a and b and c') that would in fact return ONE row!
Am I right? Or I REALLY didn't understand your approach?!?! )))
Is this the synchronization problems you were talking about?
Could you please enlight me?
Thanks once again
BR,
Hugo
"Hilary Cotter" wrote:

> Yukon may work better for you.
> However, is there are only a limited number of possibilities of combinations
> of columns you are going to search.
> col1, col2, col3
> col1, col2
> col1, col3
> col2, col3
> col1,
> col2
> col3
> Knowing this you could create 4 child tables which have composite columns
> Table1 a column comprised of col1, col2, and col3
> Table2 a column comprised of col1 and col2
> Table3 a column comprised of col1 and col3
> Table 4 a column comprised of col2 and col3
> Your base table could field queries where the user is only querying on a
> single column at a time. The other tables would handle each of the other
> combinations.
> You will get improved querying performance as there are no and/or clauses in
> your queries and more threads will be available to handle querying and
> indexing if each of these tables were in different catalogs.
> Your performance problems will be solved but now you will have problems with
> synchronization.
>
> "Hugo Venancio" <HugoVenancio@.discussions.microsoft.com> wrote in message
> news:A737C5B6-56C0-424A-8824-E119B878D861@.microsoft.com...
>
>

No comments:

Post a Comment