开发者

I need to implement wildcard search using % on both ends, how can I improve performance?

I will be implementing this in the sql side, so I would end up having something like

where company.name like '%:pa开发者_如何学编程rameter%'

I've read before that this would then make it impossible for the DB to use the company.name index.

My question then would be, what other options do I have to minimize the performance degradation that this will introduce to the search? Please note that, this is client requirements therefore I don't have the option to not implement (even though I explained to them the performance consequence of this).

The application is using Sybase SE 12.5.3 (based on the driver used in DBArtisan 8.5.5)


I don't know Sybase, but you could do something along the lines below:

  • Create a second field, which is the reverse of the company name.
  • Add an index on this new field
  • Use the following where clause

    WHERE company_name like ':parameter%' or reverse_name like Reverse(parameter)+'%'

Hope this points you in a good direction


You're dead right, that is a performance killer.

One method we've used in the past is to store all possible company names in a separate table, referring back to the main table. This is your classic time/space trade-off for optimisation.

In other words, let's say you have two companies in your main table, ICBM and Microsloth.

What you can do is to create another table containing the following:

TextSegment    varchar(?) indexed
ActualCompany  varchar(?)

and populate it as follows:

TextSegment  ActualCompany
-----------  -------------
ICBM         ICBM
CBM          ICBM
BM           ICBM
M            ICBM
Microsloth   Microsloth
icrosloth    Microsloth
crosloth     Microsloth
rosloth      Microsloth
osloth       Microsloth
sloth        Microsloth
loth         Microsloth
oth          Microsloth
th           Microsloth
h            Microsloth

Then, when you're looking for companies that are like %slo%, you can use:

select ActualCompany from LookupTable where TextSegment like 'slo%'

That allows you to use the index for that table more efficiently that with %...% on the other table.

Now, keep in mind you will need triggers on the original table to ensure the lookup table is consistent. And this will take a fair bit of space (depending on your data) but one thing I've noticed is that few people complain about how big their databases are, most problems are with speed.

The time impact of maintaining the separate table is usually not too bad since thevast majority of databases are read far more often than written. This method moves the cost from the select to the insert/update, where it can be amortised quite well.


The possibility exits that an index could potentially be used, but its highly unlikely (see this comment). If possible, I would query based on another indexed field, and then sub-select against those results, to at least avoid a full table scan.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜