I need to implement wildcard search using % on both ends, how can I improve performance?
I will be implementing this in the sql side, so I would end up having something like
where company.name like '%:parameter%'
I've read before that this would then make it impossible for the DB to use the company.name index.
My question, then, is: what other options do I have to minimize the performance degradation this will introduce to the search? Please note that this is a client requirement, so not implementing it is not an option (even though I have explained the performance consequences to them).
The application is using Sybase ASE 12.5.3 (based on the driver used in DBArtisan 8.5.5).
I don't know Sybase, but you could do something along the following lines:
- Create a second field, which is the reverse of the company name.
- Add an index on this new field
- Use the following where clause:
WHERE company_name like ':parameter%' or reverse_name like reverse(:parameter) + '%'
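Purely as a sketch (the table/column names, varchar size, and index names here are my assumptions, not from your schema), the setup in Sybase T-SQL might look roughly like this:

-- add and populate a reversed copy of the company name; reverse() is a standard string function
alter table company add reverse_name varchar(80) null
update company set reverse_name = reverse(name)

-- index both the original and the reversed column
create index ix_company_name    on company (name)
create index ix_company_reverse on company (reverse_name)

-- prefix matches can seek on ix_company_name, suffix matches on ix_company_reverse
select name
from   company
where  name like ':parameter%'
   or  reverse_name like reverse(:parameter) + '%'

Note that this only covers names that start or end with the search term; a term buried in the middle of the name still can't be served by either index.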
Hope this points you in a good direction
You're dead right, that is a performance killer.
One method we've used in the past is to store all possible company names in a separate table, referring back to the main table. This is your classic time/space trade-off for optimisation.
In other words, let's say you have two companies in your main table, ICBM and Microsloth.
What you can do is to create another table containing the following:
TextSegment varchar(?) indexed
ActualCompany varchar(?)
and populate it as follows:
TextSegment   ActualCompany
-----------   -------------
ICBM          ICBM
CBM           ICBM
BM            ICBM
M             ICBM
Microsloth    Microsloth
icrosloth     Microsloth
crosloth      Microsloth
rosloth       Microsloth
osloth        Microsloth
sloth         Microsloth
loth          Microsloth
oth           Microsloth
th            Microsloth
h             Microsloth
Then, when you're looking for companies that are like %slo%, you can use:
select ActualCompany from LookupTable where TextSegment like 'slo%'
That allows you to use the index on that table far more efficiently than a %...% search on the other table.
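For what it's worth, a possible definition of that lookup table in Sybase T-SQL could look like this (column sizes and index name are placeholders):

-- lookup table holding every suffix of each company name
create table LookupTable
(
    TextSegment   varchar(80) not null,
    ActualCompany varchar(80) not null
)

-- the index that the 'slo%' style searches will use
create index ix_lookup_segment on LookupTable (TextSegment)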
Now, keep in mind you will need triggers on the original table to ensure the lookup table stays consistent; a sketch of such a trigger follows below. This will also take a fair bit of space (depending on your data), but one thing I've noticed is that few people complain about how big their databases are; most problems are with speed.
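As a rough illustration only (assuming the company table has a name column, and ignoring multi-row inserts, which would need a cursor over the inserted pseudo-table), an insert trigger could populate the suffixes like this:

create trigger trg_company_ins on company
for insert
as
begin
    declare @name varchar(80), @pos int

    -- NOTE: handles a single inserted row; loop over 'inserted' for multi-row statements
    select @name = name from inserted
    select @pos = 1

    -- store every suffix of the name against the full name
    while @pos <= char_length(@name)
    begin
        insert into LookupTable (TextSegment, ActualCompany)
        values (substring(@name, @pos, char_length(@name)), @name)
        select @pos = @pos + 1
    end
end

You would need matching update and delete triggers to keep the lookup rows in step as well.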
The time impact of maintaining the separate table is usually not too bad, since the vast majority of databases are read far more often than they are written. This method moves the cost from the select to the insert/update, where it can be amortised quite well.
The possibility exists that an index could be used, but it's highly unlikely (see this comment). If possible, I would query on another indexed field first and then apply the wildcard against those results, to at least avoid a full table scan; see the sketch below.
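One way to read that advice, assuming there is some other selective, indexed column (the region_id here is purely hypothetical), is to let the indexed predicate narrow the rows before the wildcard is applied:

-- the indexed region_id predicate limits the rows that the
-- unindexable '%...%' comparison has to be checked against
select name
from   company
where  region_id = :region
  and  name like '%:parameter%'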