Is Levenshtein slow in MySQL?
Yesterday I had a question where people suggested I use Levenshtein me开发者_JAVA百科thod. Is it a slow query? Maybe I can use something else?
You can use the BENCHMARK
function to test the performance:
SELECT BENCHMARK(10000, LEVENSHTEIN('abc', 'abd'));
Maybe test it with different strings similar to your use case.
It depends on your dataset.
I found I could speed it up considerably by only comparing strings of similar length.
How similar in length your strings need to be will depend on your data.
There is an article on the subject here: http://kerbtier.ch/2008/12/30/levenshtein-to-slow-how-to-speed-it-up
If you want it perform well then normalize your schema.
The problem is that in order to determine how similar other data is, the DBMS has to load that data and compare it with the datum. So it has to read through every single row in the table (except the current one) to find 'similar' values. It cannot use indexes to find data which is close to the datum.
If, on the other hand you used a schema like this:
CREATE TABLE member (
member_id INT(11),
member_data CLOB,
PRIMARY KEY (member_id));
CREATE TABLE about_member (
member_id INT(11),
metric VARCHAR(10),
value MEDIUMINT(9),
PRIMARY KEY (member_id, metric),
KEY by_value (metric, value, member_id));
Note that you about_member (1-1-2-2-1) string should be implemented as seperate rows, e.g.
member_id metric value
1234 lost 2
1234 won 3
1234 drawn 1
1234 handicap 9
Then you can use the indexes affectively, for example with the following query.
SELECT compare.member_id, SUM(ABS(compare.value-datum.value)) AS difference
FROM about_member compare, about_member datum
WHERE datum.member_id=$MEMBER_TO_COMPARE
AND compare.member_id<>datum.member_id
AND compare.metric=datum.metric
AND compare.metric BETWEEN (datum.metric-1) AND (datum.metric+1) /* tweak here */
GROUP BY compare.member_id;
精彩评论