MyISAM key length limitation, trying to speed things up with md5 of fields

I work on a small MyISAM table: around 30k entries, about 10 MB in size. One of the fields is a varchar(500+). Because I use utf8_unicode_ci, I can't index this field (I hit the 1000-byte key length limit), and at the same time I need to perform lots of "get_or_create" queries based on this field. I am trying to optimize the database, but things are still too slow.

Is it a good solution to create an additional field that holds the MD5 of the varchar's value, then index that field and use it for lookups? Has anyone tried this approach?


To me it seems like a bad idea to use such a wide column as a key, but that aside, you can definitely do something like what you suggest. You don't even need MD5: all you need is a hash function that produces few collisions; uniqueness is not necessary. CRC32 produces a small value (a 32-bit integer) and is very fast.

Say your table looks like this:

CREATE TABLE data (lots_of_text VARCHAR(500));

change it to this (the hash column is unsigned because CRC32 returns an unsigned 32-bit value):

CREATE TABLE data (text_hash INT UNSIGNED, lots_of_text VARCHAR(500), INDEX (text_hash));

and when you insert rows you do:

INSERT INTO data (lots_of_text, text_hash)
VALUES ('lots and lots of text', CRC32('lots and lots of text'));

and then you can retrieve rows like this:

SELECT lots_of_text FROM data
WHERE text_hash = CRC32('lots and lots of text')
AND lots_of_text = 'lots and lots of text';

The query will use the index on text_hash. Since CRC32 does not produce unique values, you still need to check the lots_of_text field for equality, but the query will still be quick because at most a few rows will share the same hash.
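For the "get_or_create" case from the question, the lookup and insert can be wrapped in one helper. The following is only a rough sketch under several assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, it computes the hash client-side with zlib.crc32 rather than with the SQL CRC32() function, and the id column and the get_or_create and text_hash helper names are hypothetical additions, not part of the original schema.

```python
import sqlite3
import zlib
from typing import Tuple


def text_hash(text: str) -> int:
    # CRC-32 of the UTF-8 bytes, kept as an unsigned 32-bit integer.
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def get_or_create(conn: sqlite3.Connection, text: str) -> Tuple[int, bool]:
    """Return (row id, created flag) for the row holding `text`."""
    h = text_hash(text)
    # The indexed hash narrows the search; the full-text comparison
    # rejects rows that merely collide on the hash.
    row = conn.execute(
        "SELECT id FROM data WHERE text_hash = ? AND lots_of_text = ?",
        (h, text),
    ).fetchone()
    if row is not None:
        return row[0], False
    cur = conn.execute(
        "INSERT INTO data (text_hash, lots_of_text) VALUES (?, ?)",
        (h, text),
    )
    conn.commit()
    return cur.lastrowid, True


# In-memory SQLite database standing in for the MyISAM table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data ("
    "id INTEGER PRIMARY KEY, text_hash INTEGER, lots_of_text TEXT)"
)
conn.execute("CREATE INDEX idx_text_hash ON data (text_hash)")

first = get_or_create(conn, "lots and lots of text")   # inserts the row
second = get_or_create(conn, "lots and lots of text")  # finds the same row
```

Note that the select-then-insert sequence is not atomic, so with concurrent writers two clients could both insert the same text unless the table is locked around the call.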

A variant of this is to use the first 50 characters or so as the hash; the number of rows sharing the same first 50 characters is very likely to be low. MySQL supports this directly as a prefix index, e.g. INDEX (lots_of_text(50)).
