MyISAM key length limitation, trying to speed things up with md5 of fields
I work on a small MyISAM table: around 30k entries, about 10 MB. One of the fields is a varchar(500+), and because I use utf8_unicode_ci I can't index it (I hit the 1000-byte key limit). At the same time, I need to perform lots of "get_or_create" queries based on this field. I am trying to optimize the database, but things are still too slow.
Is it a good solution to create an additional field that holds the MD5 of the varchar's value, index it, and use it for lookups? Has anyone tried this approach?
To me it seems like a bad idea to use such a wide column as a key, but that aside, you can definitely do something like what you suggest. You don't even need MD5: all you need is a hash function that produces few collisions; uniqueness is not necessary. CRC32 produces a small value (a 32-bit unsigned integer) and is very fast.
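To get a feel for the trade-off between the two hashes, here is a rough Python sketch. It is not part of the original answer: the helper names are made up for illustration, and the collision estimate assumes the hash spreads values uniformly, which is only an approximation for CRC32.

```python
import hashlib
import zlib


def crc32_of(text: str) -> int:
    """Unsigned 32-bit CRC of the UTF-8 encoded text (hypothetical helper)."""
    return zlib.crc32(text.encode("utf-8"))  # Python 3 returns 0 .. 2**32 - 1


def md5_of(text: str) -> str:
    """Hex MD5 digest of the UTF-8 encoded text (hypothetical helper)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def expected_colliding_pairs(rows: int, hash_bits: int) -> float:
    """Birthday-style estimate: number of row pairs / size of the hash space."""
    pairs = rows * (rows - 1) / 2
    return pairs / 2 ** hash_bits


value = "lots and lots of text"
print(crc32_of(value))                       # an integer that fits in 4 bytes
print(len(md5_of(value)))                    # 32 hex characters (16 bytes of digest)
print(expected_colliding_pairs(30_000, 32))  # about 0.1 for a 30k-row table
```

Under that uniformity assumption, a 30k-row table would be expected to contain well under one colliding pair with a 32-bit hash, which is why a small hash plus an equality check is usually enough here.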
Say your table looks like this:
CREATE TABLE data (lots_of_text VARCHAR(500));
change it to this:
CREATE TABLE data (text_hash INT UNSIGNED, lots_of_text VARCHAR(500), INDEX (text_hash));
and when you insert rows you do:
INSERT INTO data (lots_of_text, text_hash)
VALUES ("lots and lots of text", CRC32("lots and lots of text"));
and then you can retrieve rows like this:
SELECT lots_of_text FROM data
WHERE text_hash = CRC32("lots and lots of text")
AND lots_of_text = "lots and lots of text";
The query will use the index on text_hash, but since CRC32 will not produce unique values you still need to check the lots_of_text field for equality. The query will still be quick, since at most a few rows will have the same hash.
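For the "get_or_create" pattern from the question, the lookup could be wrapped roughly like this. This is only a sketch: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL so that it runs anywhere, the hash is computed in the application with zlib.crc32 rather than by the server, and get_or_create, text_hash and idx_text_hash are names invented for the example.

```python
import sqlite3
import zlib


def text_hash(text: str) -> int:
    """CRC32 of the UTF-8 bytes, used only to narrow the lookup."""
    return zlib.crc32(text.encode("utf-8"))


def get_or_create(conn: sqlite3.Connection, text: str):
    """Return (rowid, created). Not atomic: two clients can race here."""
    h = text_hash(text)
    # The index on text_hash narrows the search; the equality check on the
    # full text filters out rows that merely share the same hash.
    row = conn.execute(
        "SELECT rowid FROM data WHERE text_hash = ? AND lots_of_text = ?",
        (h, text),
    ).fetchone()
    if row is not None:
        return row[0], False
    cur = conn.execute(
        "INSERT INTO data (text_hash, lots_of_text) VALUES (?, ?)",
        (h, text),
    )
    conn.commit()
    return cur.lastrowid, True


# In-memory SQLite database standing in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (text_hash INTEGER, lots_of_text TEXT)")
conn.execute("CREATE INDEX idx_text_hash ON data (text_hash)")

first = get_or_create(conn, "lots and lots of text")   # inserts the row
second = get_or_create(conn, "lots and lots of text")  # finds the same row
print(first, second)
```

One thing to watch with a case-insensitive collation such as utf8_unicode_ci: the column comparison treats "Text" and "text" as equal, but their CRC32 values differ, so it is probably worth normalizing the value (for example lowercasing it) before hashing if lookups should stay case-insensitive.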
A variant of this is to use the first 50 characters or so as the hash: the number of rows sharing the same first 50 characters is very likely to be low.
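Whether the prefix variant works well depends on the data, so it may be worth measuring before committing to it. The sketch below is an illustration with made-up sample values and a hypothetical helper name: it reports how many distinct values share the most common prefix. In MySQL itself this variant would typically be expressed as a prefix index, e.g. INDEX (lots_of_text(50)), which stays well under the key length limit.

```python
from collections import Counter


def largest_prefix_bucket(values, prefix_len=50):
    """Largest number of distinct values sharing the same leading characters."""
    buckets = Counter(v[:prefix_len] for v in set(values))
    return max(buckets.values(), default=0)


# Made-up sample data: two values differ only after the 50th character.
sample = ["a" * 50 + "x", "a" * 50 + "y", "short text"]

print(largest_prefix_bucket(sample, prefix_len=50))  # two values share a prefix
print(largest_prefix_bucket(sample, prefix_len=51))  # every prefix is distinct
```

If the largest bucket stays small on real data, the prefix is selective enough and no separate hash column is needed.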