开发者

How to write a php search script in which words with diacritics match search terms without diacritics, and the results are underlined?

I've got this site where there are lots of texts with diacritics in them (ancillary glyphs added to letters, according to wikipedia) and most people search these texts using words without the glyphs. Now it shouldn't be challenging to do this by having a copy of the texts without diacritics. How开发者_如何学Goever, I want to highlight the matches in the original text. What's the best way to do it?


You should try changing the collation setting in your MySQL DB.

There are three that seem to come up often in discussions of this topic:

  1. utf8_general_ci

  2. utf8_unicode_ci

  3. utf8_bin ← You probably want this one.

I have found that #3 will match accents in search. This answer gives a bit of background on the differences, but it doesn't mention the fact that utf8_bin is also sensitive to accents. You might want to try all three so you can test for yourself if it's working with the language/script you're dealing with.

To be really sure that things are going to match correctly, you will have to look into Unicode Normalization as well, which is really a whole different ball of wax. It is possible that your user could type in a query with an accent in a different normalization from the one your data is stored in, and thus it might fail to match. I've had that problem with Sqlite, not sure if it applies to MySQL or not.

FWIW, here's a CREATE TABLE I'm currently using, where I needed to match accents, that is setting the COLLATION:

CREATE TABLE `glosses` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `morphemes` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `labels` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  `phrase_id` int(11) DEFAULT NULL,
  `nth_word` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

You can see the COLLATE=utf8_bin tacked on at the end.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜