mysql accent insensitive and dotted insensitive search

2023-04-11 16:50 问答作者：

The Problem: I am trying to implement a search algorithm that shows the results even when dotted chars are provided. In other words: SELECT 'über' = 'uber' or SELECT 'mas' = 'maş' these results will return true. This would apply for every single char in the following array:

$arr = array('ş' => 's', 'ç' => 'c', 'ö' => 'o', 'ü' => 'u' and so on ...);

The Solution In My Mind: Along with the original column, I can have a particular column that stores the English names. So before storing 'über' to database, I will also convert it to 'uber' in php and then will store both 'über' (as the original) and 'uber' (as the searchable) to the database.

But then, even though I've searched for this the whole day, I still believe that there should be a simplier and cleaner way to accomplish the task since this would mean (more or less) to store the same data twice in the database. So guys, what do you think is the solution the only way to go or you know a better approach?

EDIT

For accent insensitive I've seen the posts on SO, they are working but since I am also considering the dotted chars, I had to ask this question.

EDIT2

I cannot post the whole table structure and code exactly for some reasons but I'll provide a close example.

myusers | CREATE TABLE `myusers` (
id int auto_increment not null primary key,
email varchar(100) COLLATE latin1_general_ci not null,
fullname varchar(75) COLLATE latin1_general_ci not null)
PRIMARY KEY('id')
) ENGINE=MyISAM AUTO_INCREMENET=2 DEFAULT CHARSET=latin1 COLLATE latin1_general_ci |

The above is the structure of the table. Here comes the inserts and selects:

INSERT INTO myusers (fullname) VALUES ('Agüeda');
INSERT INTO myusers (fullname) VALUES ('Agueda');

SELECT * FROM myusers WHERE fullname = 'Agüeda' COLLATE latin1_general_ci 

+----+-------+----------+
| id | email | fullname |
+----+-------+----------+
|  1 |       | Agüeda   |
+----+-------+----------+
1 row in set (0.00 sec)

SELECT * FROM myusers WHERE fullname = 'agueda' COLLATE latin1_general_ci 

+----+-------+-------开发者_JAVA百科---+
| id | email | fullname |
+----+-------+----------+
|  2 |       | Agueda   |
+----+-------+----------+
1 row in set (0.00 sec)

Well, the desired result is obviously when agueda is searched both 'Agueda' and 'Agüeda' will return, but that's not the case. As I mentioned above, I have created a new column and store the whole name in English characters and make the search from there as well. But still, it costs me a two times search (because I am also searching from the original columns which rank higher in the search result). There should be a better way...

1) Write your own collation. latin1_general_diacriticinsensitive. I wouldn't even know where to begin, though :).

2) Use regex and character groups: /[uü]ber/

3) The Solution In Your Mind. I'd personally use this, since design is all about compromise and this is a simple solution with just a 100% space overhead. Granted, the space overhead might eventually turn into a speed overhead, especially with MySQL, but that's to worry about later. This is also very easy to undo if need be.

Just use an appropriate collation. For instance:

create table test(
    foo text
) collate = utf8_unicode_ci;
insert into test values('Agüeda');
insert into test values('Agueda');
select * from test where foo = 'Agueda';

This gives your two rows.

Take a look at this post: https://stackoverflow.com/questions/500826

He has just the opposite issue you're facing. Look at the WHERE clause in the selected answer. Probably you could just use the _ci suffix and it'll work.

Let us know how this is resolved.

Well, instead of trying to replace them and run the search the x-times, I'd suggest using the mysql function LIKE i.e.

SELECT * FROM x WHERE search LIKE '%ber'

Where you have to replace the diacritics with"%.

EDIT: My mistake % replaces any number of characters. Use _ for a single char.

继续阅读：php search

mysql accent insensitive and dotted insensitive search

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？