REGEX in MySQL to find Japanese matches
I need to find all entries in a table that consist only out of certain Japanese UTF-8 characters.
For example, I want all fields that consist only out of 1 (一) and 2 (二).
I am using
SELECT combi_id, keb FROM combi WHERE keb REGEXP '[二一]+'
but it matches lots of other fields that contain different characters, what am I doing wrong?
This is the table:
CREATE TABLE IF NOT EXISTS `combi` (
`combi_id` int(11) NOT NULL auto_increment,
`ent_seq` int(11) NOT NULL,
`reb` text NOT NULL,
`keb` text NOT NULL,
`ant` text NOT NULL,
`ke_pri` text NOT NULL,
`re_pri` text NOT NULL,
`re_restr` text NOT NULL,
`stagr` text NOT NULL,
`s_inf` text NOT NULL,
`lsource` text NOT NULL,
`gloss` text NOT NULL,
`xref` text NOT NULL,
`stagk` text NOT NULL,
PRIMARY KEY (`combi_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=146740 ;
and this is a sample line of data:
(22, 1000225, 'あからさま', '明白|偸閑|白地', '', '', '', '', '', '', '', 'plain|frank|candid|open|direct|straightforward|unabashed|blatant|flagrant', '开发者_如何学C', ''),
thanks a lot for the help!
If you want to match columns with only those characters, you should use
SELECT combi_id, keb FROM combi WHERE keb REGEXP '^[二一]+$'
Note the ^ in the beginning, and the $ in the end, meaning respectively "start of string" and "end of string". Without those, the regexp could match at any position.
Edit: testing it
mysql> select * from test;
+--------+
| f1 |
+--------+
| 二 |
| 東京 |
| 人 |
| 丸 |
+--------+
4 rows in set (0.00 sec)
mysql> select * from test where f1 regexp _utf8'[一二]';
+--------+
| f1 |
+--------+
| 二 |
| 東京 |
| 人 |
| 丸 |
+--------+
4 rows in set (0.00 sec)
Wow, indeed, character classes in mysql regexps sound seriously broken... This works, though:
mysql> select * from test where f1 regexp _utf8'(一|二)';
+------+
| f1 |
+------+
| 二 |
+------+
1 row in set (0.00 sec)
精彩评论