How to do MySQL queries on text with umlauts in Drupal?
I need to differentiate between nodes starting with 'O' and nodes starting with 'Ö' (o umlaut).
The problem is that since the node table and the title column have utf8_general_ci collation, MYSQL does not treat o and ö umlaut differently and this query returns nodes starting with O AND nodes starting with Ö
SELECT node.nid AS nid
FROM node AS node WHERE node.status <> 0 AND SUBSTR( node.title, 1, 1 ) = 'O'Since all Drupal modules use utf8_general_ci collation, I guess changing the collation of the table and entire database is probably not a goo开发者_如何学编程d idea.
What is a good workaround for this?
You can do a comparison using a different collation to the one the columns are using:
mysql> SELECT 'foo'='föo' COLLATE 'utf8_general_ci';
1
mysql> SELECT 'foo'='föo' COLLATE 'utf8_swedish_ci';
0
(o
and ö
being different letters in the Swedish collation. You could also use utf8_bin
if you don't want any different characters to match, even different cases of the same letter.)
However doing a compare of columns where the collation is not the same as the collation of each column means indexes can't be used. So it may be more efficient simply to ALTER
the tables to the more-specific collation you want.
Unfortunately, as you've noticed, Drupal 6 enforces utf8_general_ci
collation.
However, this is a known issue, and people have been working on allowing the collation to be specified in settings.php
: Database default collation is not respected
There's currently a patch in that issue (#90) for Drupal 6 in review that adds this ability. Once patched, all you need to do is add:
$db_collation = 'utf8_swedish_ci`;
to your settings.php
file.
精彩评论