开发者

Sort using utf characters in mysql or php ? best solutions

Using MySQL, i'm selecting a list of songs in spanish that i would like to sort. Here's a list of names that is returned by the query:

开发者_StackOverflow中文版
  • ¡Decirevilla!
  • Alhambra
  • 123 pasitos
  • África
  • Arroz
  • Decir

The sorted list should look like this:

  • 123 pasitos
  • África
  • Alhambra
  • Arroz
  • ¡Decirevilla!
  • Decir

After all of the research i've read, i've concluded that there is no reasonable way to achieve this using MySQL. I've tried collation, charset, etc... but there is no way the character ¡, ?, etc... can by sorted accordingly to my desired result. Even the Á is not sorted the way i want to...

Question 1: Is this a reasonable conclusion?

I believe the only way to achieve this is by passing the results to an array in php and then sort the array using a custom function... all this using the function usort (need to sort by value and i don't care about maintaning the key association). Something similar to this:

function normalize($a, $b) {
  if ($a == $b) {
     return 0;
  }

  return ($a < $b) ? -1 : 1;
}


$tracks = array();

while ($row = $result->fetch_assoc()) {
    $tracks[] = $row;
}

usort($tracks, 'normalize');

Question 2: Is this the best way to achieve a custom sorting?

Here's where i'm hitting a wall:

Question 3: I have no idea how to create the normalize function to sort the names accordingly to my needs. How do i ignore certain characters (¡, ?, ', !, ¿) and how do i replace other characters with the natural equivalent (Á -> A, É -> E, etc..) I believe that by ignoring certain characters and replacing others, i can achieve the sorting i'm loojing for...

Question 4: All this make sense? Am i on the right path?

Thanks in advance for all your advice. Marco


You could add your own collation to MySQL. Then you could ignore whatever characters you don't care about, strip accents as needed, and generally sort things in any consistent way you desire.

Doing the mangled-collation on the client side (i.e. in PHP rather than in the database) won't be as quick as doing it in the database. This approach will also fail miserably as soon as you have to add LIMIT and OFFSET clauses to your query. I'm not sure if custom collations do The Right Thing for MAX() similar functions but doing the mangled-collation in PHP certainly won't unless you want to pull over the whole table, sort it, and then grab just one entry.

So, I would consider doing the collation outside the database as a last resort.

Another option, if you don't want to build your own collation, is to build an artificial column in your table that does sort properly. You could use a normalize() function in PHP-land (something like Jacob's would be reasonable starting point) and keep the result in the database as a column called, say, sortable_title; then ORDER BY sortable_title would do the trick. You'd want a normalize() PHP function that produced a list like this (no punctuation, all lower case, accents stripped, ...):

  • 123 pasitos
  • africa
  • alhambra
  • arroz
  • decirevilla
  • decir

So that a simple ASCII-betical sort will do The Right Thing. Of course, you would have to initialize sortable_title when doing INSERTs and regenerate it during UPDATEs but that should be fairly straight forward if your code is properly encapsulated.

Question 4: I think I'm going to disagree with Jacob and say that you're not going in the right direction by moving the collation out of the database. I'm not saying that you're completely off track but you're better off letting MySQL handle the sorting even though you might end up giving it some help with something like the sortable_title hack outlined above.


Question 2. That is a nice way to achieve custom sorting, then the only real work you have to do is on the comparison function.

Question 3. It could be worth converting the string into its ASCII equivalent using iconv. Which can convert UTF-8 to ASCII and using the translit, it will match character that can't be directly converted to something that looks like it.

i.e. Á -> A, É -> E, etc..

Once its converted then you could remove the characters you don't want to sort with using a preg_replace or str_replace.

Here's an example of a comparison function you could use.

function normalize_string($string) {
    $ascii = iconv("utf-8","ascii//TRANSLIT", $string);
    return str_replace(array('!', "'", '?'), '', $ascii);

    // or

    return preg_replace('/[!\'?]/', '', $ascii);

    // or depending on how much you do want to replace... \W => any "non-word" character

    return preg_replace('/\W/', '', $ascii);
}

function custom_str_cmp($a, $b) {
    return strcmp(normalize_string($a), normalize_string($b));
}

usort($tracks, 'custom_str_cmp');

Question 4. Yes.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜