开发者

comparing strings in PostgreSQL

Is there any way in PostgreSQL to convert UTF-8 characters to "similar" ASCII characters?

String glāžšķūņu rūķīši would have to be converted to glazskunu rukisi. UTF-8 text is not in some specific language, it might be in Latvian, Russian, English, Italian or any other language.

This is needed for using in where clause, so it might be just "comparing strings" rather than "converting strings".

I tried using convert, but it does not give desired results (e.g., select convert('Ā', 'utf8', 'sql_ascii') gives \304\200, not A).

Database is created with:

ENCODING = 'UTF8'
LC_COLLATE = 'Latvian_Latvia.1257'
LC_CTYPE = 'Latvian_Latvia.1257'

These开发者_如何学Python params may be changed, if necessary.


I found different ways to do this on the PostgreSQL Wiki.

In plperl:

CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$
my ($input_string) = @_;
$input_string =~ s/[âãäåāăą]/a;
$input_string =~ s/[ÁÂÃÄÅĀĂĄ]/A;
$input_string =~ s/[èééêëēĕėęě]/e;
$input_string =~ s/[ĒĔĖĘĚ]/E;
$input_string =~ s/[ìíîïìĩīĭ]/i;
$input_string =~ s/[ÌÍÎÏÌĨĪĬ]/I;
$input_string =~ s/[óôõöōŏő]/o;
$input_string =~ s/[ÒÓÔÕÖŌŎŐ]/O;
$input_string =~ s/[ùúûüũūŭů]/u;
$input_string =~ s/[ÙÚÛÜŨŪŬŮ]/U;
return $input_string;
$$ LANGUAGE plperl;

In pure SQL:

CREATE OR REPLACE FUNCTION unaccent_string(text)
RETURNS text
IMMUTABLE
STRICT
LANGUAGE SQL
AS $$
SELECT translate(
    $1,
    'âãäåāăąÁÂÃÄÅĀĂĄèééêëēĕėęěĒĔĖĘĚìíîïìĩīĭÌÍÎÏÌĨĪĬóôõöōŏőÒÓÔÕÖŌŎŐùúûüũūŭůÙÚÛÜŨŪŬŮ',
    'aaaaaaaaaaaaaaaeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiooooooooooooooouuuuuuuuuuuuuuuu'
);
$$;

And in plpython:

create or replace function unaccent(text) returns text language plpythonu as $$
import unicodedata
rv = plpy.execute("select setting from pg_settings where name = 'server_encoding'");
encoding = rv[0]["setting"]
s = args[0].decode(encoding)
s = unicodedata.normalize("NFKD", s)
s = ''.join(c for c in s if ord(c) < 127)
return s
$$;

In your case, a translate() call with all the characters you can find in the UTF-8 table should be enough.


Use pg_collkey() for ICU supported unicode compare: - http://www.public-software-group.org/pg_collkey - http://russ.garrett.co.uk/tag/postgresql/

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜