开发者

How to use case insensitive pattern matching with PostgreSQL and Umlauts?

I'm trying to get PostgreSQL 8.4.3 to do case insensitive pattern matching with its ~* operator when the strings contain non-ASCII characters like German umlauts开发者_StackOverflow中文版. The database, terminal, and everything else is configured to use UTF-8.

Here's the problem in a nutshell:

SELECT 'Ö' ~* 'ö';      -- false

There are other variants which do work:

SELECT 'Ö' ILIKE 'ö';     -- true
SELECT 'Ö' ~* '[Öö]';     -- true
SELECT LOWER('Ö') ~* 'ö'; -- true

None of these alternatives make me especially happy. ILIKE doesn't use regular expressions. [Öö] involves rewriting the search term. LOWER() is probably the best workaround, but I'd really like to get the ~* operator working like it's supposed to.

Thanks in advance.


This is a bug in PostgreSQL versions prior to 9.0.
It's in the 9.0 changelog: http://www.postgresql.org/docs/9.0/static/release-9-0.html#AEN99075

Here is my test in 9.0 beta2 using Ubuntu:

SELECT 'Ö' ~* 'ö';
 ?column? 
----------
 t
(1 row)


I get true with this query:

SELECT 'Ö' ~* 'ö'; -- true

But I did use version 9.0beta2 at OS X 10.5.8 with these settings:

CREATE DATABASE test
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'nl_NL.UTF-8'
       LC_CTYPE = 'nl_NL.UTF-8'
       CONNECTION LIMIT = -1;

Edit: Same result on version 8.3.7. Looks like you have a problem with the encoding.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜