How to use case insensitive pattern matching with PostgreSQL and Umlauts?
I'm trying to get PostgreSQL 8.4.3 to do case insensitive pattern matching with its ~*
operator when the strings contain non-ASCII characters like German umlauts开发者_StackOverflow中文版. The database, terminal, and everything else is configured to use UTF-8.
Here's the problem in a nutshell:
SELECT 'Ö' ~* 'ö'; -- false
There are other variants which do work:
SELECT 'Ö' ILIKE 'ö'; -- true
SELECT 'Ö' ~* '[Öö]'; -- true
SELECT LOWER('Ö') ~* 'ö'; -- true
None of these alternatives make me especially happy. ILIKE doesn't use regular expressions. [Öö] involves rewriting the search term. LOWER() is probably the best workaround, but I'd really like to get the ~*
operator working like it's supposed to.
Thanks in advance.
This is a bug in PostgreSQL versions prior to 9.0.
It's in the 9.0 changelog: http://www.postgresql.org/docs/9.0/static/release-9-0.html#AEN99075
Here is my test in 9.0 beta2 using Ubuntu:
SELECT 'Ö' ~* 'ö';
?column?
----------
t
(1 row)
I get true with this query:
SELECT 'Ö' ~* 'ö'; -- true
But I did use version 9.0beta2 at OS X 10.5.8 with these settings:
CREATE DATABASE test
WITH OWNER = postgres
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'nl_NL.UTF-8'
LC_CTYPE = 'nl_NL.UTF-8'
CONNECTION LIMIT = -1;
Edit: Same result on version 8.3.7. Looks like you have a problem with the encoding.
精彩评论