How do I know if my PostgreSQL server is using the "C" locale?

2022-12-10 14:31 问答作者：

I'm trying to optimize my PostgreSQL 8.3 DB tables to the best of my ability, and I'm unsure if I need to use varchar_pattern_ops for certain columns where I'm performing a LIKE against the first N characters of a string. According to this documentation, the use of xxx_pattern_ops is only necessary "...when the server does not use the standard 'C' locale".

Can someone explain what this means? How do I check what locale my 开发者_如何学运维database is using?

Currently some locale [docs] support can only be set at initdb time, but I think the one relevant to _pattern_ops can be modified via SET at runtime, LC_COLLATE. To see the set values you can use the SHOW command.

For example:

SHOW LC_COLLATE

_pattern_ops indexes are useful in columns that use pattern matching constructs, like LIKE or regexps. You still have to make a regular index (without _pattern_ops) to do equality search on an index. So you have to take all this into consideration to see if you need such indexes on your tables.

About what locale is, it's a set of rules about character ordering, formatting and similar things that vary from language/country to another language/country. For instance, the locale fr_CA (French in Canada) might have some different sorting rules (or way of displaying numbers and so on) than en_CA (English in Canada.). The standard "C" locale is the POSIX standards-compliant default locale. Only strict ASCII characters are valid, and the rules of ordering and formatting are mostly those of en_US (US English)

In computing, locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface. Usually a locale identifier consists of at least a language identifier and a region identifier.

psql -l

according to handbook

example output:

                               List of databases
    Name     | Owner  | Encoding |   Collate   |    Ctype    | Access privileges
-------------+--------+----------+-------------+-------------+-------------------
 packrd      | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres    | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0   | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/packrd        +
             |        |          |             |             | packrd=CTc/packrd
 template1   | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/packrd        +
             |        |          |             |             | packrd=CTc/packrd
(5 rows)

OK, from my perusings, it appears that this initial setting

initdb --locale=xxx

 --locale=locale
       Specifies the locale to be used in this database. This is equivalent to specifying both --lc-collate and --lc-ctype.

basically specifies the "default" locale for all database that you create after that (i.e. it specifies the settings for template1, which is the default template). You can create new databases with a different locale like this:

Locale is different than encoding, you can manually specify it and/or encoding:

 CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;

If you want to manually call it out.

Basically if you don't specify it, it uses the system default, which is almost never "C".

So if your show LC_COLLATE returns anything other than "C" or "POSIX" then you are not using the standard C locale and you will need to specify the xxx_pattern_ops for your indexes. Note also the caveat that if you want to use the <, <=, >, or >= operators you need to create a second index without the xxx_pattern_ops flag (unless you are using the standard C locale on your database, which is rare...). For just == and LIKE (etc.) then you don't need a second index. If you don't need LIKE then you don't need the index with xxx_pattern_ops, possibly, as well.

Even if your indexes are defined to collate with the "default" like

CREATE INDEX my_index_name
  ON table_name
  USING btree
  (identifier COLLATE pg_catalog."default");

This is not enough, unless the default is the "C" (or POSIX, same thing) collation, it can't be used for patterns like LIKE 'ABC%'. You need something like this:

CREATE INDEX my_index_name
  ON table_name
  USING btree
  (identifier COLLATE pg_catalog."default" varchar_pattern_ops);

If you've got the option...

You could recreate the database cluster with the C locale.

You need to pass the locale to initdb when initializing your Postgres instance.

You can do this regardless of what the server's default or user's locale is.

That's a server administration command though, not a database schema designers task. The cluster contains all the databases on the server, not just the one you're optimising.

It creates a brand new cluster, and does not migrate any of your existing databases or data. That'd be additional work.

Furthermore, if you're in a position where you can consider creating a new cluster as an option, you really should be considering using PostgreSQL 8.4 instead, which can have per-database locales, specified in the CREATE DATABASE statement.

There is also another way (assuming you want to check them, not modify them):

Check file /var/lib/postgres/data/postgresql.conf Following lines should be found:

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'en_US.UTF-8'                     # locale for system error message strings
lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'                      # locale for number formatting
lc_time = 'en_US.UTF-8'                         # locale for time formatting

继续阅读：indexing locale postgresql text varchar

How do I know if my PostgreSQL server is using the "C" locale?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？