开发者

How to set thousand separator for PostgreSQL?

I want to format long numbers using thousand separator. It can be done using to_char function just like:

SELECT TO_CHAR(76543210.98, '999G999G990D00')

But when my PostgreSQL server with UTF-8 encoding is on Polish version of Windows such SELECT ends with:

ERROR:  invalid byte sequence for encoding "UTF8": 0xa0
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

In to_char pattern G is described as: group separator (uses locale). This SELECT works without error when server is running on Linux with Polish locale.

As a workaround I use space instead of G in format string, but I think there should be way to set thousand separator开发者_JS百科 just like in Oracle:

ALTER SESSION SET NLS_NUMERIC_CHARACTERS=', ';

Is such setting available for PostgreSQL?


If you use psql, you can execute this:

\pset numericlocale

Example:

test=# create temporary table a (a numeric(20,10));
CREATE TABLE

test=# insert into a select random() * 1000000 from generate_series(1,3);
INSERT 0 3

test=# select * from a;
         a         
-------------------
 287421.6944910590
 140297.9311533270
 887215.3805568810
(3 rows)

test=# \pset numericlocale
Showing locale-adjusted numeric output.

test=# select * from a;
         a          
--------------------
 287.421,6944910590
 140.297,9311533270
 887.215,3805568810    
(3 rows)


I'm pretty sure the error message is literally true: 0xa0 isn't a valid UTF-8 character.

My home server is running PostgreSQL on Windows XP, SP3. I can do this in psql.

sandbox=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)


sandbox=# show lc_numeric;
  lc_numeric
---------------
 polish_poland
(1 row)


sandbox=# SELECT TO_CHAR(76543210.98, '999G999G990D00');
     to_char
-----------------
   76 543 210,98
(1 row)

I don't get an error message, but I get garbage for the separator. Could this be a code page issue?

As a workaround I use space instead of G in format string

Let's think about this. If you use a space, then on a web page the value might split at the end of a line or at the boundary of a table cell. I'd think a nonbreaking space might be a better choice.

And, in Unicode, a nonbreaking space is 0xa0. In Unicode, not in UTF8. (That is, 0xa0 can't be the first byte of a UTF8 character. See UTF-8 Bit Distribution.)

Another possibility is that your client is expecting one byte order, and the server is giving it a different byte order. Since the numbers are single-byte characters, the byte order wouldn't matter until, well, it mattered. If the client is expecting a big endian MB character, and it got a little endian MB character beginning with 0xa0, I'd expect it to die with the error message you saw. I'm not sure I have a way to test this before I go to work today.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜