Encoding error PostgreSQL 8.4

I am importing data from a CSV file. One of the fields contains an accented character (Telefónica O2 UK Limited). The application throws an error while inserting the data into the table.

PGError: ERROR:  invalid byte sequence for encoding "UTF8": 0xf36e6963
HINT:  This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".
: INSERT INTO "companies" ("name", "validated") 
    VALUES(E'Telef?nica O2 UK Limited', 't')

Data entry through the forms works when I enter names with accents and umlauts. How do I work around this issue?

Edit

I addressed the issue by converting the file encoding: I uploaded the CSV file to Google Docs and exported it back to CSV.


The error message is pretty clear: your client_encoding is set to UTF8, and you are trying to insert a character that isn't encoded in UTF-8. The byte 0xf3 is "ó" in Latin-1 and Windows-1252, so if the CSV came from MS Excel, the file is probably encoded in Windows-1252.

You could either convert it in your application or you can alter your PostgreSQL connection to match the encoding you want to insert (thus enabling PostgreSQL to do the conversion for you). You can do so by executing SET CLIENT_ENCODING TO 'WIN1252'; on your PostgreSQL connection before trying to insert that data. After the import you should reset it to its original value with RESET CLIENT_ENCODING;
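The "convert it in your application" option can be sketched in Ruby with the built-in String#encode (Ruby 1.9 or later). This is only a minimal sketch: the helper name to_utf8 is hypothetical, and Windows-1252 as the source encoding is an assumption based on the 0xf3 byte in the error message.

```ruby
# Raw bytes as they might be read from the CSV file.
# 0xF3 is "ó" in Windows-1252 (assumed source encoding).
raw = "Telef\xF3nica O2 UK Limited".b

# Hypothetical helper: label the bytes with their real encoding,
# then transcode them to UTF-8.
def to_utf8(bytes, source_encoding = "Windows-1252")
  bytes.dup.force_encoding(source_encoding).encode("UTF-8")
end

name = to_utf8(raw)
puts name.encoding         # UTF-8
puts name.valid_encoding?  # true
```

When reading the whole file, the same transcoding can be requested up front with File.read(path, encoding: "Windows-1252:UTF-8"), so every field arrives as UTF-8 before it reaches the INSERT.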

HTH!


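You could also try the Ruby gem rchardet, which guesses the encoding of a string for you and may be a better solution. Example code:

require 'rchardet'
require 'iconv'  # part of the standard library in Ruby 1.8/1.9; a separate gem from 2.0 on

# Guess the encoding of the raw string, then convert it to UTF-8.
cd = CharDet.detect(string_of_unknown_encoding)
encoding = cd['encoding']
converted_string = Iconv.conv('UTF-8', encoding, string_of_unknown_encoding)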

Here are some related links:

https://github.com/jmhodges/rchardet

http://www.meeho.net/blog/2010/03/ruby-how-to-detect-the-encoding-of-a-string/
