Funny characters in my db

My web app is breaking when I try to edit a certain content type, and I'm pretty sure it is because of some weird characters in my database. So when I do:

SELECT body FROM message WHERE id = 666

it returns:

<p>â¢ <span></span></p><p><br /></p><p><em><strong>NOTE:</strong> Please remember to use your to participate in the discussion.</em></p>

However, when I try to count how many documents have those characters, Postgres complains:

foo_450_prod=# SELECT COUNT(*) FROM message WHERE body LIKE'%â¢%';

ERROR:  invalid byte sequence for encoding "UTF8": 0xe2a225
HINT:  This error can also happen if the byte sequence does not match the encodi

Does anybody know what the issue is and how I can query for those funny characters?

Thanks in advance!


It appears that your SELECT statement is being interpreted as ISO-8859-1 or windows-1252. In those encodings, 'â' == 0xE2, '¢' == 0xA2, and '%' == 0x25, which explains the 0xe2a225 byte sequence mentioned in the error message.
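To make the byte arithmetic above concrete, here is a small Python sketch (Python is my choice for illustration, it is not part of the question). It assumes the client sent the literal encoded as ISO-8859-1; windows-1252 produces the same bytes for these three characters.

```python
# The characters from the LIKE pattern, written as escapes so the
# example does not depend on the encoding of this source file.
fragment = "\u00e2\u00a2%"  # 'â', '¢', '%'

# Encode the way an ISO-8859-1 client would send them to the server.
raw = fragment.encode("latin-1")
hex_bytes = raw.hex()
print(hex_bytes)  # e2a225, the sequence quoted in the error message

# A server expecting UTF-8 cannot decode those bytes: 0xE2 opens a
# three-byte sequence, but 0x25 ('%') is not a continuation byte.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```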

What's hard to figure out is why your first SELECT is returning an â¢ to begin with. It's an unlikely character combination to use on purpose, but it's also not a typical case of UTF-8/windows-1252 mojibake because E2 A2 isn't valid UTF-8. It could be the first 2 bytes of a character, but that character would be a Braille dot pattern (U+2880 to U+28BF), which doesn't make sense there either.
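The Braille claim can be checked with a short sketch, again in Python as an illustration only: the bytes E2 A2 are incomplete on their own, and adding any continuation byte from 0x80 to 0xBF lands in U+2880 to U+28BF.

```python
import unicodedata

# E2 A2 alone is a truncated three-byte sequence, so decoding fails.
try:
    b"\xe2\xa2".decode("utf-8")
    prefix_is_valid = True
except UnicodeDecodeError:
    prefix_is_valid = False

# Completing the sequence with the lowest and highest continuation
# bytes gives the ends of the range mentioned above.
first = bytes([0xE2, 0xA2, 0x80]).decode("utf-8")
last = bytes([0xE2, 0xA2, 0xBF]).decode("utf-8")

print(prefix_is_valid)                    # False
print(f"U+{ord(first):04X}", unicodedata.name(first))
print(f"U+{ord(last):04X}", unicodedata.name(last))
```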


There's already a long way between your DB and printing data from it on your web page: your DB encoding may be fine, but you're probably displaying text that was originally UTF-8 as ISO-8859-1 (rather than storing "funny" characters). Do you have something like:

<meta content="text/html; charset=UTF-8" http-equiv="content-type" />

in the <head> tag of your HTML page?

Also, are you running SET NAMES 'utf8' when connecting to your DB?
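For reference, this is roughly what the mismatch described in this answer looks like, sketched in Python. The bullet character is a hypothetical example picked for illustration; nothing in the question confirms which character was originally stored.

```python
# Hypothetical original text: a bullet (U+2022), stored as UTF-8.
original = "\u2022"
utf8_bytes = original.encode("utf-8")      # bytes e2 80 a2

# A page or client that reads those bytes as windows-1252 shows three
# unrelated characters instead of one bullet.
misread = utf8_bytes.decode("cp1252")
print(misread)

# As long as no bytes were lost, reversing the wrong step restores
# the original text.
restored = misread.encode("cp1252").decode("utf-8")
print(restored == original)  # True
```

If the stored bytes are fine and only the display is wrong, declaring the right charset on the page and on the connection is enough; no data repair is needed.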
