UTF8 issues on Linux
I have some code that fetches data from a database whose codepage is UTF-8. When I run the code on a Linux box, some characters come out as question marks (?), but when I run the same code on a Windows server, all characters appear correctly.
When I check $LANG at the shell prompt, the following is returned: en_SG.UTF-8
The en_SG part doesn't look correct to me; I expected en_US. The latter part of the returned string is UTF-8, though, which is good. Is there anything else I can look into to fix the character corruption problem?
Generally, ? appears when the font you have does not have a representation for that Unicode codepoint. What are you viewing the output in, and what font are you using?
Can you please provide information about the environment? What programming language are you working with, what library or methods are you using to connect to and pull information from the database, and what library or methods are you using to output the data to file?
I am assuming that both instances of running your code (on Windows and Linux) are accessing the data from the same physical database.
The culprit I would be looking for is one of your I/O steps converting the Unicode data to some other codepage (probably ASCII or Latin-1).
There are three places this could happen. The database itself may be converting because the database methods default to a different encoding. The database methods may be converting the incoming data because the language itself defaults to a different codepage. Or the output methods may be converting.
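Since the language in use hasn't been stated, here is a minimal Python sketch of the effect described above; it is an illustration under assumptions, not a diagnosis of the actual code. The helper name lossy_roundtrip and the sample string are made up for the example. It shows that pushing Unicode text through a narrower codepage turns every unrepresentable character into a question mark, and it prints the encodings Python would pick by default on the current machine.

```python
import locale
import sys


def lossy_roundtrip(text: str, codec: str) -> str:
    """Encode text to `codec`, substituting '?' for characters the codec
    cannot represent, then decode it back (hypothetical helper)."""
    return text.encode(codec, errors="replace").decode(codec)


# Sample text (an assumption for the demo): "cafe" with an accented e,
# followed by two CJK characters.
sample = "caf\u00e9 \u4e2d\u6587"

ascii_view = lossy_roundtrip(sample, "ascii")     # nothing non-ASCII survives
latin1_view = lossy_roundtrip(sample, "latin-1")  # the accented e survives, CJK does not
utf8_view = lossy_roundtrip(sample, "utf-8")      # everything survives

# ascii() escapes non-ASCII characters, so printing is safe on any terminal.
print("ascii   :", ascii(ascii_view))
print("latin-1 :", ascii(latin1_view))
print("utf-8   :", ascii(utf8_view))

# The defaults this environment would apply when no encoding is given.
print("preferred encoding:", locale.getpreferredencoding(False))
print("stdout encoding   :", sys.stdout.encoding)
```

If the two encodings printed at the end differ between the Linux box and the Windows server, that is a hint about which I/O step to inspect first. The same check applies to whatever encoding the database driver negotiates for the connection.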