Certain UTF-8 characters do not show up in browsers and cause a Python script to fail

I generated a SQL script from a C# application on Windows 7. The name entries contain UTF-8 characters. It works fine on a Windows machine, where I use a Python script to populate the database. The same script now fails on Linux, complaining about those special characters.

Something similar happened when I generated an XML file containing UTF-8 characters on Windows 7: the characters fail to show up in browsers (IE, Firefox).

I used to generate such scripts on Windows XP and they worked perfectly everywhere.


Please give a small example of a script with "utf8 characters" in the "name entries". Are you sure that they are UTF-8 and not some Windows encoding like cp1252? What makes you sure? Try this in Python at the command prompt:

... python -c "print repr(open('small_script.sql', 'rb').read())"

The interesting parts of the output are where it uses \xhh (where h is any hex digit) to represent non-ASCII characters e.g. \xc3\xa2 is the UTF-8 encoding of the small a with circumflex accent. Show us a representative sample of such output. Also tell us the exact error message(s) that you get from that sample script.
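The one-liner above uses Python 2 syntax. On Python 3, a rough equivalent of the same check might look like the sketch below. The helper name `looks_like_utf8` is made up for illustration, and the sample bytes stand in for the real contents of small_script.sql.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Sample bytes standing in for open('small_script.sql', 'rb').read()
utf8_sample = "\u00e2".encode("utf-8")     # small a with circumflex, as UTF-8
cp1252_sample = "\u00e2".encode("cp1252")  # the same character, as cp1252

# repr() shows non-ASCII bytes as \xhh escapes
print(repr(utf8_sample), looks_like_utf8(utf8_sample))
print(repr(cp1252_sample), looks_like_utf8(cp1252_sample))
```

A clean decode does not prove the file is UTF-8 (pure ASCII decodes cleanly under both encodings), but a failed decode is a strong hint that the data is in some other encoding.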

Update: It appears that your data is encoded in cp1252 or similar (Latin-1, a.k.a. ISO-8859-1, is as rare as hen's teeth on Windows). To get it into UTF-8 using Python, you'd do fixed_data = data.decode('cp1252').encode('utf8'). I can't help you with C#; you may like to ask a separate question about that.
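As a minimal sketch of that conversion on Python 3, assuming the input really is cp1252: the function name `cp1252_to_utf8` and the sample name bytes are hypothetical, chosen only for illustration.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode cp1252 bytes as UTF-8 (assumes the input really is cp1252)."""
    return data.decode("cp1252").encode("utf-8")


# Hypothetical name entry: "Andr" followed by e-acute as the single cp1252 byte 0xE9
raw = b"Andr\xe9"
fixed = cp1252_to_utf8(raw)
print(repr(fixed))  # the 0xE9 byte becomes the two-byte sequence \xc3\xa9
```

For a whole file, the same function can be applied to the result of reading it in 'rb' mode, with the output written back in 'wb' mode. Python's cp1252 codec raises UnicodeDecodeError on the few byte values that cp1252 leaves undefined (such as 0x81), which would itself suggest the data is in a different encoding.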


Assuming you're using Python, make sure you are using Unicode strings.

For example:

s = "Hello world"          # Regular string
u = u"Hello Unicode world" # Unicode string

Edit:
Here's an example of reading from a UTF-8 file from the linked site:

import codecs
fileObj = codecs.open( "someFile", "r", "utf-8" )
u = fileObj.read() # Returns a Unicode string from the UTF-8 bytes in the file
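On Python 3, where every str is already a Unicode string, the built-in open() accepts an encoding argument, so codecs.open is not strictly needed. The sketch below writes a small temporary file first so that it runs on its own; the file contents are invented for illustration.

```python
import os
import tempfile

# Create a small UTF-8 file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "someFile")
with open(path, "w", encoding="utf-8") as f:
    f.write("Zo\u00eb\n")  # a name containing e with diaeresis

# Read it back, decoding the UTF-8 bytes into a str
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(repr(text))
```

Passing the encoding explicitly matters here because the default depends on the platform and locale, which is one way the same script can behave differently on Windows and Linux.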