Certain UTF-8 characters do not show up in browsers and cause a Python script to fail
I generated a SQL script from a C# application on Windows 7. The name entries contain UTF-8 characters. It works fine on a Windows machine, where I use a Python script to populate the db. Now the same script fails on Linux, complaining about those special characters.
Something similar happened when I generated an XML file containing UTF-8 characters on Windows 7: the characters fail to show up in browsers (IE, Firefox).
I used to generate such scripts on Windows XP and it worked perfectly everywhere.
Please give a small example of a script with "utf8 characters" in the "name entries". Are you sure that they are UTF-8 and not some Windows encoding like cp1252? What makes you sure? Try this in Python at the command prompt:
... python -c "print repr(open('small_script.sql', 'rb').read())"
The interesting parts of the output are where it uses \xhh (where h is any hex digit) to represent non-ASCII characters, e.g. \xc3\xa2 is the UTF-8 encoding of the small a with circumflex accent. Show us a representative sample of such output. Also tell us the exact error message(s) that you get from that sample script.
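If you'd rather let Python decide than eyeball the \xhh escapes, here is a minimal sketch of a check. The helper name looks_like_utf8 and the sample byte strings are made up for illustration; in practice you would pass in the bytes read from your own file. Bear in mind that a clean decode only shows the bytes are valid UTF-8; pure ASCII data passes too.

```python
def looks_like_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical samples: the same word encoded two different ways.
sample_utf8 = b"caf\xc3\xa9"   # e-acute as the two UTF-8 bytes C3 A9
sample_cp1252 = b"caf\xe9"     # e-acute as the single cp1252 byte E9

print(looks_like_utf8(sample_utf8))    # True
print(looks_like_utf8(sample_cp1252))  # False
```

A lone byte such as \xe9 with no continuation bytes after it is the usual sign of a single-byte Windows encoding rather than UTF-8.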
Update: It appears that you have data encoded in cp1252 or similar (Latin1, aka ISO-8859-1, is as rare as hen's teeth on Windows). To get that into UTF-8 using Python, you'd do fixed_data = data.decode('cp1252').encode('utf8'). I can't help you with C#; you may like to ask a separate question about that.
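Applied to a whole file, that conversion might look like the sketch below. The function name convert_file, the file names, and the sample INSERT statement are all hypothetical; the source encoding is assumed to be cp1252, which you should confirm from the byte dump first. The files are opened in binary mode so that no implicit decoding happens behind your back.

```python
import io
import os
import tempfile


def convert_file(src_path, dst_path, src_encoding="cp1252"):
    """Re-encode a file from src_encoding (assumed cp1252) to UTF-8."""
    with io.open(src_path, "rb") as src:
        data = src.read()
    with io.open(dst_path, "wb") as dst:
        dst.write(data.decode(src_encoding).encode("utf-8"))


# Demonstration with a throwaway file standing in for the generated script.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "names_cp1252.sql")
dst_path = os.path.join(tmp_dir, "names_utf8.sql")

with io.open(src_path, "wb") as f:
    f.write(b"INSERT INTO people (name) VALUES ('Andr\xe9');\n")

convert_file(src_path, dst_path)

with io.open(dst_path, "rb") as f:
    converted = f.read()

print(repr(converted))
```

If the decode step raises UnicodeDecodeError, the data is probably not cp1252 after all, since a handful of byte values are undefined in that code page.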
Assuming you're using Python, make sure you are using Unicode strings.
For example:
s = "Hello world" # Regular string
u = u"Hello Unicode world" # Unicode string
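To see why the distinction matters, here is a small sketch comparing a Unicode string with its UTF-8 encoded bytes. The sample word is arbitrary, and the snippet is written so that it should behave the same on Python 2 and Python 3.

```python
# A Unicode string counts characters; its encoded form counts bytes.
text = u"na\u00efve"             # 5 characters, one of them non-ASCII
encoded = text.encode("utf-8")   # the non-ASCII character becomes 2 bytes

print(len(text))     # 5
print(len(encoded))  # 6

# Decoding with the right codec gets the original text back.
round_tripped = encoded.decode("utf-8")
print(round_tripped == text)  # True
```

Errors like the one in the question usually appear when encoded bytes are treated as text, or decoded with a different codec than the one that produced them.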
Edit:
Here's an example, from the linked site, of reading a UTF-8 file:
import codecs
fileObj = codecs.open( "someFile", "r", "utf-8" )
u = fileObj.read() # Returns a Unicode string from the UTF-8 bytes in the file
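For completeness, here is a rough sketch of the full round trip using io.open, which takes an encoding argument on both Python 2 and Python 3. The file name and the sample name are placeholders, not anything from the original script.

```python
import io
import os
import tempfile

# Throwaway path standing in for a real data file.
path = os.path.join(tempfile.mkdtemp(), "names.txt")

# Writing: the Unicode string is encoded to UTF-8 bytes on the way out.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Andr\u00e9")

# Reading as text: the bytes are decoded back into a Unicode string.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Reading as bytes: shows what actually landed on disk.
with io.open(path, "rb") as f:
    raw = f.read()

print(repr(text))
print(repr(raw))
```

Naming the encoding explicitly on both sides avoids depending on the platform default, which differs between Windows and Linux.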