Non-ASCII characters in Python again

How can I successfully add a non-ASCII character to Python source code?

For example:

a = "ű" - Hungarian letter

--------Sample------
#-*- coding: iso8859_2 -*-
a = "ű"
print "ű"
---------------------
output = √   

Not ű, as desired.


It looks like you are using Python on a US Windows console. I say this because the character you actually received is what cp437 displays for the ISO 8859-2 byte you printed. Unfortunately, cp437 doesn't support ű. If you want to see that character, you can't use the US Windows console. Use IDLE or PythonWin.

You should also use Unicode strings. When you use coding: iso8859_2 you are declaring "the characters in this source file were encoded to disk using the ISO 8859-2 translation table." If you look up the ISO 8859-2 table, you'll find that the value 251 represents ű. If you write that same byte value to a cp437 terminal, you'll get a square root symbol. With a Unicode string, Python will translate the byte 251 into the Unicode character U+0171, which uniquely identifies the character LATIN SMALL LETTER U WITH DOUBLE ACUTE. When printing to a terminal, Python will translate a Unicode character into the terminal encoding if possible, or raise an error instead of writing garbage for an unsupported character.

Example

# coding: iso8859_2
import unicodedata as ud
s = 'ű'
u = u'ű'
print 'ISO 8859-2 value:',ord(s)
print 'Unicode value:   ','U+%04X' % ord(u)
print 'Unicode name:    ',ud.name(u)
print 'Unicode name of cp437 value %d: %s' % (ord(s),ud.name(s.decode('cp437')))

print s
print u

Output

ISO 8859-2 value: 251
Unicode value:    U+0171
Unicode name:     LATIN SMALL LETTER U WITH DOUBLE ACUTE
Unicode name of cp437 value 251: SQUARE ROOT
√
Traceback (most recent call last):
  File "C:\ex.py", line 11, in <module>
    print u
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0171' in position 0: character maps to <undefined>

So you see the byte string printed the incorrect square root symbol (√) while the Unicode string correctly showed the character is not supported.
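
The byte-level mechanism described in this answer can also be checked without involving a console. What follows is only a minimal sketch, and it assumes Python 3 syntax (the example above is Python 2). The character is written as the escape \u0171 so the result does not depend on how the file is saved, and the variable names are purely illustrative.

```python
import unicodedata

u = "\u0171"  # LATIN SMALL LETTER U WITH DOUBLE ACUTE, written as an escape

# Encoding to ISO 8859-2 yields the single byte 251 (0xFB).
raw = u.encode("iso8859_2")

# A cp437 console interprets that same byte as a different character.
shown = raw.decode("cp437")

# cp437 has no mapping for U+0171, so encoding the character fails.
try:
    u.encode("cp437")
    cp437_supported = True
except UnicodeEncodeError:
    cp437_supported = False

# Only ASCII is printed, so this line is safe on any console.
print(raw[0], unicodedata.name(shown), cp437_supported)
```

Everything printed is ASCII, so the sketch itself should run even on the console that cannot display ű.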


You have to use utf-8 instead of iso8859-2, like this:

#-*- coding: utf-8 -*-
a = "ű"
print "ű"

Output:

ű

In addition, be sure that you have saved your source code file with the right encoding (utf-8 here).
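
To illustrate why the encoding used when saving must match the one that is declared, here is a rough sketch, assuming Python 3. It writes ű to a temporary file as UTF-8, inspects the bytes on disk, and then decodes them with both a matching and a mismatched encoding. The file name sample.txt and the variable names are made up for the illustration.

```python
import os
import tempfile

text = "\u0171"  # the letter from the question, written as an escape

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # Save the character the way an editor set to UTF-8 would.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Read the raw bytes back: UTF-8 stores this character as two bytes,
    # not as the single byte 251 that ISO 8859-2 uses.
    with open(path, "rb") as f:
        raw = f.read()

# Decoding with the encoding used for saving round-trips the character;
# decoding with a different one silently yields other characters.
matched = raw.decode("utf-8")
mismatched = raw.decode("iso8859_2")

print(list(raw), matched == text, mismatched == text)
```

The mismatched decode does not raise an error, which is why a wrong coding declaration tends to show up as garbage characters rather than as an exception.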


The other answers mention specifying the coding and making sure the source file is encoded with it, which are both correct. In addition, unadorned quotes like the ones you are using produce a Unicode string only in Python 3. If you're using Python 2, insert a u in front to make it a Unicode string:

a = u"ű"
print u"ű"

Edit: It's also possible that your console simply isn't displaying the Unicode characters properly; I've run into that. Since your editor appears to display them OK, try redirecting the output to a file and opening it with your editor.
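
If the console is the limiting factor, one possible workaround is to replace whatever the console encoding cannot represent with an escape before printing. The sketch below assumes Python 3; printable_in is a hypothetical helper name, not part of any library, and it relies on the standard backslashreplace error handler.

```python
def printable_in(text, encoding):
    """Return text with characters the encoding lacks shown as escapes.

    printable_in is an illustrative helper, not a standard function.
    """
    # backslashreplace swaps each unencodable character for an ASCII escape
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


text = "\u0171"

# cp437 cannot represent the character, so an escape sequence is shown.
print(printable_in(text, "cp437"))

# UTF-8 can represent it, so the text comes back unchanged.
print(printable_in(text, "utf-8") == text)
```

This only makes the output readable on a limited console; it does not make the console display ű.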


Changing the encoding to utf-8 works.

#-*- coding: utf-8 -*-
a = "ű"
print "ű"