Non ascii characters in Python again

2023-02-22 02:26 Q&A author:

How can I successfully add non-ASCII characters to Python source code?

For example:

--------Sample------
#-*- coding: iso8859_2 -*-
a = "ű"
print "ű"
---------------------
output = √

Not ű, as desired.

It looks like you are running Python in a US Windows console. I say this because the character you received (√) is what cp437 displays for the byte that ISO 8859-2 uses for ű. Unfortunately, cp437 doesn't support ű, so you can't see that character in the US Windows console. Use IDLE or PythonWin instead.
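To make the byte mismatch concrete, here is a minimal sketch that reproduces it without a console. The helper name console_can_show is made up for illustration; the codec names are the ones Python ships with. It is written to run on both Python 2 and Python 3.

```python
# Sketch: one byte, two code pages, two different characters.
text = u"\u0171"  # LATIN SMALL LETTER U WITH DOUBLE ACUTE

# The byte an ISO 8859-2 encoded source file stores for this character.
raw = text.encode("iso8859_2")

# What a cp437 console draws when it receives that same byte.
shown = raw.decode("cp437")


def console_can_show(ch, console_encoding):
    """Hypothetical helper: True if ch exists in console_encoding."""
    try:
        ch.encode(console_encoding)
        return True
    except UnicodeEncodeError:
        return False
```

Here raw is the single byte 0xFB (251), shown is U+221A (SQUARE ROOT), and console_can_show(text, "cp437") is False, which matches the behavior described above.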

You should also use Unicode strings. When you use coding: iso8859_2 you are declaring "the characters in this source file were encoded to disk using the ISO 8859-2 translation table." In that table, the value 251 (0xFB) represents ű. If you write that same byte value to a cp437 terminal, you'll get a square root symbol. With a Unicode string, Python will translate the byte 251 into the Unicode character U+0171, which uniquely identifies the character LATIN SMALL LETTER U WITH DOUBLE ACUTE. When printing to a terminal, Python will translate a Unicode character into the terminal encoding if possible, or raise an error instead of writing garbage for an unsupported character.

Example

# coding: iso8859_2
import unicodedata as ud
s = 'ű'
u = u'ű'
print 'ISO 8859-2 value:',ord(s)
print 'Unicode value:    U+%04X' % ord(u)
print 'Unicode name:    ',ud.name(u)
print 'Unicode name of cp437 value %d: %s' % (ord(s),ud.name(s.decode('cp437')))

print s
print u

Output

ISO 8859-2 value: 251
Unicode value:    U+0171
Unicode name:     LATIN SMALL LETTER U WITH DOUBLE ACUTE
Unicode name of cp437 value 251: SQUARE ROOT
√
Traceback (most recent call last):
  File "C:\ex.py", line 11, in <module>
    print u
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0171' in position 0: character maps to <undefined>

So you see that the byte string printed the incorrect square root symbol (√), while the Unicode string correctly raised an error because the console encoding does not support the character.
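If an exception is not what you want when the console lacks a character, one option is to encode explicitly and choose an error handler. The following is only a sketch under that assumption; encode_for_console is a hypothetical helper, while "strict", "replace" and "backslashreplace" are standard error handlers of unicode.encode / str.encode.

```python
def encode_for_console(text, console_encoding, errors="backslashreplace"):
    """Hypothetical helper: encode text for a console, degrading visibly.

    With errors="backslashreplace" an unsupported character becomes an
    escape such as \\u0171; with errors="replace" it becomes "?".
    """
    return text.encode(console_encoding, errors)


# cp437 has no ű, so the escape sequence is emitted instead of garbage.
escaped = encode_for_console(u"\u0171", "cp437")

# ISO 8859-2 does have ű, so the real byte (0xFB) is produced.
native = encode_for_console(u"\u0171", "iso8859_2")
```

Either way the result is a byte string that can be written safely; the point is that the substitution is explicit rather than a silent reinterpretation of the byte.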

You have to use utf-8 instead of iso8859-2, like this:

#-*- coding: utf-8 -*-
a = "ű"
print "ű"

Output:

ű

In addition, be sure that you have saved your source code file with the right encoding (utf-8 here).
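One way to check that a file's bytes agree with its declared encoding is to try decoding them. The sketch below works on in-memory byte strings rather than a real file; decodes_as is a hypothetical helper, and the two sample "sources" are built only for illustration.

```python
def decodes_as(data, encoding):
    """Hypothetical helper: True if the bytes decode cleanly as encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


line = u'a = u"\u0171"'

# The same source line as an editor would save it in each encoding.
utf8_source = line.encode("utf-8")          # ű stored as bytes 0xC5 0xB1
latin2_source = line.encode("iso8859_2")    # ű stored as byte 0xFB
```

With these values, decodes_as(latin2_source, "utf-8") is False, which is the situation where a file declares utf-8 but was saved as ISO 8859-2. Note that the check only works in one direction: UTF-8 bytes also decode as ISO 8859-2 without an error, they just come out as the wrong characters.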

The other answers mention specifying the coding and making sure the source file is saved in that encoding, which are both correct. In addition, plain unprefixed quotes, as in your code, only produce Unicode strings in Python 3. If you're using Python 2, insert a u in front to make it a Unicode string:

a = u"ű"
print u"ű"

Edit: It's also possible that your console simply isn't displaying the Unicode characters properly; I've run into that. Since your editor appears to display them ok, try redirecting the output to a file and open it with your editor.
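To take the console out of the picture entirely, the text can be written to a file with an explicit encoding and then opened in an editor. This is a minimal sketch, assuming UTF-8 is what your editor expects; dump_text and the output.txt file name are made up for illustration, and io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile


def dump_text(text, path, encoding="utf-8"):
    """Hypothetical helper: write text to path using an explicit encoding."""
    with io.open(path, "w", encoding=encoding) as out:
        out.write(text)


# Write the character to a scratch file instead of printing it.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
dump_text(u"\u0171", path)

# Read the raw bytes back to see exactly what was stored.
with io.open(path, "rb") as f:
    written = f.read()
```

If the file shows ű in the editor but the console does not, the problem is the console's code page rather than the program.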

Changing the encoding to utf-8 works.

#-*- coding: utf-8 -*-
a = "ű"
print "ű"
