
How do I handle special characters in comments and hard-coded strings in a Python file?

This question aims at the following two scenarios:

  1. You want to add a string with special characters to a variable:

    special_char_string = "äöüáèô"

  2. You want to allow special characters in comments.

    # This is a comment with special characters in it: äöà etc.

At the moment I handle it this way:

# -*- encoding: utf-8 -*-
special_char_string = "äöüáèô".decode('utf8')
# This is a comment with special characters in it: äöà etc.

Works fine.

Is this the recommended way? Or is there a better solution for this?


Python will check the first or second line for an emacs/vim-like encoding specification.

More precisely, the first or second line must match the regular expression "coding[:=]\s*([-\w.]+)". The first group of this expression is then interpreted as encoding name. If the encoding is unknown to Python, an error is raised during compilation.

Source: PEP 263
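
To make the rule quoted above concrete, here is a minimal sketch that applies the same regular expression to the first two lines of a file. The helper name declared_encoding is made up for illustration, and the check is a simplification: it only requires that the matching line be a comment, whereas the real tokenizer applies a few more conditions.

```python
import re

# Pattern as quoted from PEP 263 in the answer above.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding named in the first two lines, or None.

    Hypothetical helper: a simplified imitation of what Python does,
    not the interpreter's actual implementation.
    """
    for line in source_lines[:2]:
        # Only comment lines are treated as declarations here.
        if not line.lstrip().startswith("#"):
            continue
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


print(declared_encoding(["# -*- encoding: utf-8 -*-"]))
# utf-8
print(declared_encoding(["#!/usr/bin/env python",
                         "# vim: set fileencoding=latin-1 :"]))
# latin-1
print(declared_encoding(["x = 1", "y = 2", "# coding: utf-8"]))
# None (the declaration sits on line 3, which is too late)
```

Note that "encoding: utf-8" works only because the pattern matches the "coding:" part inside the word "encoding".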

(A BOM would also make Python interpret the source as UTF-8.)
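
If you want to see what the interpreter itself would decide, Python 3's standard library exposes the detection logic as tokenize.detect_encoding. The sketch below is Python 3 only (Python 2's tokenize module has no such function), and the wrapper name detect is just for illustration.

```python
import codecs
import io
import tokenize


def detect(source_bytes):
    """Return the source encoding the tokenizer picks for these bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


with_bom = codecs.BOM_UTF8 + b"x = 1\n"
with_declaration = b"# -*- encoding: utf-8 -*-\nx = 1\n"

print(detect(with_bom))          # utf-8-sig
print(detect(with_declaration))  # utf-8
```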

I would recommend using a unicode literal rather than .decode('utf8'):

# -*- encoding: utf-8 -*-
special_char_string = u"äöüáèô"

Either way, special_char_string will contain a unicode object rather than a str. As you can see, the two forms are semantically equivalent:

>>> u"äöüáèô" == "äöüáèô".decode('utf8')
True

And the reverse:

>>> u"äöüáèô".encode('utf8')
'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\xa1\xc3\xa8\xc3\xb4'
>>> "äöüáèô"
'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\xa1\xc3\xa8\xc3\xb4'

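There is a technical difference, however: u"something" tells the parser that this is a unicode literal, so the unicode object is built once at compile time, whereas .decode('utf8') is a method call that runs every time the line executes. The literal should therefore be a bit faster.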


Yes, this is the recommended way for Python 2.x, see PEP 0263. In Python 3.x, the default source encoding is UTF-8 rather than ASCII, so you don't need the declaration there. See PEP 3120.
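
For comparison, a minimal Python 3 sketch of the same example, assuming the file is saved as UTF-8. No declaration and no u prefix are needed, because string literals are already Unicode text there.

```python
# Python 3: no encoding declaration needed when the file is saved as UTF-8.
special_char_string = "äöüáèô"  # already a Unicode str, no .decode() call

# Encoding yields the same bytes that Python 2 showed for the plain str.
encoded = special_char_string.encode("utf-8")
print(encoded)
# b'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\xa1\xc3\xa8\xc3\xb4'

# Six characters, twelve bytes: each of these letters takes two bytes in UTF-8.
print(len(special_char_string), len(encoded))
# 6 12
```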
