开发者

How to make unicode string with python3

I used this :

u = unicode(text, 'utf-8')

But getting error with Python 3 (or... maybe I just forgot to include something) : 开发者_如何转开发

NameError: global name 'unicode' is not defined

Thank you.


Literal strings are unicode by default in Python3.

Assuming that text is a bytes object, just use text.decode('utf-8')

unicode of Python2 is equivalent to str in Python3, so you can also write:

str(text, 'utf-8')

if you prefer.


What's new in Python 3.0 says:

All text is Unicode; however encoded Unicode is represented as binary data

If you want to ensure you are outputting utf-8, here's an example from this page on unicode in 3.0:

b'\x80abc'.decode("utf-8", "strict")


As a workaround, I've been using this:

# Fix Python 2.x.
try:
    UNICODE_EXISTS = bool(type(unicode))
except NameError:
    unicode = lambda s: str(s)


This how I solved my problem to convert chars like \uFE0F, \u000A, etc. And also emojis that encoded with 16 bytes.

example = 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\\uD83D\\uDE0D\\uD83D\\uDE0D\\u2764\\uFE0F Present Moment Caf\\u00E8 in St.Augustine\\u2764\\uFE0F\\u2764\\uFE0F '
import codecs
new_str = codecs.unicode_escape_decode(example)[0]
print(new_str)
>>> 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\ud83d\ude0d\ud83d\ude0d❤️ Present Moment Cafè in St.Augustine❤️❤️ '
new_new_str = new_str.encode('utf-16', errors='surrogatepass').decode('utf-16')
print(new_new_str)
>>> 'raw vegan chocolate cocoa pie w chocolate & vanilla cream
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜