
What encoding does the unicode function in BeautifulSoup convert from?

When I use the unicode function in BeautifulSoup - what encoding does it convert to Unicode from? Does it automatically use the soup.originalEncoding?

from BeautifulSoup import BeautifulSoup
doc = "<html><h1>Heading</h1><p>Text"
soup = BeautifulSoup(doc)
print unicode(soup)

Thanks


unicode() is a Python 2 built-in, not part of BeautifulSoup. The Python documentation describes it as follows:

unicode([object[, encoding[, errors]]])

If encoding and/or errors are given, unicode() will decode the object which can either be an 8-bit string or a character buffer using the codec for encoding. The encoding parameter is a string giving the name of an encoding; if the encoding is not known, LookupError is raised. Error handling is done according to errors; this specifies the treatment of characters which are invalid in the input encoding. If errors is 'strict' (the default), a ValueError is raised on errors, while a value of 'ignore' causes errors to be silently ignored, and a value of 'replace' causes the official Unicode replacement character, U+FFFD, to be used to replace input characters which cannot be decoded. See also the codecs module.

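If you don't specify the encoding, unicode() decodes an 8-bit string with sys.getdefaultencoding() (normally 'ascii' in Python 2) in 'strict' mode. Note that the same documentation also says that if the object provides a __unicode__() method, unicode() calls that method instead of decoding anything itself.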
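
To make the errors argument concrete, here is a small sketch. It uses the decode() method of a byte string rather than unicode() itself, on the assumption that both go through the same codec machinery; that way it also runs on Python 3, where unicode() no longer exists. The sample bytes and variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: how 'strict', 'ignore' and 'replace' treat bytes
# that are invalid in the chosen encoding. Sample data is hypothetical.

raw = b"caf\xe9"  # "cafe" with e-acute encoded as Latin-1; not valid UTF-8


def strict_decode_fails(data, encoding):
    """Return True if decoding with errors='strict' raises an error."""
    try:
        data.decode(encoding, "strict")
    except ValueError:  # UnicodeDecodeError is a subclass of ValueError
        return True
    return False


def codec_is_unknown(name):
    """Return True if no codec with this name can be found."""
    try:
        b"x".decode(name)
    except LookupError:
        return True
    return False


strict_failed = strict_decode_fails(raw, "utf-8")
ignored = raw.decode("utf-8", "ignore")    # invalid byte silently dropped
replaced = raw.decode("utf-8", "replace")  # invalid byte becomes U+FFFD
correct = raw.decode("latin-1")            # the right codec decodes cleanly
unknown = codec_is_unknown("no-such-codec")
```

So the safest approach is usually to pass the real encoding explicitly rather than relying on the default or on 'ignore'/'replace'.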
