Scraped text contains codes instead of Unicode characters

I'm using Beautiful Soup to extract some text. The program runs on the command line, and its output contains codes like &iacute; and &eacute; instead of the characters í and é.

How can I correct this behavior?


These codes are called HTML/XML character entities.

I haven't used Beautiful Soup before, but according to the documentation it looks like there's an option for converting character entities into Unicode characters: http://www.crummy.com/software/BeautifulSoup/documentation.html#Entity%20Conversion
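
To show what that kind of conversion does, here is a minimal sketch that uses Python's standard-library html.unescape instead of the Beautiful Soup option itself, so it is a stand-in and not the documented Beautiful Soup approach. The helper name decode_entities and the sample string are made up for this example, and it assumes Python 3.4 or later.

```python
import html


def decode_entities(text: str) -> str:
    """Replace HTML character references (e.g. &iacute;) with Unicode characters.

    Hypothetical helper for illustration; it only wraps html.unescape.
    """
    return html.unescape(text)


# Sample scraped text containing named and numeric character references.
scraped = "Garc&iacute;a ordered a caf&eacute; (&#233; is the same letter)"
print(decode_entities(scraped))
```

This handles both named references such as &iacute; and numeric ones such as &#233;, so it can be applied to strings after they have been extracted if the parser leaves the entities in place.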
