Codes are in scraped text instead of unicode characters
I'm using Beautiful Soup to extract some text. The program runs from the command line, and when I run it, it displays codes like &iacute; and &eacute; instead of the characters í and é.
How can I correct this behavior?
These codes are called HTML/XML character entities.
I haven't used Beautiful Soup before, but according to the documentation it looks like there's an option for converting character entities into Unicode characters: http://www.crummy.com/software/BeautifulSoup/documentation.html#Entity%20Conversion
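If the entities still show up in the extracted text, one fallback that doesn't depend on Beautiful Soup at all is to decode them afterwards with Python's standard library. This is only a minimal sketch, assuming Python 3.4 or later; the sample string and the `decode_entities` helper name are made up for illustration, and this is not the Beautiful Soup option described in the linked documentation.

```python
import html


def decode_entities(text):
    """Replace HTML character references in text with Unicode characters."""
    # html.unescape handles named references (&eacute;) as well as
    # numeric ones (&#233; or &#xE9;).
    return html.unescape(text)


# Hypothetical scraped text that still contains entity codes.
scraped = "Jos&eacute; Mart&iacute;nez &#233;"
decoded = decode_entities(scraped)
print(decoded)
```

For the old Beautiful Soup 3 release that the linked page documents, the option appears to be the `convertEntities` constructor argument (for example `convertEntities=BeautifulSoup.HTML_ENTITIES`), but I haven't tested that myself.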