How can I guess the charset of an HTML document?
Some malformed and incomplete HTML pages have no charset information assigned to them, and I have to figure out how to display them. Since there are dozens of encodings, I wonder if there is an algorithm I can use to perform this task correctly. Is there such a thing?
Thanks!
Try jchardet or chsdet. Character set detection is probabilistic, so it may go wrong in some cases. I used jchardet successfully a few years back.
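To give a feel for what "probabilistic" means here, below is a minimal sketch that uses only the Java standard library. It is not how jchardet or chsdet work internally (those rely on statistical models of byte frequencies); it only shows the cheapest checks a detector can make: look for a byte order mark, then test whether the bytes are strictly valid UTF-8. The class name `CharsetGuess`, the method `guess`, and the `windows-1252` fallback are my own assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jchardet or chsdet.
final class CharsetGuess {

    // Returns a best-effort charset name for the given raw bytes.
    static String guess(byte[] data) {
        // 1. A byte order mark, when present, is strong evidence.
        //    (Simplification: UTF-32 marks are not checked.)
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";

        // 2. Strict UTF-8 decoding: text in a legacy single-byte encoding
        //    that contains non-ASCII bytes rarely forms valid UTF-8.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return "UTF-8"; // pure ASCII also ends up here
        } catch (CharacterCodingException e) {
            // 3. Assumed fallback; a real detector would score candidates
            //    such as ISO-8859-x, Shift_JIS, GBK and so on.
            return "windows-1252";
        }
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

You would call `CharsetGuess.guess(bytes)` on the raw bytes of the page before decoding them. Step 3 is where this sketch simply guesses, and that is the part a library such as jchardet handles for you.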