Parsing HTML with lxml and html5lib raises "TypeError: insertDoctype() takes exactly 4 arguments (2 given)"
I'm getting the error TypeError: insertDoctype() takes exactly 4 arguments (2 given)
when using lxml and html5lib together. It seems that the insertDoctype
method in lxml.html._html5builder.TreeBuilder
takes 4 arguments (self plus three others), while the html5lib code calls it with 2 (self plus a single token). Am I using this wrong?
These are the versions I'm using:
$ pip freeze
BeautifulSoup==3.2.0
distribute==0.6.14
html5lib==0.90
lxml==2.3
mechanize==0.2.4
wsgiref==0.1.2
My source code:
from lxml.html import html5parser
html5parser.document_fromstring('''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>t</title><body></body></html>''')
And the error:
Traceback (most recent call last):
  File "/tmp/t.py", line 4, in <module>
    <html><head><title>t</title><body></body></html>''')
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/lxml/html/html5parser.py", line 54, in document_fromstring
    return parser.parse(html, useChardet=guess_charset).getroot()
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 211, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 111, in _parse
    self.mainLoop()
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 189, in mainLoop
    self.phase.processDoctype(token)
  File "/Users/me/.virtualenvs/myenv/lib/python2.6/site-packages/html5lib/html5parser.py", line 482, in processDoctype
    self.tree.insertDoctype(token)
TypeError: insertDoctype() takes exactly 4 arguments (2 given)
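To illustrate what I think is going on, here is a minimal sketch that reproduces the same kind of mismatch without either library. Everything in it is a stand-in: LegacyTreeBuilder and TokenTreeBuilder are hypothetical classes, not the real lxml or html5lib code, and the token keys ("name", "publicId", "systemId") are my assumption about what the doctype token carries. The second class shows one possible adapter that unpacks the token; I have not verified that this is a safe workaround with the real libraries.

```python
class LegacyTreeBuilder(object):
    """Hypothetical stand-in for a builder whose insertDoctype
    expects three separate fields (four arguments counting self)."""

    def __init__(self):
        self.doctype = None

    def insertDoctype(self, name, publicId, systemId):
        self.doctype = (name, publicId, systemId)


class TokenTreeBuilder(LegacyTreeBuilder):
    """Hypothetical adapter: accepts a single token mapping and
    unpacks it into the three fields the legacy method wants."""

    def insertDoctype(self, token):
        LegacyTreeBuilder.insertDoctype(
            self,
            token.get("name"),
            token.get("publicId"),
            token.get("systemId"),
        )


def call_with_token(builder, token):
    """Call insertDoctype the way the traceback shows: one token argument."""
    try:
        builder.insertDoctype(token)
    except TypeError as exc:
        return "TypeError: %s" % exc
    return "ok"


# Assumed token layout, modelled on the doctype in my test document.
token = {
    "name": "html",
    "publicId": "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "systemId": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
}

legacy_result = call_with_token(LegacyTreeBuilder(), token)
adapted = TokenTreeBuilder()
adapted_result = call_with_token(adapted, token)

print(legacy_result)   # a TypeError message; exact wording depends on the Python version
print(adapted_result)
```

The first call fails for the same reason as my traceback: the caller passes one token, but the method wants three separate values. If that reading is right, the two installed versions simply disagree about the insertDoctype signature.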