PyQuery produces invalid HTML
I was using pyquery to construct a webpage:
> page = PyQuery('<html><head><script type="text/javascript" src="jquery-1.4.min.js"></script><script type="text/javascript" src="tools.min.js"></script></head><body></body></html>')
> print page
Output: <html><head><script type="text/javascript" src="jquery-1.4.min.js"/><script type="text/javascript" src="tools.min.js"/></head><body/></html>
The script (and body) tags aren't supposed to end like that, though. Firefox ignores the rest of the header.
I tried breaking the above up into single elements (i.e. adding one script tag at a time), but to no avail:
> page = PyQuery('<html><head></head></html>')
> page.find('head').append('<script type="text/javascript" src="jquery-1.4.min.js"/></script>')
> page.find('head').append('<script type="text/javascript" src="tools.min.js"></script>')
Output: <html><head><script type="text/javascript" src="jquery-1.4.min.js"/><script type="text/javascript" src="tools.min.js"/></head><body/></html>
The same thing happens with <iframe/> tags (which I'm forced to use because of YouTube): Firefox doesn't treat them as closed, and all the following code is ignored.
How can I force pyquery to close these with a separate closing tag, as I believe the HTML standard requires?
Oh, and if anyone's wondering, I'm not doing it all in BeautifulSoup because (1) I get BeautifulSoup errors and (2) it's a deprecated package; the author stopped supporting it about a year or two ago.
Try:
page = PyQuery('<html><head><script type="text/javascript" src="jquery-1.4.min.js">\n</script><script type="text/javascript" src="tools.min.js">\n</script></head><body></body></html>')
It also works with iframe.
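The reason the newline helps is that an XML serializer only collapses an element to the self-closing form when it is completely empty. Here is a minimal sketch of that behaviour, assuming the standard library's xml.etree.ElementTree as a stand-in for lxml (which pyquery is built on); the variable names are illustrative only:

```python
import xml.etree.ElementTree as ET

# A script element with no text and no children.
script = ET.Element("script", {"src": "tools.min.js"})

# Completely empty: the XML serializer uses the self-closing short form.
empty_xml = ET.tostring(script, encoding="unicode")

# With a newline as text the element is no longer empty,
# so the serializer has to write an explicit closing tag.
script.text = "\n"
padded_xml = ET.tostring(script, encoding="unicode")

print(empty_xml)
print(padded_xml)
```

This is a workaround and not a real fix: the document is still serialized as XML, it just no longer contains empty elements.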
You should use print page.__html__() to dump HTML or, better, print page.html(method='html').
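To illustrate why the serialization method matters, here is a small sketch comparing XML and HTML output for the same tree. It assumes the standard library's xml.etree.ElementTree, not pyquery itself, so that it runs without extra packages; lxml's tostring accepts a similar method argument, but treat the exact pyquery behaviour as something to verify against your installed version:

```python
import xml.etree.ElementTree as ET

# Build the same <head> with an empty <script> as in the question.
head = ET.Element("head")
ET.SubElement(head, "script", {"type": "text/javascript",
                               "src": "jquery-1.4.min.js"})

# XML rules: an empty element may be written in self-closing form.
xml_out = ET.tostring(head, encoding="unicode", method="xml")

# HTML rules: non-void elements such as <script> get a closing tag.
html_out = ET.tostring(head, encoding="unicode", method="html")

print(xml_out)
print(html_out)
```

The tree is identical in both cases; only the output rules differ, which is why switching the method fixes the problem without touching the markup.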