HTML element position in Python
I'm using lxml.html for some HTML parsing in Python. I'd like to get a rough estimate of the location of elements within the page after it would be rendered by a browser. It does not have to be exact, but generally correct. For simplicity I will ignore the effects of JavaScript on element location. As an end result, I would like to be able to iterate over the elements (e.g., via lxml) and find their x/y coordinates. Any thoughts on how to do this? I don't need to stay with lxml and am happy to try other libraries.
PyQt with webkit:
import sys
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class MyWebView(QWebView):
    def __init__(self):
        QWebView.__init__(self)
        # Run showelements once the page has finished loading.
        QObject.connect(self, SIGNAL('loadFinished(bool)'), self.showelements)

    def showelements(self):
        html = self.page().currentFrame().documentElement()
        # Print each link's inner markup and its rendered geometry.
        for link in html.findAll('a'):
            # The [18:] slice drops the "PyQt4.QtCore.QRect" prefix from the repr.
            print(link.toInnerXml(), str(link.geometry())[18:])

if __name__ == '__main__':
    app = QApplication(sys.argv)
    web = MyWebView()
    web.load(QUrl("http://www.google.com"))
    web.show()
    sys.exit(app.exec_())
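The string slice on the geometry above depends on how the QRect happens to be printed. A possibly less brittle variant is to read the numbers through QRect's own accessors (x(), y(), width(), height()). The sketch below is only an illustration: Box, rect_to_box, center and FakeRect are hypothetical names that are not part of PyQt, and FakeRect is a stand-in so the snippet runs without Qt installed.

```python
from collections import namedtuple

# Hypothetical container for an element's rendered position and size.
Box = namedtuple("Box", "x y width height")


def rect_to_box(rect):
    """Convert a QRect-like object into a plain Box tuple.

    Only relies on the x(), y(), width() and height() accessors,
    which QRect provides.
    """
    return Box(rect.x(), rect.y(), rect.width(), rect.height())


def center(box):
    """Return the (x, y) midpoint of a Box."""
    return (box.x + box.width / 2.0, box.y + box.height / 2.0)


class FakeRect(object):
    """Stand-in for QRect (an assumption, for running without Qt)."""

    def __init__(self, x, y, width, height):
        self._values = (x, y, width, height)

    def x(self):
        return self._values[0]

    def y(self):
        return self._values[1]

    def width(self):
        return self._values[2]

    def height(self):
        return self._values[3]


if __name__ == "__main__":
    box = rect_to_box(FakeRect(10, 20, 100, 30))
    print(box, center(box))
```

Inside showelements, this would presumably be used as rect_to_box(link.geometry()), giving plain numbers that can be stored or compared instead of a sliced string.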
As Sven said, you need an HTML rendering engine. A question about rendering HTML has been asked before, and you could refer to it:
Python library for rendering HTML and javascript