
HTML element position in Python

I'm using lxml.html for some HTML parsing in Python. I'd like to get a rough estimate of where elements would be located on the page once a browser renders it. It doesn't have to be exact, just generally correct. For simplicity I will ignore the effects of JavaScript on element location. As an end result, I would like to be able to iterate over the elements (e.g., via lxml) and find their x/y coordinates. Any thoughts on how to do this? I don't need to stay with lxml and am happy to try other libraries.


PyQt with WebKit, which renders the page and lets you ask each element for its geometry:

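import sys
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebView

class MyWebView(QWebView):
    def __init__(self):
        QWebView.__init__(self)
        # Run showelements once the page has been loaded and laid out.
        self.loadFinished.connect(self.showelements)

    def showelements(self, ok):
        if not ok:  # the page failed to load
            return
        html = self.page().currentFrame().documentElement()
        for link in html.findAll('a'):
            # geometry() is a QRect relative to the containing frame
            rect = link.geometry()
            print(link.toInnerXml(), rect.x(), rect.y(), rect.width(), rect.height())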


if __name__=='__main__':
    app = QApplication(sys.argv)

    web = MyWebView()
    web.load(QUrl("http://www.google.com"))
    web.show()

    sys.exit(app.exec_())
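The question asks for coordinates of every element, not just links. A minimal sketch of that, assuming the same PyQt4/QtWebKit setup as above and that it is called from the `loadFinished` handler with `self.page().currentFrame()`. `element_boxes` and `Box` are hypothetical names for illustration, the `"*"` CSS selector simply matches all elements, and skipping zero-size rectangles (elements that were not laid out, such as those inside `<head>` or hidden with `display:none`) is a choice made here, not something QtWebKit requires.

```python
from collections import namedtuple

# Rough on-screen box of one element, in coordinates relative to its frame.
Box = namedtuple("Box", "tag x y width height")


def element_boxes(frame, selector="*"):
    """Collect the rendered geometry of every element matching `selector`.

    `frame` is assumed to behave like a loaded PyQt4 QWebFrame, i.e. it has
    documentElement(), whose findAll(selector) yields QWebElement-like
    objects with tagName() and geometry() (a QRect-like object).
    """
    boxes = []
    for element in frame.documentElement().findAll(selector):
        rect = element.geometry()  # QRect relative to the containing frame
        if rect.width() <= 0 or rect.height() <= 0:
            continue  # nothing was laid out for this element
        # WebKit reports HTML tag names in upper case; normalise them.
        tag = str(element.tagName()).lower()
        boxes.append(Box(tag, rect.x(), rect.y(), rect.width(), rect.height()))
    return boxes
```

Inside `showelements` you could call `element_boxes(self.page().currentFrame())` and print each box; the resulting (tag, x, y) tuples can then be lined up with the elements you iterate over in lxml.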


As Sven pointed out, you need an HTML rendering engine. A related question on rendering HTML has been asked before; you could refer to it:

Python library for rendering HTML and javascript

