Representation of a web page as the browser sees it

I have some ideas for building a more intelligent web spider, one that interacts with a web page and extracts information in a manner closer to how we humans do.

To do this, I need a representation of a web page that is similar or identical to what we see in our browsers.

In other words I need access to the data concerning the location, colour and style of all the elements on the page, possibly at a pixel level.

But I don't want just a rendered bitmap; I also want to be able to extract text, click links, push buttons and so on.

I get the feeling the DOM may be a starting point, but more concrete advice would be appreciated.

To clarify, I want programmatic access to web pages in a form similar to what a browser presents to us, for example to check the colour or text at a specific pixel location or region.


You might want to check out Selenium (or other ways of scripting your browser, such as Greasemonkey). Since how a web page is displayed depends quite a bit on the particular browser, scripting a real browser is the most precise way of getting at what the user sees.
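
As a rough illustration of the Selenium route, here is a minimal sketch in Python. It assumes the `selenium` package plus Firefox and its driver are installed; `Box`, `smallest_box_at` and `collect_boxes` are names made up for this example, not part of Selenium. The idea is to let the browser render the page, record the box and colour of each visible element, and then look up what sits at a given pixel.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Box:
    """Rendered geometry and style of one element (illustrative type)."""
    tag: str
    text: str
    x: float
    y: float
    width: float
    height: float
    color: str

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def smallest_box_at(boxes: Iterable[Box], px: float, py: float) -> Optional[Box]:
    """Return the smallest-area box covering (px, py), or None.

    Smallest area is only a heuristic for "innermost element";
    it ignores z-index and overlapping siblings.
    """
    hits = [b for b in boxes if b.contains(px, py)]
    return min(hits, key=lambda b: b.width * b.height, default=None)


def collect_boxes(url: str) -> List[Box]:
    """Load url in a real browser and record each visible element's box."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        boxes = []
        for el in driver.find_elements(By.CSS_SELECTOR, "body *"):
            if not el.is_displayed():
                continue
            r = el.rect  # dict with x, y, width, height
            boxes.append(Box(el.tag_name, el.text, r["x"], r["y"],
                             r["width"], r["height"],
                             el.value_of_css_property("color")))
        return boxes
    finally:
        driver.quit()
```

Calling `collect_boxes` on a URL and passing the result to `smallest_box_at` with a pixel coordinate gives the tag, text and computed colour at that point. For an exact answer rather than a heuristic, the browser itself can be asked through `driver.execute_script("return document.elementFromPoint(arguments[0], arguments[1]);", x, y)`, and the returned element can then be clicked with `click()`.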
