Is it possible to take a crawled HTML file and get information about how a browser would render it?
Examples of stuff I'd like to do:
- process JavaScript and produce the new DOM
- be able to provide information about DOM objects as rendered (e.g. position, size)
Edit: My main concern is detecting whether a page contains a large, central Flash object (typically a movie or game).
I guess the only way to do this is to pipe the HTML through a rendering engine, either a real one (like WebKit or Gecko) or something feature-complete enough for your purposes, and then query the resulting DOM for each element's rendered position and size. Maybe take a look at projects like webkit2png for inspiration.
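To make that concrete, here is a rough sketch in Python. It assumes the Playwright package and a Chromium build are installed (that choice is mine, not something the tools above require), loads the saved file in a headless browser, and asks the DOM for the bounding box of every `object` and `embed` element. The names `measure_plugin_boxes` and `is_large_and_central`, the 1280x800 viewport, and the 25% area threshold are all hypothetical values picked for illustration. Note that current browsers no longer run Flash, so the engine may lay out fallback content instead and the reported sizes could differ from what a plugin-enabled browser would show.

```python
from pathlib import Path

# JavaScript run inside the page: collect the rendered box of each plugin element.
MEASURE_JS = """
() => Array.from(document.querySelectorAll('object, embed')).map(el => {
  const r = el.getBoundingClientRect();
  return {tag: el.tagName.toLowerCase(), x: r.x, y: r.y,
          width: r.width, height: r.height};
})
"""


def measure_plugin_boxes(html_path, viewport_w=1280, viewport_h=800):
    """Render a local HTML file and return boxes for <object>/<embed> elements.

    Assumes Playwright is installed; imported lazily so the helper below
    can be used without it.
    """
    from playwright.sync_api import sync_playwright

    url = Path(html_path).resolve().as_uri()
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": viewport_w, "height": viewport_h})
        page.goto(url)  # scripts on the page run before we measure
        boxes = page.evaluate(MEASURE_JS)
        browser.close()
    return boxes


def is_large_and_central(box, viewport_w, viewport_h, min_area_fraction=0.25):
    """Heuristic (assumed thresholds): the box covers at least
    min_area_fraction of the viewport and its center lies in the
    middle half of the viewport on both axes."""
    area_fraction = (box["width"] * box["height"]) / (viewport_w * viewport_h)
    cx = box["x"] + box["width"] / 2
    cy = box["y"] + box["height"] / 2
    centered = (0.25 * viewport_w <= cx <= 0.75 * viewport_w
                and 0.25 * viewport_h <= cy <= 0.75 * viewport_h)
    return area_fraction >= min_area_fraction and centered
```

Usage would be something like `[b for b in measure_plugin_boxes("page.html") if is_large_and_central(b, 1280, 800)]`. The thresholds are guesses and would need tuning against real pages.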