
Retrieving a JavaScript-processed Web page

What I am asking for is the ability to download a rendered / processed page, via Google Chrome or Firefox I think.

For example, I don't want:

hendry@x201 ~$ w3m -dump http://hello.dabase.com
FAIL

I want:

$ $answer http://hello.dabase.com
Hello World


You should be able to do it using PhantomJS. It runs WebKit without the visuals, so you get the same fast, native support for JavaScript, HTML/DOM, CSS, SVG, Canvas, and much else.
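
As a rough sketch of what that could look like (the file name render.js and the helper pickUrl are made up for illustration; webpage, system, page.open, page.plainText and phantom.exit are the PhantomJS calls), something along these lines loads the page, lets its scripts run, and dumps the resulting text. Pages that keep changing the DOM after the load event would probably need an extra wait, which is left out here.

```javascript
// render.js: run as `phantomjs render.js http://hello.dabase.com`

// Pick the URL out of the argument list; returns null when it is missing.
// (pickUrl is a hypothetical helper name, not part of PhantomJS.)
function pickUrl(args) {
  // Under PhantomJS args[0] is the script name, so the URL is args[1].
  return args.length > 1 ? args[1] : null;
}

// Only touch the PhantomJS API when actually running inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var url = pickUrl(system.args);

  if (!url) {
    console.log('usage: phantomjs render.js URL');
    phantom.exit(1);
  } else {
    page.open(url, function (status) {
      if (status !== 'success') {
        console.log('FAIL: could not load ' + url);
        phantom.exit(1);
      } else {
        // plainText is the page text after its scripts have run;
        // use page.content instead to get the processed HTML.
        console.log(page.plainText);
        phantom.exit(0);
      }
    });
  }
}
```

Swapping page.plainText for page.content should give you the processed markup rather than the text dump, if that is closer to what you are after.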

Disclaimer: I started PhantomJS.


Probably too early, but someone has embedded V8 in Go, so you could now write your own client that makes use of this powerful combo:

http://bravenewmethod.wordpress.com/2011/03/30/embedding-v8-javascript-engine-and-go/

Looks quite straightforward, doesn't require an ugly Java/Rhino stack and adopts the next big programming language.


It looks similar to the problem http://simile.mit.edu/wiki/Crowbar is trying to solve.


You could use jsdom: https://github.com/tmpvar/jsdom

I'd build a node driver for it, but it's supposed to work with Rhino etc.


I'd take a look at Rhino.

I'd use the excellent env.js library in conjunction with Rhino to simulate the browser environment as much as is technically possible. Once you've implemented some web spider bootstrap code you should be able to obtain the result you want above.

I'd be interested in other solutions to this though.
