
Using grep to capture JavaScript links

I'm using wget to create static copies of my site. However, several elements require external assets that are pulled in via JavaScript. The pattern of the script should be fairly constant, and no URLs are dynamically created. The URLs I need to extract look like this:

onclick="return ns.homepage.load({e:this, src:'https://mysub.mydomain.tld/somedir/content/123456789.html'})"

I'd like to output the list of these URLs to a local file so I can wget them as well.


Use Perl with HTML::TreeBuilder to pull your site's code and then parse it.

You may have to do some regex work, i.e. this module may only get you as far as extracting the value of the 'onclick' attribute, but it shouldn't be too hard to get the rest from there.
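
The same two-step idea (parse the HTML to reach the onclick attributes, then use a regex to pull out the URL) can be sketched in Python with only the standard library. This is not the Perl/HTML::TreeBuilder approach itself, just an illustration of the technique. The names OnclickCollector and extract_urls are made up for this example, and the regex assumes every link follows the src:'...' form shown in the question.

```python
import re
import sys
from html.parser import HTMLParser

# Assumption: the URL always appears as src:'...' (or src:"...") inside onclick.
SRC_RE = re.compile(r"""src\s*:\s*['"](https?://[^'"]+)['"]""")


class OnclickCollector(HTMLParser):
    """Hypothetical helper: gathers URLs from onclick attributes of any tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased.
        for name, value in attrs:
            if name == "onclick" and value:
                self.urls.extend(SRC_RE.findall(value))


def extract_urls(html):
    """Return the unique URLs found in onclick handlers, in document order."""
    parser = OnclickCollector()
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.urls))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one URL per line for the HTML file given as the first argument.
    with open(sys.argv[1], encoding="utf-8") as fh:
        for url in extract_urls(fh.read()):
            print(url)
```

If this were saved as extract_urls.py (a hypothetical file name), running `python extract_urls.py page.html > urls.txt` would produce a list that wget can read with `wget -i urls.txt`.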
