How to get just the content of a post from a blog?

I have just the URL of a post, like http://www.avc.com/a_vc/2011/08/html5-continued.html. Is there any way to get the content of this post? I mean just the post itself, excluding menus, logos and advertisements.

Thank you very much!


If you want to scrape the site, first consider whether this is legal.

Then you can do that by getting the innerHTML (or, with jQuery, the .html()) of the appropriate element. In your case, this is disqus_post_message.

As @bensiu noted, it would be easier to use the RSS feed.

Since you tagged Java, here are some libraries that can be useful:

  • HtmlParser for parsing the html
  • Rome for RSS
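
To illustrate the RSS route, here is a minimal sketch that pulls the text of each post out of a feed. It deliberately uses the JDK's built-in XML parser instead of Rome so that it runs without extra dependencies. The class name RssPostContent and the method itemDescriptions are made up for this example, and it assumes an RSS 2.0 feed where each item carries its body in a description element (some feeds only put a summary there).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of HtmlParser or Rome.
class RssPostContent {

    /**
     * Returns the text of the <description> element of every <item>
     * in the given RSS 2.0 document (assumed layout, see above).
     */
    static List<String> itemDescriptions(String rssXml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

            List<String> result = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Only look inside the item, so the channel-level
                // <description> is not picked up.
                NodeList desc = item.getElementsByTagName("description");
                if (desc.getLength() > 0) {
                    result.add(desc.item(0).getTextContent().trim());
                }
            }
            return result;
        } catch (Exception e) {
            // Wrap parser and I/O errors so callers need no checked exceptions.
            throw new IllegalArgumentException("Could not parse RSS feed", e);
        }
    }
}
```

You would download the feed XML yourself (for example with java.net.URL) and pass it in as a string. The returned text is usually still HTML, so you may want to run it through an HTML parser afterwards if you need plain text.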