Is Google News an example of HTML scraping?
I need to build a web app similar to Google News. Do I need to learn HTML scraping for that, or some other techniques?
Most of what Google News shows comes from RSS/Atom feeds. Getting a website's content through its RSS feed is much easier than scraping it.
Other than that, if you can use Java, you can scrape the HTML yourself with the excellent Goose library. It is similar to what Flipboard and Instapaper use.
The easiest solution would be to get the RSS or ATOM feed of the website you are trying to get data from.
Those are well-known formats, and extracting information from such XML feeds is much easier than getting it from an HTML page: with RSS/Atom, you just have to parse the XML feed and extract the tags that contain the information you are interested in.
Not sure which language you're working with, but chances are you can find some library that would help you with that.
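To illustrate the "parse the XML feed and extract the tags" step, here is a minimal sketch in Python using only the standard library. The feed content, the URLs, and the function name `parse_rss_items` are made up for the example, and it assumes a plain RSS 2.0 feed; an Atom feed uses different, namespaced tags such as `entry`.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs without network access.
# A real feed could be downloaded first, e.g. with urllib.request.urlopen(url).read().
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>https://example.com/</link>
    <item>
      <title>First headline</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second</link>
      <pubDate>Mon, 01 Jan 2024 11:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of dicts with the title, link and date of each <item>."""
    root = ET.fromstring(xml_text)
    items = []
    # Every story in an RSS 2.0 feed is an <item> element under <channel>.
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["published"], entry["title"], entry["link"])
```

The same function works for any site that publishes RSS 2.0, since the tag names are fixed by the format and not by the site.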
If the website doesn't export an RSS/Atom feed... Well, you'll probably have to fall back to HTML scraping; good luck with that, as HTML is not nearly as well structured as RSS/Atom: you'll have to find out, for each website, where in the page the relevant information is.
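For comparison, a rough sketch of the scraping fallback, again with Python's standard library only. The page markup, the `headline` class, and the names `HeadlineParser` and `scrape_headlines` are assumptions invented for this example; a real site will use its own structure, which is exactly why this has to be redone for each website.

```python
from html.parser import HTMLParser

# Hypothetical page markup; the "headline" class is an assumption about one site.
SAMPLE_HTML = """
<html><body>
  <div class="nav"><a href="/about">About</a></div>
  <h2 class="headline"><a href="/story-1">Story one</a></h2>
  <h2 class="headline"><a href="/story-2">Story two</a></h2>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collects the text and link of every <h2 class="headline"> element."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.current_href = None
        self.current_text = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h2" and "headline" in classes:
            # Start collecting a new headline.
            self.in_headline = True
            self.current_href = None
            self.current_text = []
        elif tag == "a" and self.in_headline:
            self.current_href = attrs.get("href")

    def handle_data(self, data):
        if self.in_headline:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self.in_headline:
            title = "".join(self.current_text).strip()
            self.headlines.append({"title": title, "link": self.current_href})
            self.in_headline = False


def scrape_headlines(html_text):
    parser = HeadlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.headlines


if __name__ == "__main__":
    for headline in scrape_headlines(SAMPLE_HTML):
        print(headline["title"], headline["link"])
```

Note how much of the code encodes guesses about one page's layout (which tag, which class), whereas the feed parser relies only on tag names defined by the RSS format.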