Web Scraper via Web Service API?

How would I go about doing the following...

I want to build a web service for my application that grabs a piece of data from an external website which requires the user to log in. The website has no public API, hence the scraper.

Is there a library that can do the following, or how else should I approach it?

  • Automate filling in a form and clicking
  • Automate the submit button
  • Check which URL the user has landed on, and redirect the user to a URL
  • Grab data from a label

EDIT: What I'm asking is whether there is a web service, library, etc. that makes it easier to perform screen scraping/automation functions.


Instead of filling in a form and virtually clicking buttons, look at the source of the form and figure out how the data is being submitted. In most cases you can simply send a POST request with the login data. If there is something special going on besides a simple POST request, I use this addon to see the requests being made that you can't otherwise see. In C#, I would use the HttpWebRequest class because it can handle cookies for you once you assign it a CookieContainer.
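To make the "just send the POST yourself" idea concrete, here is a rough sketch. It is in Python rather than C#, using only the standard library, so treat it as an illustration of the approach and not a drop-in for HttpWebRequest. The function names, the form field names, and the element id are all made up for the example; the real ones have to come from the login form's source.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar


def make_session():
    """Return an opener that stores and resends cookies across requests."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(login_url, fields):
    """Encode form fields as a plain form POST (field names are assumptions)."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying a body makes urllib send this request as POST.
    return urllib.request.Request(login_url, data=body)


class LabelGrabber(HTMLParser):
    """Collect the text of the first element whose id equals target_id.

    Intended for container elements such as <span> or <label>,
    not for void elements such as <input>.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.tag = None    # tag name of the matched element
        self.depth = 0     # > 0 while inside the matched element
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element with the same tag name
        elif dict(attrs).get("id") == self.target_id:
            self.tag = tag
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_label(html, target_id):
    """Return the stripped text of the element with target_id, or None."""
    parser = LabelGrabber(target_id)
    parser.feed(html)
    return "".join(parser.chunks).strip() or None


def login_and_grab(session, login_url, fields, target_id):
    """POST the login form, then report the landing URL and the label text."""
    with session.open(build_login_request(login_url, fields)) as resp:
        landed_on = resp.geturl()  # final URL after any redirects
        html = resp.read().decode("utf-8", errors="replace")
    return landed_on, extract_label(html, target_id)
```

Usage would look something like `login_and_grab(make_session(), login_url, {"username": "...", "password": "..."}, "balance")`, where the URL, field names, and id are placeholders. If the data lives on a different page than the one you land on after login, request that page with the same session so the login cookies are sent along.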


If the website does not ban robots, you can use YQL to simulate everything you need. However, it can be difficult or even impossible, as you basically have to implement a text-only browser in JS.
