Is using an API always preferable to scraping? [closed]
Closed 4 years ago.
I am looking for a large amount (>100k at least) of data from web 2.0 sites for a research project. I am thinking of using the exposed API to get the data, but would scrapping work better in this case?
The API is good (less work compared to writing a scraper), but I really have no idea how much time I would need to collect that much data, considering there is usually a time/call limit. I'm not saying there is no limit on scraping though; I am just curious which is the better way of getting the job done.
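For a rough idea of the time involved, the arithmetic can be sketched as below. The quota and page size are hypothetical numbers chosen only for illustration; the real values depend on the specific API and are not given in the question.

```python
import math

def estimate_collection_hours(total_records, records_per_call, calls_per_hour):
    """Hours needed to fetch total_records under a fixed hourly call quota."""
    # Each call returns at most records_per_call items, so round up.
    calls_needed = math.ceil(total_records / records_per_call)
    return calls_needed / calls_per_hour

# Assumed figures: 100,000 records, 100 per call, 150 calls allowed per hour.
hours = estimate_collection_hours(100_000, 100, 150)
print(f"{hours:.1f} hours")  # prints "6.7 hours"
```

Under these assumed limits the job takes well under a day, so the call limit alone may not rule out the API.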
If the site provides an API, then use it.
It's much simpler, generic, and legal. If the site is kind of popular, you often find wrappers for the language you're using.
Of course, if you develop a scraper, you won't have limitations, but maybe the site doesn't allow being scraped, and that's exactly why they have an API for users/developers.
About Jeffrey04's comment:
Let's see... this is a moral thing. If you want, you can obtain that amount of data several times over without being blocked. You can always change User-Agents, change IP after N requests (all of this programmatically, of course), and do some tricks with cookies, but that's not the idea. What I mean is that the advice against scraping a website is not about the risk of getting banned from it.
Whenever you can, use APIs. It's just nicer. However, there are certainly cases where you are forced to scrape; the API might be throttled to a few requests a day. But before you do that, be respectful to the developers: explain what you are trying to do, and maybe they will adjust the rules to help with your project. If you are doing something long term, definitely talk with the developers and at least make a deal so that you don't get throttled.
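Staying within a throttled API's limit usually comes down to spacing out the calls. A minimal sketch, assuming a hypothetical `fetch_page` callable that returns a list of records for a page number (the name, the paging scheme, and the interval are all illustrative, not taken from any particular API):

```python
import time

def fetch_all(fetch_page, num_pages, min_interval,
              sleep=time.sleep, clock=time.monotonic):
    """Call fetch_page(0..num_pages-1), keeping calls at least min_interval seconds apart."""
    results = []
    last_call = None
    for page in range(num_pages):
        if last_call is not None:
            # Wait out whatever remains of the interval since the previous call.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.extend(fetch_page(page))
    return results

# Stub standing in for a real API call (hypothetical).
records = fetch_all(lambda page: [f"item-{page}"], num_pages=3, min_interval=0.01)
print(records)
```

The `sleep` and `clock` parameters are only there so the pacing can be tested without real delays; in normal use the defaults are enough.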
If there's an API, use it. Scraping (not scrapping) often seems like a good idea at first, but is a nightmare to maintain.