I want to query whitepages.com 4,000 times; how do I save the results?
I have an old customer list of 4,000 businesses. I want to determine whether the phone number associated with each listing is still working (and therefore whether the business is probably still open). I can put each number into whitepages.com and check them one by one, but I want to automate the process. I have looked at their API and can't digest it. I can form the correct query URL, but trying things like curl -O doesn't work.
I have access to Mac tools, Unix tools, and could try various JavaScript stuff if anyone could point me in the right direction... would even pay. Help?
Thx
As per Pekka's comment, most companies with a public API don't allow scraping in their terms of service, so it's quite possible that performing 4k GET requests to their website will flag you as a malicious user and get you blacklisted!
Their API is RESTful and seems simple and pretty well documented, so definitely try to get that working instead of going the other way. A good first attempt after getting your API key would be to write a shell script that performs a reverse phone number lookup. For example, if you had all 4,000 10-digit phone numbers in a flat text file, one per line with no formatting, you could write a simple bash script as follows:
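#!/bin/bash
INPUT_FILE=phone_numbers.txt
OUTPUT_DIR=output
API_KEY='MyWhitePages.comApiKey'
BASE_URL='http://api.whitepages.com'
# Make sure the output directory exists before writing results into it.
mkdir -p "$OUTPUT_DIR"
# Perform a reverse lookup on each phone number in the input file.
while read -r PHONE || [ -n "$PHONE" ]; do
  [ -z "$PHONE" ] && continue  # skip blank lines
  URL="${BASE_URL}/reverse_phone/1.0/?phone=${PHONE};api_key=${API_KEY}"
  # Quote the URL so the shell passes it to curl as a single argument.
  curl -s "$URL" > "${OUTPUT_DIR}/result-${PHONE}.xml"
done < "$INPUT_FILE"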
Once you've retrieved all the results, you can either parse the XML to analyze the matching businesses or, if you're only interested in existence, simply grep each output file for the string "The search did not find results", which, per the WhitePages.com API, indicates no match. If the grep succeeds, the business doesn't exist (or has changed its phone number); otherwise it's probably still around (or another business now has that phone number).
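The grep step can be sketched roughly as follows. This is only an illustration: the function name classify_results is made up here, and it assumes the result files are named result-<PHONE>.xml as in the script above and that the "no match" message appears verbatim in the response.

```shell
#!/bin/bash
# Hypothetical helper: label each saved lookup as "no_match" or "found"
# by grepping for the API's "no match" message.
# Assumes files are named result-<PHONE>.xml, as written by the script above.
NO_MATCH='The search did not find results'

classify_results() {
  local dir="$1" file phone
  for file in "$dir"/result-*.xml; do
    [ -e "$file" ] || continue        # skip if the glob matched nothing
    phone="${file##*/result-}"        # strip the path and "result-" prefix
    phone="${phone%.xml}"             # strip the ".xml" suffix
    if grep -q "$NO_MATCH" "$file"; then
      printf '%s\tno_match\n' "$phone"
    else
      printf '%s\tfound\n' "$phone"
    fi
  done
}
```

Calling classify_results output > status.tsv would then give a tab-separated list you can sort or open in a spreadsheet. Note that a failed or empty download would also be labeled "found" by this sketch, so it's worth spot-checking a few files by hand.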
As others have noted, it is a ToS violation to scrape our website or to store the data returned from the API. However, you can get the data you want from our pro service at: https://pro.whitepages.com/list-update/upload_file
Dan
Whitepages API lead.
You can scrape the website. They have limits if you keep coming from the same IP, plus a CAPTCHA. It's easy enough to get around if you know what you're doing. Also, while it might violate the ToS, it's certainly not illegal. The law says you can't copyright phone numbers and addresses, so you don't have much to worry about.