
I want to query whitepages.com 4,000 times. How do I save the results?

I have an old customer list of 4,000 businesses. I want to determine whether the phone number associated with each listing still works (and therefore whether the business is probably still open). I could enter each number into whitepages.com and check them one by one, but I want to automate it. I have looked at their API and can't digest it. I can form the correct query URL, but things like curl -O don't work.

I have access to Mac tools and Unix tools, and could try various JavaScript approaches if anyone could point me in the right direction... I would even pay. Help?

Thx


As per Pekka's comment, most companies with a public API don't allow scraping in their terms of service, so it's quite possible that performing 4k GET requests to their website will flag you as a malicious user and get you blacklisted!

Their API is RESTful and seems simple and fairly well documented, so definitely try to get that working instead of going the other way. A good first attempt, once you have your API key, would be a Unix script that performs a reverse phone number lookup. For example, if all 4,000 10-digit phone numbers are in a flat text file, one per line with no formatting, you could write a simple bash script like this:

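#!/bin/bash
INPUT_FILE=phone_numbers.txt
OUTPUT_DIR=output
API_KEY='MyWhitePages.comApiKey'
BASE_URL='http://api.whitepages.com'

# Make sure the output directory exists before writing results into it.
mkdir -p "$OUTPUT_DIR"

# Perform a reverse lookup on each phone number in the input file
# (one per line; the "|| [ -n ... ]" also handles a missing final newline)
# and save each XML response to its own file.
while read -r PHONE || [ -n "$PHONE" ]; do
  URL="${BASE_URL}/reverse_phone/1.0/?phone=${PHONE};api_key=${API_KEY}"
  curl -s "$URL" > "${OUTPUT_DIR}/result-${PHONE}.xml"
done < "$INPUT_FILE"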
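The script expects phone_numbers.txt to contain bare 10-digit numbers, one per line. An old customer list rarely looks like that, so a small filter along these lines could prepare the file first. This is a sketch that assumes US numbers; normalize_numbers and customers.txt are hypothetical names, not part of the WhitePages API.

```shell
#!/bin/bash
# Hypothetical helper: turn loosely formatted US phone numbers (one per line
# on stdin) into bare 10-digit numbers on stdout, the format the lookup
# script above expects.
normalize_numbers() {
  # 1. Delete every character except digits and newlines (this also strips
  #    spaces, dashes, parentheses, dots and Windows carriage returns).
  # 2. Drop a leading US country code "1" from 11-digit numbers.
  # 3. Keep only lines that are exactly 10 digits long.
  # 4. Sort and remove duplicates so each number is looked up only once.
  tr -cd '0-9\n' \
    | sed -E 's/^1([0-9]{10})$/\1/' \
    | grep -E '^[0-9]{10}$' \
    | sort -u
}
```

Usage: normalize_numbers < customers.txt > phone_numbers.txt. Lines that don't reduce to exactly 10 digits (blank fields, "n/a", numbers with extensions) are silently dropped, so compare line counts before and after to see how many entries were lost.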

Once you've retrieved all the results, you can either parse the XML to analyze the matching businesses or, if you're only interested in whether a listing exists, simply grep each output file for the string "The search did not find results", which the WhitePages.com API returns when there is no match. If the grep succeeds, the business doesn't exist (or has changed its phone number); otherwise it's probably still around (or another business now has that phone number).
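The grep check can be scripted over the whole output directory. This sketch assumes the result-<PHONE>.xml file names used by the lookup script and that a no-match response contains exactly that phrase; classify_results is a hypothetical helper. It also flags empty files, since a failed request would otherwise be counted as "found".

```shell
#!/bin/bash
# Phrase that (per the answer above) marks a response with no match.
NO_MATCH='The search did not find results'

# Hypothetical helper: print "<phone><TAB><status>" for every saved result,
# where status is not_found, found, or error (empty response).
classify_results() {
  local dir="$1" file phone status
  for file in "$dir"/result-*.xml; do
    [ -e "$file" ] || continue      # glob matched nothing
    phone=${file##*/result-}        # strip directory and "result-" prefix
    phone=${phone%.xml}             # strip ".xml" extension
    if [ ! -s "$file" ]; then
      status=error                  # empty file: request failed, retry it
    elif grep -qF "$NO_MATCH" "$file"; then
      status=not_found              # probably closed or renumbered
    else
      status=found                  # probably still listed
    fi
    printf '%s\t%s\n' "$phone" "$status"
  done
}
```

Usage: classify_results output > status.tsv, then cut -f2 status.tsv | sort | uniq -c gives a count per status.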


As others have noted, it is a ToS violation to scrape our website or to store the data returned from the API. However, you can get the data you want from our pro service at: https://pro.whitepages.com/list-update/upload_file

Dan
Whitepages API lead.


You can scrape the website. They have limits if you keep coming from the same IP, plus a CAPTCHA, but it's easy enough to get around if you know what you're doing. Also, while it might violate the TOS, it's certainly not illegal: the law says you can't copyright phone numbers and addresses, so you don't have much to worry about.
