How do I collect data from a website that uses AJAX, with Perl?
This might seem a bit backwards, but I want to use Perl (and Curl if possible) to get data from a site that is using Ajax to fill an HTML shell with information. How do I make these Javascript calls to get the data I need?
The website is here: http://www.jigsaw.com/showContactUpdateTab.xhtml?开发者_如何学PythoncompanyId=224230
Remember that AJAX calls are ordinary HTTP requests, so you always should be able to perform them.
Open Firebug or Web Inspector on the website you're talking about, you'll see some XHR calls:
XHR finished loading: "http://www.jigsaw.com/dwr/interface/UserActionAPI.js". "http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getMostPurchasedContacts.dwr". "http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getRecentlyGraveyardedContacts.dwr "http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getRecentlyAddedContacts.dwr". "http://www.jigsaw.com/dwr/call/plaincall/UserActionAPI.getRecentlyTitleChangedContacts.dwr"
Yay! Now you know where to get that data. Their scripts use POST
HTTP request to the URLs above, so if you open them in your browser, you'll see various engine errors.
When you sniff (via Web Inspector debugger, for example) their AJAX POST requests, you'll see the next body:
"callCount=1 page=/showContactUpdateTab.xhtml?companyId=224230 httpSessionId=F5E7EC4A45DFCE87B969A9F4FA06C361 scriptSessionId=D020EFF4333283B907402687182D03E034 c0-scriptName=UserActionAPI c0-methodName=getRecentlyGraveyardedContacts c0-id=0 c0-param0=number:224230 c0-param1=boolean:false c0-param2=boolean:false batchId=1 "
I'm pretty sure, they're generating a bunch of security session IDs to avoid data miners. You may need to dive into their JavaScript codes to learn more about those generators.
Some applications have code in place to check that the client is a real AJAX client. They simply the check for the presence of the header X-Requested-With: XMLHttpRequest
. So it's easy to circumvent:
curl -H 'X-Requested-With: XMLHttpRequest' ...
use HTTP::Request::Common;
GET $url, 'X-Requested-With' => 'XMLHttpRequest', ...
Of course, you might have to deal with the usual stuff, like required cookies (for the session), nonce parameters, the occasional complexity. Firebug or the like for other browsers will help you reverse-engineer the required headers and parameters.
精彩评论