
Scraping Book Prices

I'm trying to write a scraping app, and I'm running into problems. My PHP cURL code isn't retrieving the pages that show the book prices. Instead, it returns the web root of the domain.

I'm trying to search the site by ISBN.

I've been bashing my head against the wall for days. Any help will be most appreciated!

Code:

<form method="post" for="new-search" name="SearchTerm" class='form-validate' id="SearchTerm" action="index.php">
    <textarea rows="3" name="SearchTerm" id="SearchTerm" cols="40" class="validate-required error"></textarea><div class="error" id="SearchTerm-error">
    <br>                        
    <button class="search primary" type="submit">continue</button>

</form>


<?php

/*
echo("<pre>");print_r($_GET);echo("</pre>");
echo("<pre>");print_r($_POST);echo("</pre>");
*/

$isbn = $_POST['SearchTerm'];


$userAgent = 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16';

$fields = array(
    'url' => ("http://www.bookleberry.com/Search/SearchKeyword"),
    'qurl' => ("http://www.bookleberry.com/Search/SearchKeyword/" . $_POST['SearchTerm']),
    'SearchTerm' => ($_POST['SearchTerm']),
    'Page' => ('1'),
    'class' => ('textfield validate-required'),
    'for' => ('new-search'),
    'result-count' => ('1'),
    'status' => 'success',
);

$SearchTerm = ($fields['SearchTerm']);
$url = ($fields['url']);
$Page = ($fields['Page']);


echo("<pre>");
print_r($fields);
echo("</pre>");

if ($isbn != NULL){

    //open connection
    $ch = curl_init($url);
    //set the url, number of POST vars, POST data
    curl_setopt($ch, CURLOPT_HEADER, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $url);
        echo "before curl_exec:<br>";
        echo "curl_errno=". curl_errno($ch) ."<br>";
        echo "curl_error=". curl_error($ch) ."<br>";
    curl_setopt($ch,CURLOPT_POST,count($fields));
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "?SearchTerm=$SearchTerm");
    curl_setopt($ch, CURLOPT_HTTPGET, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 9999999);
     curl_setopt($ch,CURLOPT_HTTPHEADER,array (
        "Accept: application/json"
    ));




    $info = curl_getinfo($ch);

    //execute post
    $result = curl_exec($ch);
    print $result;


    print "<pre>\n";
    print_r(curl_getinfo($ch));  // get error info
    print "</pre>\n";
}

?>


Don't hurt your head, use it!

  • Install Fiddler.
  • Make the request in the browser, then look in Fiddler to see exactly what is posted. This includes all headers, cookies and form variables.
  • Make the same request with your code and examine Fiddler again.
  • Compare the two requests and adjust your script to remove the differences.
  • Repeat until the requests match.
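
For reference, a few things in the posted code will differ from the browser's request whatever Fiddler shows: CURLOPT_HEADER is a boolean and does not set the user agent, a POST body should not start with "?", and setting CURLOPT_HTTPGET after CURLOPT_POST switches the request back to GET. Below is a minimal sketch of the request with those points addressed. The URL and the SearchTerm field name are taken from the question. That the site expects a POST to this URL is an assumption to confirm in Fiddler, and buildSearchOptions and searchByIsbn are hypothetical helper names.

```php
<?php
// Sketch only: the endpoint and field name come from the question. Whether the
// site really expects a POST here must be confirmed against the browser request.

// Hypothetical helper: builds the cURL options for an ISBN search.
function buildSearchOptions(string $isbn): array
{
    return [
        CURLOPT_URL            => 'http://www.bookleberry.com/Search/SearchKeyword',
        CURLOPT_POST           => true,
        // No leading "?" in a POST body; http_build_query() URL-encodes the value.
        CURLOPT_POSTFIELDS     => http_build_query(['SearchTerm' => $isbn]),
        // The user agent goes in CURLOPT_USERAGENT, without a "User-Agent:" prefix.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16',
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_COOKIEFILE     => '',     // empty string turns on in-memory cookie handling
        // CURLOPT_HTTPGET is deliberately not set: it would reset the method to GET.
    ];
}

// Hypothetical helper: performs the request and returns the response body.
function searchByIsbn(string $isbn): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildSearchOptions($isbn));
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() is only meaningful after curl_exec() has run.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}
```

Calling searchByIsbn($_POST['SearchTerm']) would then return the response HTML as a string. If the result is still the site's home page, the remaining differences (cookies, extra form fields, a Referer header) should show up in the Fiddler comparison.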

It also helps to install Firebug. Copying an element's XPath with its Copy XPath command and putting that into a PHP DOMXPath query makes scraping fun and easy!
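
As a rough illustration of the XPath step, here is a small sketch using PHP's DOMDocument and DOMXPath. The markup, the class names and the extractPrices helper are all made up for the example. The real expression has to come from inspecting the actual result page.

```php
<?php
// Hypothetical markup standing in for a search result page.
$html = <<<'HTML'
<html><body>
  <div class="result">
    <span class="title">Example Book</span>
    <span class="price">$12.99</span>
  </div>
</body></html>
HTML;

// Hypothetical helper: returns the text of every price node found in the HTML.
function extractPrices(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $prices = [];
    // Paste the expression copied from the browser here (this one matches the sample markup).
    foreach ($xpath->query('//div[@class="result"]/span[@class="price"]') as $node) {
        $prices[] = trim($node->textContent);
    }
    return $prices;
}

print_r(extractPrices($html));
```

An XPath copied from the browser is often a long absolute path such as /html/body/div[2]/..., which breaks as soon as the page layout changes. A shorter expression keyed on a class or id, like the one above, tends to hold up better.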
