PHP cURL script that saves images (specifically, captcha images)
I have a cURL wrapper class called Curl.
Let's assume I have this code:
$url = 'http://www.google.com';
$fields = array('q'=>'search term'); //maybe some other arguments. but let's keep it simple.
$curl = new Curl();
$page = $curl->post($url,$fields);
$page will contain some images, which cURL doesn't load by default. I need to know how I can save a specific image from that response: once I have called $page = $curl->post(..), how can I save that image without making another $curl->post(_image_location_) request to fetch the file?
The reason I need this is to save a captcha image from a form. I need to access the form and get the specific image that is loaded with it. If I request the URL of the image separately, it will be a different captcha image.
What you are describing isn't possible. For every external resource in a web page (i.e. anything that's not part of the HTML content itself, such as images, scripts, and stylesheets), you have to make a separate request to retrieve it. This is how all browsers operate.
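Since the HTML only carries references to its images, the first step is to find those references so each one can be requested separately. Below is a minimal sketch using PHP's DOMDocument; the function name findImageSources and the sample markup are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: return the src attribute of every <img> in an HTML string.
function findImageSources(string $html): array
{
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely well-formed, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Sample markup (made up for this example).
$page = '<html><body><form><img src="/captcha.php?id=1" alt="captcha"></form></body></html>';
print_r(findImageSources($page));
```

Each returned value is only a URL (often a relative one); the image bytes still have to be fetched with a second request.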
Many captchas work on a session basis. Your initial request for the HTML page is likely creating a session cookie, which is sent back as part of the response headers. The server expects this cookie when the image is requested. If you just do a plain curl request for the image, you won't be sending that cookie, and thus you'll get a different image.
You will have to analyze the page and determine exactly what kind of session management is going on, and modify your Curl request appropriately, but as I mentioned, I suspect it'll be cookie based. You'll probably want to look at the CURLOPT_COOKIEJAR curl_setopt() parameter to get things started. You can google for pretty straightforward examples as well.
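If the captcha really is cookie based, the idea can be sketched roughly as follows: load the form page and the image through the same cURL handle, with CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR pointing at the same file. The helper names sessionOptions and downloadCaptcha, and all of their parameters, are hypothetical; the target site may use a different session mechanism.

```php
<?php
// Hypothetical helper: options that make one cURL handle keep cookies like a browser.
function sessionOptions(string $cookieFile): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_COOKIEFILE     => $cookieFile, // enable the cookie engine and read saved cookies
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies back when the handle is released
    ];
}

// Hypothetical helper: request the form page first (this sets the session cookie),
// then request the captcha image on the same handle so the cookie is sent along.
function downloadCaptcha(string $formUrl, string $imageUrl, string $savePath, string $cookieFile): bool
{
    $ch = curl_init();
    curl_setopt_array($ch, sessionOptions($cookieFile));

    curl_setopt($ch, CURLOPT_URL, $formUrl);
    if (curl_exec($ch) === false) {
        return false;
    }

    curl_setopt($ch, CURLOPT_URL, $imageUrl);
    $image = curl_exec($ch);
    if ($image === false) {
        return false;
    }

    // The handle is released when this function returns.
    return file_put_contents($savePath, $image) !== false;
}
```

A call would look like downloadCaptcha($formUrl, $imageUrl, 'captcha.jpg', 'cookie.txt'), where both URLs come from the site you are working with. Reusing the same cookie file in a later request that submits the form keeps that request in the same session.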
This is the entire class. If you have questions, I can explain it in more detail.
<?php
/**
 * Description of class-curl
 *
 * @author NEO
 */
class cURL {
    public $headers = array();
    public $user_agent;
    public $compression;
    public $cookie_file;
    public $cookies;
    public $proxy;
    public $process;
    public $url;
    public $hash;
    public $content;

    public function __construct($url) {
        $this->url = $url;
        $this->process = curl_init($this->url);
        // Use a unique cookie file per instance so sessions do not collide.
        $cookiename = uniqid('cookie_') . '.txt';
        $this->cURL(TRUE, $cookiename);
    }
    public function cURL($cookies = TRUE, $cookie = 'cookie.txt', $compression = 'gzip', $proxy = '') {
        $this->headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
        $this->headers[] = 'Connection: Keep-Alive';
        $this->headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';
        $this->user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
        $this->compression = $compression;
        $this->proxy = $proxy;
        $this->cookies = $cookies;
        if ($this->cookies == TRUE) {
            $this->cookie($cookie);
        }
    }
    public function cookie($cookie_file) {
        // Create the cookie file if it does not exist yet.
        if (!file_exists($cookie_file)) {
            $fp = fopen($cookie_file, 'w') or $this->error('The cookie file could not be opened. Make sure this directory has the correct permissions');
            fclose($fp);
        }
        $this->cookie_file = $cookie_file;
    }
    // Fetch the full source of the requested page.
    public function get() {
        curl_setopt($this->process, CURLOPT_HTTPHEADER, $this->headers);
        curl_setopt($this->process, CURLOPT_HEADER, 0);
        curl_setopt($this->process, CURLOPT_USERAGENT, $this->user_agent);
        if ($this->cookies == TRUE) {
            curl_setopt($this->process, CURLOPT_COOKIEFILE, $this->cookie_file);
            curl_setopt($this->process, CURLOPT_COOKIEJAR, $this->cookie_file);
        }
        curl_setopt($this->process, CURLOPT_ENCODING, $this->compression);
        curl_setopt($this->process, CURLOPT_TIMEOUT, 90);
        if ($this->proxy) {
            curl_setopt($this->process, CURLOPT_PROXY, $this->proxy);
        }
        curl_setopt($this->process, CURLOPT_RETURNTRANSFER, 1);
        //curl_setopt($this->process, CURLOPT_FOLLOWLOCATION, 1);
        $return = curl_exec($this->process);
        //curl_close($this->process);
        return $return;
    }
    public function post($data) {
        curl_setopt($this->process, CURLOPT_HTTPHEADER, $this->headers);
        curl_setopt($this->process, CURLOPT_HEADER, 1);
        curl_setopt($this->process, CURLOPT_USERAGENT, $this->user_agent);
        if ($this->cookies == TRUE) {
            curl_setopt($this->process, CURLOPT_COOKIEFILE, $this->cookie_file);
            curl_setopt($this->process, CURLOPT_COOKIEJAR, $this->cookie_file);
        }
        curl_setopt($this->process, CURLOPT_ENCODING, $this->compression);
        curl_setopt($this->process, CURLOPT_TIMEOUT, 30);
        if ($this->proxy) {
            curl_setopt($this->process, CURLOPT_PROXY, $this->proxy);
        }
        curl_setopt($this->process, CURLOPT_POSTFIELDS, $data);
        curl_setopt($this->process, CURLOPT_RETURNTRANSFER, 1);
        //curl_setopt($this->process, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($this->process, CURLOPT_POST, 1);
        $return = curl_exec($this->process);
        //curl_close($this->process);
        return $return;
    }
    public function error($error) {
        echo "<center><div style='width:500px;border: 3px solid #FFEEFF; padding: 3px; background-color: #FFDDFF;font-family: verdana; font-size: 10px'><b>cURL Error</b><br>$error</div></center>";
        die;
    }
    public function grab_image() {
        // Fetch an image from the URL given to the constructor. This could be
        // improved to take a page URL and return every image found on it.
        curl_setopt($this->process, CURLOPT_HEADER, 0);
        // With CURLOPT_RETURNTRANSFER the raw bytes are returned unchanged.
        curl_setopt($this->process, CURLOPT_RETURNTRANSFER, 1);
        $raw = curl_exec($this->process);
        // Use the last path segment of the URL as the local file name.
        $name = explode("/", $this->url);
        $name = array_pop($name);
        if (file_exists($name)) {
            unlink($name);
        }
        $fp = fopen($name, 'x');
        fwrite($fp, $raw);
        fclose($fp);
        return $name;
        //return $raw;
    }
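    public function cURLclose() {
        // Close the handle first: cURL writes the cookie jar on close,
        // which would otherwise recreate the file after it was deleted.
        curl_close($this->process);
        if (file_exists($this->cookie_file)) {
            unlink($this->cookie_file);
        }
    }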
    public function saveCaptcha($source) {
        // Locate the URL of the captcha script in the page source.
        if (!preg_match('/ipt" src="(h[^"]+)/', $source, $result)) {
            $this->error('No captcha script was found in the page source');
        }
        // Download the script that contains the challenge token.
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $result[1]);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $captcha = curl_exec($ch);
        curl_close($ch);
        // Extract the token that follows "challenge :" between single quotes.
        $hash = explode("challenge :", $captcha);
        $hash1 = explode("'", $hash[1]);
        $cont = $hash1[1];
        // Download the captcha image that belongs to this token.
        $img = 'http://www.google.com/recaptcha/api/image?c=' . $cont;
        $ch = curl_init($img);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $rawdata = curl_exec($ch);
        curl_close($ch);
        // Save the image to disk under a unique name.
        $name = uniqid('captcha_');
        $fp = fopen("$name.jpg", 'w');
        fwrite($fp, $rawdata);
        fclose($fp);
        // The token ($cont) has to be stored in the database, and a new one
        // generated for each new image.
        $picturename = $name . ".jpg";
        $picture = array('name' => $picturename, 'hash' => $cont);
        return $picture;
    }
}
?>
Then you only need to call the class like this:
<?php
include 'class-Curl.php';
// Request a new captcha with a new cookie
$url = 'http://lok.myvnc.com/insertar-anuncio.html';
// Create the cURL object
$captcha = new cURL($url);
// Fetch the source code of the page
$source = $captcha->get();
// Parse the captcha JavaScript and download the image to disk
$captchaimg = $captcha->saveCaptcha($source);
// Store the ID, picturename, picturehash and cookie values in the database
var_dump($captchaimg);
?>
<img src="/sms/<?php echo htmlspecialchars($captchaimg['name']); ?>">
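The token extraction in saveCaptcha relies on two explode() calls, which raise notices if the script's layout differs from what is expected. A slightly more defensive variant could use a single regular expression. This is only a sketch: the function name extractChallenge and the sample script text are assumptions based on the "challenge :" marker used above.

```php
<?php
// Hypothetical helper: pull the value that follows "challenge :" between single quotes.
// Returns null when the marker is not present instead of raising a notice.
function extractChallenge(string $js): ?string
{
    if (preg_match("/challenge\s*:\s*'([^']+)'/", $js, $matches)) {
        return $matches[1];
    }
    return null;
}

// Sample script text (made up for this example).
$sample = "var RecaptchaState = {\n    challenge : 'abc123',\n    timeout : 1800\n};";
var_dump(extractChallenge($sample));
```

Checking for null before building the image URL lets the caller report a clear error when the page layout changes.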