Access a password-protected website programmatically
The login form on the site submits to /login.php?action=process using POST. How would I begin writing something, preferably in PHP, that logs in with my username and password? After that, I will crawl the site and extract the information I need.
This is to monitor and update information from a supplier's e-commerce store so that the inventory and pricing on my site stay up to date.
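```php
<?php
// The login form's action URL; it must be submitted with POST.
$loginUrl = 'http://www.remote_site.com/login.php?action=process';
// The keys must match the name attributes of the form's input fields.
$loginFields = array('username' => 'username', 'password' => 'password');

getUrl($loginUrl, 'post', $loginFields);
// If the login succeeded, the session cookie is now stored in cookies.txt
// and is sent automatically with every later request.
$remote_page_content = getUrl('http://www.remote_site.com/some_page.php');

function getUrl($url, $method = '', $vars = array()) {
    // Absolute path, so the cookie jar does not depend on the working directory.
    $cookieFile = __DIR__ . '/cookies.txt';

    $ch = curl_init();
    if ($method == 'post') {
        curl_setopt($ch, CURLOPT_POST, true);
        // Encode as application/x-www-form-urlencoded, like a browser form.
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($vars));
    }
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow the redirect after login
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // save cookies when the handle is closed
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send previously saved cookies
    $buffer = curl_exec($ch);
    if ($buffer === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("Request to $url failed: $error");
    }
    curl_close($ch);
    return $buffer;
}
```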
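Once you have `$remote_page_content`, you still need to pull the values out of the HTML. A minimal sketch using PHP's built-in DOMDocument and DOMXPath classes follows. The markup it targets (a `div` with class `product` containing `span.model`, `span.price` and `span.stock`) is purely hypothetical, as is the `extractProducts` function name, so inspect the supplier's real product pages and adjust the XPath expressions to match.

```php
<?php
// Sketch: extract product rows from a fetched page.
// ASSUMPTION: the markup (div.product containing span.model, span.price
// and span.stock) is hypothetical; adapt the XPath queries to the real HTML.
// ASSUMPTION: prices use '.' as the decimal separator, with no currency sign.
function extractProducts(string $html): array
{
    if (trim($html) === '') {
        return array(); // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; silence libxml warnings while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $products = array();
    // Match 'product' as a whole class name, even among several classes.
    $nodes = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' product ')]");
    foreach ($nodes as $node) {
        $model = trim($xpath->evaluate("string(.//span[@class='model'])", $node));
        if ($model === '') {
            continue; // skip blocks without a model number
        }
        $products[$model] = array(
            'price' => (float) trim($xpath->evaluate("string(.//span[@class='price'])", $node)),
            'stock' => (int) trim($xpath->evaluate("string(.//span[@class='stock'])", $node)),
        );
    }
    return $products;
}
```

For example, `$products = extractProducts($remote_page_content);` gives an array keyed by model number that you can compare against your own catalogue.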
Judging from the login page, I assume the shop system is some variant of xt:commerce. It has a function for exporting product information as CSV, so, as vaidas said in the comments, you should try to get that CSV emailed to you before trying to crawl the site.
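If you can get the export as a CSV file, turning it into a lookup table is much simpler than scraping. The sketch below assumes a header row with the hypothetical column names `model`, `price` and `quantity` and a `;` delimiter; I don't know the exact layout of an xt:commerce export, so check a real file and adjust the column names and delimiter. The `loadInventoryCsv` name is also just for illustration.

```php
<?php
// Sketch: load a supplier CSV export into an array keyed by model number.
// ASSUMPTION: the header row contains the hypothetical columns
// 'model', 'price' and 'quantity', and fields are separated by ';'.
function loadInventoryCsv(string $path, string $delimiter = ';'): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $header = fgetcsv($handle, 0, $delimiter, '"', '\\');
    if ($header === false) {
        fclose($handle);
        return array(); // empty file
    }
    $header = array_map('trim', $header);
    foreach (array('model', 'price', 'quantity') as $column) {
        if (!in_array($column, $header, true)) {
            fclose($handle);
            throw new RuntimeException("Missing column '$column' in $path");
        }
    }

    $inventory = array();
    while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        if (count($row) !== count($header)) {
            continue; // skip blank or malformed lines
        }
        $record = array_combine($header, $row);
        $inventory[$record['model']] = array(
            'price'    => (float) $record['price'],
            'quantity' => (int) $record['quantity'],
        );
    }
    fclose($handle);
    return $inventory;
}
```

Usage would look like `$inventory = loadInventoryCsv('/path/to/export.csv');`, where the path is wherever you save the emailed file.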