
How can I block some parts of a URL while using PHP cURL?

Because of a bandwidth problem, I'd like to block all images while using cURL on a remote URL. A quick example: a page has 200 JPG images, and I want to get that page with cURL without the 200 images.


When cURLing a URL, you're only receiving what is at that URL, which is likely just an HTML document.
cURL does not automatically download the 200 images referred to in the HTML document, because cURL does not care about HTML. Quite the contrary: if you wanted to download all 200 images, you'd have to parse the HTML yourself and make a further cURL request for each individual image.

Example from the command line:

$ curl -i www.w3.org
HTTP/1.1 200 OK
Date: Mon, 07 Feb 2011 02:46:36 GMT
Server: Apache/2
Content-Location: Home.html
Vary: negotiate,accept,Accept-Encoding
TCN: choice
Last-Modified: Tue, 01 Feb 2011 20:42:28 GMT
ETag: "74f2-49b3e92157500;89-3f26bd17a2f00"
Accept-Ranges: bytes
Content-Length: 29938
Cache-Control: max-age=600
Expires: Mon, 07 Feb 2011 02:56:36 GMT
P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Connection: close
Content-Type: text/html; charset=utf-8

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<!-- Generated from data/head-home.php, ../../smarty/{head.tpl} -->
<head>
<title>World Wide Web Consortium (W3C)</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="Help" href="/Help/" />
<link rel="stylesheet" href="/2008/site/css/minimum" type="text/css" media="handheld, all" />
<style type="text/css" media="print, screen and (min-width: 481px)">
/*<![CDATA[*/
@import url("/2008/site/css/advanced");
/*]]>*/
</style>
<link href="/2008/site/css/minimum" rel="stylesheet" type="text/css" media="handheld, only screen and (max-device-width: 480px)" />
<meta name="viewport" content="width=device-width" />
<link rel="stylesheet" href="/2008/site/css/print" type="text/css" media="print" />
<link rel="shortcut icon" href="/2008/site/images/favicon.ico" type="image/x-icon" />
<meta name="description" content="The World Wide Web Consortium (W3C) is an international community where Member organizations, a full-time staff, and the public work together to develop Web standards." />
<link rel="alternate" type="application/atom+xml" title="W3C News" href="/News/atom.xml" />
</head>
<body id="www-w3-org" class="w3c_public w3c_home">
<div id="w3c_container">
<!-- Generated from data/mast-home.php, ../../smarty/{mast.tpl} -->
<div id="w3c_mast"><!-- #w3c_mast / Page top header -->
<h1 class="logo"><a tabindex="2" accesskey="1" href="/"><img src="/2008/site/images/logo-w3c-mobile-lg" width="90" height="53" alt="W3C" /></a> <span class="alt-logo">W3C</span></h1>
<div id="w3c_nav">

...

That's all a cURL request gets. There's one image referenced in there: <img src="/2008/site/images/logo-w3c-mobile-lg" width="90" height="53" alt="W3C" />. That tag is all you get of it; the image itself is not downloaded.
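The same thing can be seen from PHP. The following is a minimal sketch, not a definitive implementation: fetch_html and count_img_tags are hypothetical helper names, and the redirect and timeout settings are assumptions chosen for illustration. It makes exactly one request and then counts the <img> tags in the returned text, which are only references to images.

```php
<?php
// Hypothetical helper: fetch the response body of a single URL with PHP cURL.
// Only this one document is transferred; nothing it links to is requested.
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects (assumed desirable)
        CURLOPT_TIMEOUT        => 10,   // assumed timeout in seconds
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    return $body;
}

// Hypothetical helper: count the <img> tags in a piece of HTML.
// These are plain text in the document, not downloaded image data.
function count_img_tags(string $html): int
{
    return (int) preg_match_all('/<img\b/i', $html);
}
```

Calling count_img_tags(fetch_html('http://www.w3.org/')) would report how many images the page refers to, while the only bandwidth used is that of the HTML document itself.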


You can't ask the server for the page without its image tags, but you can strip them from the result easily enough with a regex or a DOM parser. With cURL, though, you aren't actually making requests for the images, just for the HTML of the page, so all you would be stripping out is the tags.
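If removing the tags is still wanted, the DOM parser route could look roughly like this. It is a sketch under assumptions: strip_images is a hypothetical name, the input is assumed to be a non-empty HTML string already fetched with cURL, and PHP's bundled DOM extension is assumed to be available.

```php
<?php
// Hypothetical helper: return the HTML with every <img> element removed.
function strip_images(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup rarely validates; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName returns a live list, so copy it to an array
    // before removing nodes; otherwise elements would be skipped.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        $img->parentNode->removeChild($img);
    }

    return $doc->saveHTML();
}
```

This saves no bandwidth, since the HTML has already been downloaded by the time the tags are removed; it only cleans up the markup. Note that loadHTML may add a doctype and html/body wrappers to fragments, so the output is a complete document rather than the exact input minus the tags.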
