How can I block some parts of a URL while using PHP cURL?

2023-02-08 18:36

Because of a bandwidth problem, I'd like to block all images when using cURL on a remote URL. A quick example: a page has 200 JPG images, and I want to fetch that page with cURL without those 200 images.

When cURLing a URL, you only receive what is at that URL, which is likely just an HTML document.
cURL does not automatically download the 200 images referred to in the HTML document, because cURL does not care about HTML. Quite the contrary: if you wanted to download all 200 images, you'd have to parse the HTML yourself and make a further cURL request for each individual image.

Example from the command line:

$ curl -i www.w3.org
HTTP/1.1 200 OK
Date: Mon, 07 Feb 2011 02:46:36 GMT
Server: Apache/2
Content-Location: Home.html
Vary: negotiate,accept,Accept-Encoding
TCN: choice
Last-Modified: Tue, 01 Feb 2011 20:42:28 GMT
ETag: "74f2-49b3e92157500;89-3f26bd17a2f00"
Accept-Ranges: bytes
Content-Length: 29938
Cache-Control: max-age=600
Expires: Mon, 07 Feb 2011 02:56:36 GMT
P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Connection: close
Content-Type: text/html; charset=utf-8

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<!-- Generated from data/head-home.php, ../../smarty/{head.tpl} -->
<head>
<title>World Wide Web Consortium (W3C)</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="Help" href="/Help/" />
<link rel="stylesheet" href="/2008/site/css/minimum" type="text/css" media="handheld, all" />
<style type="text/css" media="print, screen and (min-width: 481px)">
/*<![CDATA[*/
@import url("/2008/site/css/advanced");
/*]]>*/
</style>
<link href="/2008/site/css/minimum" rel="stylesheet" type="text/css" media="handheld, only screen and (max-device-width: 480px)" />
<meta name="viewport" content="width=device-width" />
<link rel="stylesheet" href="/2008/site/css/print" type="text/css" media="print" />
<link rel="shortcut icon" href="/2008/site/images/favicon.ico" type="image/x-icon" />
<meta name="description" content="The World Wide Web Consortium (W3C) is an international community where Member organizations, a full-time staff, and the public work together to develop Web standards." />
<link rel="alternate" type="application/atom+xml" title="W3C News" href="/News/atom.xml" />
</head>
<body id="www-w3-org" class="w3c_public w3c_home">
<div id="w3c_container">
<!-- Generated from data/mast-home.php, ../../smarty/{mast.tpl} -->
<div id="w3c_mast"><!-- #w3c_mast / Page top header -->
<h1 class="logo"><a tabindex="2" accesskey="1" href="/"><img src="/2008/site/images/logo-w3c-mobile-lg" width="90" height="53" alt="W3C" /></a> <span class="alt-logo">W3C</span></h1>
<div id="w3c_nav">

...

That's all a cURL request gets. There's one image in there: <img src="/2008/site/images/logo-w3c-mobile-lg" width="90" height="53" alt="W3C" />. That tag is all you get of it; the image itself is not downloaded.
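For comparison, the same single request made from PHP might look like the sketch below. The helper name `fetch_html`, the timeout value, and the redirect setting are illustrative assumptions, not part of the original answer. Only the document at the given URL is transferred; nothing inside it is parsed or followed.

```php
<?php
// Hypothetical helper: fetch only the document at $url.
// cURL does not parse the response, so no <img> sources are requested.
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects (assumed desirable)
        CURLOPT_TIMEOUT        => 15,   // seconds; an arbitrary illustrative value
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes the last failure on this handle
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    return $body;
}
```

Calling `fetch_html('https://www.w3.org/')` would return the HTML as a string, roughly what the command-line example prints after the headers.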

You can't ask for the page "without the images" as such, but you can strip them from the result easily enough with a regex or a DOM parser. With cURL you aren't actually making a request for the images anyway, just the HTML of the page, so all you'd be stripping out is the tags.
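If the goal is HTML with the image tags removed, a DOM parser tends to be more robust than a regex. Below is a minimal sketch using PHP's bundled DOM extension; `strip_images` is a hypothetical helper name, and the code assumes it is given a non-empty HTML string.

```php
<?php
// Hypothetical helper: remove every <img> element from an HTML string.
function strip_images(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it to an array
    // before removing nodes; otherwise elements would be skipped.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        $img->parentNode->removeChild($img);
    }

    return $doc->saveHTML();
}
```

Note that `saveHTML()` serializes the whole document, so an HTML fragment passed in comes back wrapped in `<html>` and `<body>` elements.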
