开发者

Can we only get the web page header information and not the body? (Mechanize)

What if I only need to download the page if it has not changed since the last download? What is the best way? can I get the size of the page first, then compare the decide if it has changed, if so, I ask for download else skip?

I plan to use 开发者_Python百科(python) mechanize.


the request should be a HEAD, not a GET:

9.4 HEAD

The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.

The response to a HEAD request MAY be cacheable in the sense that the information contained in the response MAY be used to update a previously cached entity from that resource. If the new field values indicate that the cached entity differs from the current entity (as would be indicated by a change in Content-Length, Content-MD5, ETag or Last-Modified), then the cache MUST treat the cache entry as stale.

See here How can I perform a HEAD request with the mechanize library?


yes you can get more information in python mechanize by setting like this

br = mechanize.Browser()
br.set_debug_http(True)
br.set_debug_redirects(True)
... Your code here ...

by doing this, you can get valuable header information of the page

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜