
Calculate web page size in Python

How would I go about calculating the size of a web page (URL) using Python? I tried urllib2 and looked for the Content-Length header, but it wasn't present.

import urllib2
url = 'http://www.google.com/'
r = urllib2.urlopen(url)
# Not sure what to do from here


When you use urlopen, you are requesting the whole contents (an HTTP GET request), so looking for the optional Content-Length header is not all that useful once you've gone that way: it's OK, and it can save you some time and memory, but you have already imposed avoidable load on the server and network. Still, as the existing answer indicates, taking the len of the result of read() on the object that urlopen returns is the approach that works even if Content-Length is missing.

Alas, urllib2 does not support the HEAD HTTP method out of the box. To try HEAD, you have to use the lower-level module httplib: make an HTTPConnection to the server, call its request('HEAD', url) method, call its getresponse method to get an HTTPResponse object, and call the getheader method on the latter to get the Content-Length header (you see why I say the module is lower-level ;-). If you're dealing with very large pages and sensible servers (ones that do set the Content-Length header), this, while messy, could be an important optimization.
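The HEAD sequence described above might look roughly like the sketch below. It is written for Python 3, where httplib is named http.client; the helper name head_content_length and the timeout default are made up for illustration, and the sketch assumes a plain host[:port] URL with no redirects to follow.

```python
import http.client
from urllib.parse import urlsplit


def head_content_length(url, timeout=10):
    """Send a HEAD request; return Content-Length as an int, or None if absent.

    Hypothetical helper for illustration, not part of any library.
    """
    parts = urlsplit(url)
    # Pick the connection class that matches the URL scheme.
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # request() wants the path (plus query string), not the full URL.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        value = response.getheader("Content-Length")
        return int(value) if value is not None else None
    finally:
        conn.close()
```

Calling head_content_length('http://www.google.com/') would return an integer when the server declares a length and None when it does not, in which case you are back to downloading the body.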


Content-Length is optional; use it if it's present, to cut down on bandwidth use, but if the server doesn't send it (or you don't trust it for some reason), you'll have to retrieve the entire resource and calculate its length.

print len(r.read())
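As a rough sketch of the "use the header if present, otherwise read everything" idea, the function below checks Content-Length first and only downloads the body when the header is missing. It targets Python 3, where urllib2's urlopen lives in urllib.request; the name page_size and the (size, source) return shape are assumptions made for this example. Either way, the number covers only the response body for that one URL, not the images, scripts, or stylesheets the page links to.

```python
import urllib.request


def page_size(url, timeout=10):
    """Return (size_in_bytes, source), where source is 'header' or 'body'.

    Hypothetical helper for illustration, not part of any library.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        declared = response.headers.get("Content-Length")
        if declared is not None:
            # Trust the server's declared length and skip reading the body.
            return int(declared), "header"
        # No header: read the whole body and count the bytes.
        return len(response.read()), "body"
```

If you don't trust the header, drop the first branch and always use len(response.read()).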


Here is how I did it. See the code below.

import urllib2
url = 'http://www.ueseo.org'
r = urllib2.urlopen(url)
print len(r.read())
