开发者

urllib2 not retrieving entire HTTP response

I'm perplexed as to why I'm not able to download the entire contents of some JSON responses from FriendFeed using urllib2.

>>> import urllib2
>>> stream = urllib2.urlopen('http://friendfeed.com/api/room/the-life-scientists/profile?format=json')
>>> stream.headers['content-length']
'168928'
>>>开发者_开发知识库; data = stream.read()
>>> len(data)
61058
>>> # We can see here that I did not retrieve the full JSON
... # given that the stream doesn't end with a closing }
... 
>>> data[-40:]
'ce2-003048343a40","name":"Vincent Racani'

How can I retrieve the full response with urllib2?


Best way to get all of the data:

fp = urllib2.urlopen("http://www.example.com/index.cfm")

response = ""
while 1:
    data = fp.read()
    if not data:         # This might need to be    if data == "":   -- can't remember
        break
    response += data

print response

The reason is that .read() isn't guaranteed to return the entire response, given the nature of sockets. I thought this was discussed in the documentation (maybe urllib) but I cannot find it.


Use tcpdump (or something like it) to monitor the actual network interactions - then you can analyze why the site is broken for some client libraries. Ensure that you repeat multiple times by scripting the test, so you can see if the problem is consistent:

import urllib2
url = 'http://friendfeed.com/api/room/friendfeed-feedback/profile?format=json'
stream = urllib2.urlopen(url)
expected = int(stream.headers['content-length'])
data = stream.read()
datalen = len(data)
print expected, datalen, expected == datalen

The site's working consistently for me so I can't give examples of finding failures :)


Keep calling stream.read() until it's done...

while data = stream.read() :
    ... do stuff with data


readlines() 

also works

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜