
httplib: incomplete read

I have Python code on both the client and the server side, and the client throws an IncompleteRead exception for no apparent reason. I can open the URL in Firefox without any error message, and fetching it with wget gives no odd results either.

The server code is:

import random
import hashlib
print "Content-Type: text/html"     
print                              

m = hashlib.md5()
m.update(str(random.random()))
print m.hexdigest()
print

On the client side, I use a relatively straightforward POST approach:

    data = urllib.urlencode({"username": username,
                     "password" : password})
    #POST in the data.
    req = urllib2.Request(url, data)

    response = urllib2.urlopen(req)
    string =  response.read()

And the response.read() throws the error.

Edit: Further information: emitting explicit CRLF line endings does not change the behavior. The error log shows:

[Wed Sep 08 10:36:43 2010] [error] [client 192.168.80.1] (104)Connection reset by peer: ap_content_length_filter: apr_bucket_read() failed

The SSL access log shows (mildly redacted):

192.168.80.1 - - [08/Sep/2010:10:38:02 -0700] "POST /serverfile.py HTTP/1.1" 200 1357 "-" "Python-urllib/2.7"


Does terminating the lines with \r\n make any difference? Something like this:

import random
import hashlib
import sys

sys.stdout.write("Content-Type: text/html\r\n\r\n")

m = hashlib.md5()
m.update(str(random.random()))
print m.hexdigest()
print


The problem is a bug in Apache.

Apache throws this particular kind of error when the receiving script does not consume the entire body of the POST request.

The Apache developers consider this behavior to be as designed.

The fix is to consume the request as early as possible in the script, with something like this:

import cgi
workaround = cgi.FieldStorage()  # parsing the form reads the whole POST body from stdin
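
A minimal sketch of the same idea without the cgi module, in case the script does not need the parsed form fields: read and discard as many bytes as the CONTENT_LENGTH variable announces before writing any output. The function name drain_request_body and its parameters are hypothetical, chosen only for illustration.

```python
def drain_request_body(environ, stream, chunk_size=8192):
    """Read and discard the request body announced by CONTENT_LENGTH.

    `environ` is a mapping such as os.environ, `stream` is a binary
    file-like object such as the script's standard input.
    Returns the number of bytes actually consumed.
    """
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        # Missing or malformed header: assume there is no body to read.
        remaining = 0

    consumed = 0
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            # The client sent less than it announced; stop here.
            break
        consumed += len(chunk)
        remaining -= len(chunk)
    return consumed
```

In a CGI script this would be called once, before the first print, with os.environ and the process's standard input (sys.stdin in Python 2, sys.stdin.buffer in Python 3).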


I got this error when I had failed to completely read the previous response, e.g.:

# This is using an opener from urllib2, but I am guessing similar...
response1 = opener.open(url1)
for line in response1:
    m = re.match("href='(.*)'", line)
    if m:
        url2 = m.group(1) # Grab the URL from line, that's all I want.
        break             # Oops.  Apache is mad because I suck.

response2 = opener.open(url2)
for line in response2:
    print line

The server gave me "200 OK" on the first request, followed by the data up to the link I was looking for. It then waited five minutes on the second open, gave me "200 OK" on the second request followed by all the data for that request, and finally gave me IncompleteRead on the first request!

Reading between the lines, I suspect Paul's original script logged into two sites and hit the problem on the second one.

I can see how reading two pages in parallel might be a nice feature. So what can I do to gracefully tell the server "No more, thanks"? I solved this by reading through and ignoring the rest of the first response (only 200K in this case).
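
A rough sketch of that "read and ignore the rest" step, assuming only that the response is a file-like object with read() and close(), as urllib2 responses are. The helper name drain_and_close is made up for this example.

```python
def drain_and_close(response, chunk_size=64 * 1024):
    """Read and discard whatever is left in a file-like response, then close it.

    Returns the number of bytes (or characters) that were thrown away.
    """
    discarded = 0
    try:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                # An empty read means the end of the response was reached.
                break
            discarded += len(chunk)
    finally:
        # Close even if reading fails partway through.
        response.close()
    return discarded
```

In the loop above, this would be called as drain_and_close(response1) right after the break and before opening url2.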

If I were allowed to comment rather than answer, I'd ask Paul Nathan: what is

workaround = cgi.FieldStorage()

what do you mean by "as soon as possible", and how does it help here? Have pity on a beginner.


I'm guessing the original poster was actually running the request twice, succeeding the first time and failing on the second.

I got IncompleteRead (from Apache) when I had failed to completely read the previous response, e.g.:

# This is using an opener from urllib2, but I am guessing similar...
response1 = opener.open(url1)
for line in response1:
    m = re.match("href='(.*)'", line)
    if m:
        url2 = m.group(1) # Grab the URL from line, that's all I want.
        break             # Oops.  Apache is mad because I suck.

response2 = opener.open(url2)
for line in response2:
    print line

The server gave me "200 OK" on the first request, followed by the data up to the link I was looking for. It then waited five minutes on the second open, gave me "200 OK" on the second request followed by all the data for that request, and finally gave me IncompleteRead! The error happens (for me) in the second for statement, probably when it hits the end of file there.

I can imagine wanting to have two responses open simultaneously for reading. So the question is, how do I finish with a response? Do I have to read all the data even though I don't need it? No: according to the urllib.urlopen documentation, the response behaves like a file, so just close it. For my example:

for line in response1:
    m = re.match("href='(.*)'", line)
    if m:
        url2 = m.group(1) # Grab the URL from line, that's all I want.
        break

response1.close()
response2 = opener.open(url2)
...
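
One way to make sure the close always happens, even on an early return or an exception, is contextlib.closing. The sketch below is only an illustration: first_link is a hypothetical helper, and it uses re.search with a non-greedy pattern (rather than the re.match above) so that a link in the middle of a line is found too.

```python
import re
from contextlib import closing


def first_link(response, pattern=r"href='(.*?)'"):
    """Return the first single-quoted href found in a file-like response.

    The response is closed on the way out, whether or not a link is found.
    """
    with closing(response):
        for line in response:
            if isinstance(line, bytes):
                # Responses usually yield bytes; decode before matching.
                line = line.decode("utf-8", "replace")
            m = re.search(pattern, line)
            if m:
                return m.group(1)
    return None
```

With this, the example becomes url2 = first_link(opener.open(url1)) followed by response2 = opener.open(url2).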