strange urllib2 failures on some systems

I've got a Python script that simply grabs a page with urllib2 and then parses it with BeautifulSoup. The code is:

import sys
import urllib2

class Foo(Bar):
    def fetch(self):
        try:
            # give the embedded server up to 30 seconds to answer
            self.mypage = urllib2.urlopen(self.url + 'MainPage.htm', timeout=30).read()
        except urllib2.URLError:
            sys.stderr.write("Error: system at %s not responding\n" % self.url)
            sys.exit(1)

The system I'm trying to access is remote, behind a Linux router that forwards ports between its public static IP and the LAN IP of the actual system.

I was getting failures from some client systems, and at first I suspected a bug in urllib2/Python or some weird TCP behaviour (the HTTP server is actually an embedded card in an industrial system). But urllib2 works as expected from other systems, and I can access the HTTP server with links2 or wget even on the systems where urllib2 fails.

  • Ubuntu 10.04 LTS 32-bit behind an Apple AirPort NAT on a remote ADSL line: everything works
  • Mac OS X 10.6, both on the server's LAN and remote behind NAT, etc.: everything works
  • Ubuntu 10.04 LTS 64-bit with a public IP: urllib2 times out; links and wget work
  • Gentoo Linux with a public IP: urllib2 times out; links and wget work

Using tcpdump on the Linux router (the HTTP server end) I have verified that urllib2 always completes the TCP handshake, even from the problematic systems, but then the connection just seems to hang. I tried toggling syncookies and ECN on and off, but that changed nothing.
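One more thing I could try from the failing systems is urllib2's own wire-level debug output, to see whether the hang happens before or after the request is sent. A minimal sketch (the URL is a placeholder for the real system):

import urllib2

# debuglevel=1 makes the underlying httplib connection print request
# and response headers as they cross the wire; if the request line is
# printed but no response ever arrives, the hang is on the server side.
opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
page = opener.open('http://203.0.113.10/MainPage.htm', timeout=30).read()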

How could I debug and possibly solve this issue?


You could also try switching to httplib2.
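A minimal sketch of what the fetch from the question might look like with httplib2 (the URL is a placeholder; httplib2 ships its own HTTP client implementation, so a success here would point at the stdlib urllib2/httplib stack rather than the network path):

import httplib2

# httplib2 does not share urllib2's handler stack, so comparing the
# two isolates whether the problem lives in the stdlib client.
h = httplib2.Http(timeout=30)
resp, content = h.request('http://203.0.113.10/MainPage.htm')
print resp.status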


After nearly 17 months I don't have access to that specific system anymore, so I won't be able to accept any real answer to this question.

At least I can tell future readers which answers did not help:

  • switching to httplib2
  • no, we're not getting ICMP redirects
  • no, we're not dropping ICMP fragmentation-needed packets either

cheers.
