strange urllib2 failures on some systems
I've got a Python script that simply grabs a page with urllib2 and then uses BeautifulSoup to parse it. The code is:
import sys
import urllib2

class Foo(Bar):
    def fetch(self):
        try:
            # Fetch the page with a 30-second timeout.
            self.mypage = urllib2.urlopen(self.url + 'MainPage.htm',
                                          timeout=30).read()
        except urllib2.URLError:
            sys.stderr.write("Error: system at %s not responding\n" % self.url)
            sys.exit(1)
The system I'm trying to access is remote, behind a Linux router that port-forwards between the public static IP and the LAN IP of the actual machine.
I was getting failures on some client systems, and at first I suspected a bug in urllib2/Python, or some weird TCP behaviour (the HTTP server is actually an embedded card in an industrial system). But then I tried other systems: urllib2 works as expected there, and I can also access the HTTP server with links2 or wget even on the systems where urllib2 fails. The results:
- Ubuntu 10.04 LTS 32-bit behind an Apple AirPort NAT on remote ADSL: everything works
- Mac OS X 10.6, both on the same LAN as the server and remote behind NAT: everything works
- Ubuntu 10.04 LTS 64-bit with a public IP: urllib2 times out; links and wget work
- Gentoo Linux with a public IP: urllib2 times out; links and wget work
I have verified with tcpdump on the Linux router (the HTTP server end) that urllib2 always completes the TCP handshake, even from the problematic systems, but then it just seems to hang. I tried toggling syncookies and ECN on and off, but that changed nothing.
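One extra check I can think of (a sketch with a placeholder host and path, not something I've actually run on those machines): replicate the request over a bare socket, bypassing urllib2 entirely. If this also hangs on the problematic systems, the problem is below the HTTP layer; if it works, I can diff its headers against whatever urllib2 sends.

import socket

# Placeholder host -- substitute the server's public IP or hostname.
HOST = 'server.example.com'

# Same 30-second timeout the script uses.
s = socket.create_connection((HOST, 80), timeout=30)

# Minimal HTTP/1.0 request, close to what wget/links2 send.
s.sendall('GET /MainPage.htm HTTP/1.0\r\nHost: %s\r\n\r\n' % HOST)

# HTTP/1.0: the server closes the connection when the response is done.
chunks = []
while True:
    data = s.recv(4096)
    if not data:
        break
    chunks.append(data)
s.close()
print ''.join(chunks)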
How could I debug and possibly solve this issue?
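One thing to try before anything else (a minimal sketch, assuming stock Python 2 urllib2; the URL is a placeholder for self.url + 'MainPage.htm'): enable the HTTP handler's debug output so the exact request and response headers get printed, then compare them with what wget sends from the same machine.

import urllib2

# debuglevel=1 makes the handler print the request line, headers and
# response status as they go over the wire.
opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)

page = urllib2.urlopen('http://server.example.com/MainPage.htm',
                       timeout=30).read()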
You could also switch to using httplib2.
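A minimal sketch of that suggestion (it assumes httplib2 is installed; the URL is a placeholder):

import httplib2

# Same 30-second timeout as the urllib2 version.
h = httplib2.Http(timeout=30)
resp, content = h.request('http://server.example.com/MainPage.htm', 'GET')
print resp.status, len(content)

If httplib2 shows the same hang, that at least rules out a urllib2-specific bug.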
After nearly 17 months I don't have access to that specific system anymore, so I won't be able to accept any real answer to this question.
At least I can tell future readers what answers are not good:
- switching to httplib2
- no, we're not getting ICMP redirects
- no, we're not dropping ICMP "fragmentation needed" packets either
cheers.