Python library for HTTP support - including Content-Encoding
I have a scraper that queries different websites. Some of them use Content-Encoding inconsistently, and since I'm trying to simulate an AJAX request and need to mimic Mozilla, I need full support for it. There are multiple HTTP libraries for Python, but none seems complete:
httplib seems pretty low-level, really more like an HTTP packet sniffer than a client.
urllib2 is some sort of elaborate hoax: there are a dozen handlers for various web client functions, but mandatory HTTP features like Content-Encoding apparently aren't among them.
mechanize is nice, if already somewhat overkill for my task, but it only supports Content-Encoding 'gzip'.
httplib2 sounded the most promising, but actually fails on 'deflate' encoding because of the disparity between raw deflate and zlib-wrapped streams.
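To illustrate the kind of workaround I mean: this is roughly what I end up writing by hand (a sketch on top of urllib2 and zlib, Python 2; the try/except exists because servers disagree on whether 'deflate' means a raw deflate stream or a zlib-wrapped one):

    import urllib2
    import zlib

    def fetch(url):
        # Ask for compressed responses and pretend to be Mozilla, as described above.
        req = urllib2.Request(url, headers={
            'Accept-Encoding': 'gzip, deflate',
            'User-Agent': 'Mozilla/5.0',
        })
        resp = urllib2.urlopen(req)
        data = resp.read()
        encoding = resp.info().getheader('Content-Encoding', '')
        if encoding == 'gzip':
            data = zlib.decompress(data, 16 + zlib.MAX_WBITS)   # gzip-wrapped stream
        elif encoding == 'deflate':
            try:
                data = zlib.decompress(data)                    # zlib-wrapped stream
            except zlib.error:
                data = zlib.decompress(data, -zlib.MAX_WBITS)   # raw deflate stream
        return data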
So are there any other options? I can't believe I'm expected to reimplement workarounds like the one above for each of these libraries. And distributing patched versions alongside my application isn't a good idea either, because packagers might strip them out again whenever the library in question is available as a separate distribution package.
I hardly dare say it, but the HTTP functions API in PHP is much nicer. And besides Content-Encoding:*, I may need multipart/form-data at some point too. So, is there a comprehensive third-party library for HTTP retrieval?
I would consider either invoking cURL as a child process or using the Python bindings for libcurl (pycurl).
From this description, cURL seems to support both gzip and deflate.
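Untested sketch, assuming Python 2-era pycurl and with example.com standing in for your target URL; setting the ENCODING option makes libcurl send the Accept-Encoding header and decode the body transparently:

    import pycurl
    from StringIO import StringIO

    buf = StringIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, 'http://example.com/')
    c.setopt(pycurl.ENCODING, 'gzip, deflate')   # libcurl decompresses the response itself
    c.setopt(pycurl.USERAGENT, 'Mozilla/5.0')    # to mimic a browser, per the question
    c.setopt(pycurl.WRITEFUNCTION, buf.write)
    c.perform()
    c.close()
    body = buf.getvalue()

As a child process, the equivalent is curl --compressed <url>, which requests gzip/deflate and decompresses the body before writing it out.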
Beautiful Soup might work. Just throwing it out there.