Python library for HTTP support - including Content-Encoding

I have a scraper that queries different websites. Some of them use Content-Encoding, though not consistently. Since I'm trying to simulate an AJAX query and need to mimic Mozilla, I need full support. There are multiple HTTP libraries for Python, but none seems complete:

httplib seems pretty low level, more like an HTTP packet sniffer really.

urllib2 is some sort of elaborate hoax. There are a dozen handlers for various web client functions, but mandatory HTTP features like Content-Encoding apparently aren't among them.

mechanize: is nice, already somewhat overkill for my tasks, but only supports Content-Encoding 'gzip'.

httplib2: sounded most promising, but actually fails on 'deflate' encoding, because of the disparity of raw deflate and zlib streams.
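
To illustrate the raw deflate versus zlib disparity, here is a minimal sketch of the workaround I'd rather not have to ship myself. It uses only the standard `zlib` module and Python 3 syntax; the function names `decode_deflate` and `decode_body` are made up for this example and do not belong to any of the libraries above. Trying the zlib-wrapped form first and falling back to raw deflate is a heuristic, not something the HTTP spec prescribes.

```python
import zlib


def decode_deflate(body):
    """Decode a 'deflate' body that may be zlib-wrapped or raw deflate."""
    try:
        # Standards-conforming servers send a zlib-wrapped stream (RFC 1950).
        return zlib.decompress(body)
    except zlib.error:
        # Some servers send a raw deflate stream (RFC 1951) with no header;
        # a negative wbits value tells zlib not to expect one.
        return zlib.decompress(body, -zlib.MAX_WBITS)


def decode_body(body, content_encoding):
    """Decode a response body according to its Content-Encoding value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding in ("gzip", "x-gzip"):
        # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        return zlib.decompress(body, 16 + zlib.MAX_WBITS)
    if encoding == "deflate":
        return decode_deflate(body)
    raise ValueError("unsupported Content-Encoding: %r" % content_encoding)
```

The caller would pass the raw response bytes plus the value of the Content-Encoding response header, e.g. `decode_body(raw, headers.get("content-encoding"))`, where `raw` and `headers` come from whichever HTTP library is in use.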

So are there any other options? I can't believe I'm expected to reimplement workarounds for the above libraries. And it's not a good idea to distribute patched versions alongside my application, because packagers might remove them again if the corresponding library is available as a separate distribution package.

I almost don't dare to say it, but the HTTP functions API in PHP is much nicer. And besides Content-Encoding:*, I might need multipart/form-data at some point too. So, is there a comprehensive 3rd party library for HTTP retrieval?


I would consider either invoking cURL as a child process or using the Python bindings for libcurl.

From this description cURL seems to support gzip and deflate.
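
A rough sketch of the child-process route, assuming a `curl` binary is on the PATH and Python 3.7 or later. The helper names `build_curl_command` and `fetch` are made up for this example. The `--compressed` option asks curl to request a compressed response and to decompress it before writing the output, so the Python side never sees the encoded bytes.

```python
import subprocess


def build_curl_command(url, user_agent=None):
    """Build the argument list for a curl call that handles Content-Encoding."""
    cmd = [
        "curl",
        "--silent",      # no progress meter
        "--show-error",  # but still report failures on stderr
        "--location",    # follow redirects
        "--compressed",  # request a compressed response and decode it
    ]
    if user_agent:
        cmd += ["--user-agent", user_agent]
    cmd.append(url)
    return cmd


def fetch(url, user_agent=None, timeout=30):
    """Run curl as a child process and return the decoded body as bytes."""
    result = subprocess.run(
        build_curl_command(url, user_agent),
        capture_output=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Calling `fetch(url, user_agent="Mozilla/5.0")` with your own URL would return the body already decompressed. Building the command as a list rather than a shell string avoids quoting problems with unusual URLs.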


Beautiful Soup might work. Just throwing it out there.
