
Python Regex Help (httplib2 cookies)

I'm having the same problem as the poster of this question: "httplib2, how to set more than one cookie?"

The cookie looks like this:

PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd

Note how it uses commas as well as semicolons to separate cookies, but commas are also used inside the cookies themselves (in the expires dates).

Writing a regex that separates them properly is too complicated for me. It would be very much appreciated if anyone wants to give it a shot!


Have you tried cookielib / http.cookiejar?


If you interpret the cookie like this:

PHPSESSID=8527b5532b6018aec4159d81f69765bd;
path=/;
expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578;
expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; 
expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd

then only the semicolon is a true separator, and each comma that seems to separate cookies is only there because an expiration date comes right before it.

If you are not interested in the expiration dates, you can use a single regex to strip them out, e.g.

s/expires=[^,]+,[^,]+, //g

Then split the whole string on ; and parse the pieces as key=value pairs.
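
For what it's worth, here is a minimal Python sketch of that approach, using the cookie string from the question. The function name parse_cookies is made up for illustration, and the sketch assumes every expires value has exactly the "Day, DD-Mon-YYYY HH:MM:SS GMT, " shape shown above. Note that attributes such as path end up in the result next to the real cookies.

```python
import re

# Cookie string from the question, split across lines for readability.
raw = (
    "PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; "
    "expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, "
    "sid=8527b5532b6018aec4159d81f69765bd"
)


def parse_cookies(header):
    """Hypothetical helper: drop expires dates, then split on ';'."""
    # Python equivalent of s/expires=[^,]+,[^,]+, //g
    stripped = re.sub(r"expires=[^,]+,[^,]+, ", "", header)
    pairs = {}
    for part in stripped.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip empty pieces and anything without an "="
            pairs[key] = value
    return pairs


cookies = parse_cookies(raw)
```

If only the real cookies are wanted, known attribute names such as path could be filtered out of the resulting dict afterwards.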


"Note how it uses commas as well as semi-colons to separate cookies, but commas are also used in the cookie itself."

As quoted, the ambiguous commas make the string impossible to parse reliably, whether with a regex or any other tool. Where is that string coming from?

As a Set-Cookie: header value it would simply be completely invalid, and wouldn't work in any browser. Browsers would set PHPSESSID as a session cookie (since the expires date format is invalid with the extra comma), and ignore the rest. Multiple cookies have to be set with multiple Set-Cookie headers, not combined into one.

Edit: OK, what seems to be happening is that httplib2 handles the HTTP response data by using the stdlib email package to parse the headers. In e-mail, the RFC822 family of standards requires that multiple headers with the same name (such as To: addresses) be treated as equivalent to a single header with the values joined by commas.

However, HTTP responses are explicitly not an RFC822-family standard; it is totally inappropriate to handle them this way. It would appear that by using email to parse HTTP responses, httplib2 has made itself unable to handle any multiply-used header correctly, and the Set-Cookie header is very often used like that. For this reason I consider httplib2 fundamentally broken and would advise not using it.
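
To make the ambiguity concrete, here is a small Python sketch that simulates the comma-joining with made-up header values; it does not call httplib2 or the email package. It also shows a heuristic split on commas that are followed by something looking like name=. That heuristic is only an assumption-laden workaround: it fails as soon as a cookie value itself contains a comma followed by text with an equals sign.

```python
import re

# Two separate Set-Cookie values (hypothetical example data).
headers = [
    "id=1578; expires=Mon, 22-Feb-2010 13:37:51 GMT",
    "password=123456; expires=Mon, 22-Feb-2010 13:37:51 GMT",
]

# E-mail style merging: same-named headers joined with ", ".
joined = ", ".join(headers)

# A naive split on ", " also cuts through the expires dates.
naive = joined.split(", ")

# Heuristic: split only on commas followed by a "name=" token.
COOKIE_BOUNDARY = re.compile(r",\s*(?=[^;,=\s]+=)")
recovered = COOKIE_BOUNDARY.split(joined)
```

Here naive ends up with four fragments, while recovered matches the two original values, but only because none of the values contains a comma followed by name=.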
