Python Regex Help (httplib2 cookies)
Having the same problem as the poster of this question: httplib2, how to set more than one cookie?
The cookie looks like this:
PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd
Note how it uses commas as well as semi-colons to separate cookies, but commas are also used in the cookie itself.
Writing a regex that separates them properly is beyond me, so I would really appreciate it if anyone wants to give it a shot!
Have you tried cookielib / http.cookiejar?
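A minimal sketch of what that suggestion could look like in Python 3, where cookielib was renamed http.cookiejar (in Python 2 the equivalents are cookielib.CookieJar, urllib2.build_opener and urllib2.HTTPCookieProcessor). It lets urllib.request manage cookies instead of httplib2. The helper names make_cookie_opener and fetch are hypothetical, and whether this works for you depends on what else you rely on httplib2 for.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Hypothetical helper: an opener whose cookies live in a CookieJar."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores cookies from responses in the jar and
    # adds matching cookies to subsequent requests made with this opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(url):
    """Hypothetical helper: fetch a URL and return (body, {name: value})."""
    opener, jar = make_cookie_opener()
    with opener.open(url) as response:
        body = response.read()
    # Every Set-Cookie header the jar accepted becomes its own Cookie object;
    # already-expired cookies are discarded by the jar.
    cookies = {cookie.name: cookie.value for cookie in jar}
    return body, cookies
```

Calling fetch("http://example.com/") (a placeholder URL) would return the body and a name-to-value dict of the accepted cookies. To keep a session alive, hold on to one opener and make all your requests through it.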
If you read the cookie like this:
PHPSESSID=8527b5532b6018aec4159d81f69765bd;
path=/;
expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578;
expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456;
expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd
then the semicolon is the only true separator, and a comma acts as a separator only when an expiration date precedes it.
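If you need to split a value that has already been joined, one heuristic that follows from this reading is to treat a comma as a cookie boundary only when the next token looks like name=. This is a sketch under the assumption that no cookie value itself contains ", name="; the helper split_joined_cookies is hypothetical.

```python
import re
from typing import List

# A comma followed by optional spaces and then "token=" starts a new cookie.
# The comma inside "expires=Fri, 19-Feb-2010 ..." is followed by a date,
# not by "token=", so it is left alone.
_COOKIE_BOUNDARY = re.compile(r",\s*(?=[^;,=\s]+=)")


def split_joined_cookies(header_value: str) -> List[str]:
    """Split a comma-joined Set-Cookie value into one string per cookie."""
    return [part.strip() for part in _COOKIE_BOUNDARY.split(header_value)]


header = (
    "PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; "
    "expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd"
)

for part in split_joined_cookies(header):
    print(part)
```

Each resulting string can then be parsed as an ordinary single Set-Cookie value.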
If you are not interested in the expiration dates, you can use a single regex to filter them out, e.g.
s/expires=[^,]+,[^,]+, //g
then split the whole string on ";" and parse each piece as a key=value pair.
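The same substitution in Python, followed by the split into key=value pairs, might look like the sketch below (parse_without_expires is a hypothetical name). Note that attributes such as path end up in the result alongside the real cookies, and an expires attribute at the very end of the string (with no trailing comma) is not matched by this pattern and is left in place.

```python
import re
from typing import Dict


def parse_without_expires(header_value: str) -> Dict[str, str]:
    """Drop 'expires=<day>, <date>, ' chunks, then split on ';' into pairs."""
    # Python equivalent of s/expires=[^,]+,[^,]+, //g
    stripped = re.sub(r"expires=[^,]+,[^,]+, ", "", header_value)
    pairs = {}
    for item in stripped.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:  # ignore fragments that have no "="
            pairs[key] = value
    return pairs


header = (
    "PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; "
    "expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; "
    "expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd"
)

print(parse_without_expires(header))
```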
"Note how it uses commas as well as semi-colons to separate cookies, but commas are also used in the cookie itself."
As quoted, the ambiguous commas make the string unparseable with regex or any other tool. Where is that string coming from?
As a Set-Cookie: header value it would simply be completely invalid, and it wouldn't work in any browser. Browsers would set PHPSESSID as a session cookie (since the expires date format is invalid with the extra comma) and ignore the rest. Multiple cookies have to be set with multiple Set-Cookie headers, not combined into one.
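To illustrate: once each cookie arrives in its own Set-Cookie header, even a simple parser copes with the comma inside expires. This sketch uses http.cookies.SimpleCookie (the module was called Cookie in Python 2) and assumes the header values below are what the server would send if it emitted one header per cookie.

```python
from http.cookies import SimpleCookie

# Assumed input: the question's cookies, one Set-Cookie value per cookie.
set_cookie_headers = [
    "PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; expires=Fri, 19-Feb-2010 13:52:51 GMT",
    "id=1578; expires=Mon, 22-Feb-2010 13:37:51 GMT",
    "password=123456; expires=Mon, 22-Feb-2010 13:37:51 GMT",
    "sid=8527b5532b6018aec4159d81f69765bd",
]

cookies = SimpleCookie()
for value in set_cookie_headers:
    cookies.load(value)  # parse one header at a time, so no ambiguity

for name, morsel in cookies.items():
    print(name, morsel.value, morsel["expires"] or "(session cookie)")
```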
Edit: OK, what seems to be happening is that httplib2 handles the HTTP response data using the stdlib email package to parse the headers. In e-mail, the RFC822 family of standards requires that multiple headers with the same name (e.g. To: addresses) be treated as equivalent to a single header with the values joined by commas.
However, HTTP is explicitly not an RFC822-family standard, so it is totally inappropriate to handle its responses this way. It would appear that by using email to parse HTTP responses, httplib2 has made itself unable to handle any repeated header correctly, and the Set-Cookie header is very often repeated. For this reason I consider httplib2 fundamentally broken and would advise against using it.