Python socket data returns a bytes object. How do I regexp it?

I'm writing a basic HTTP proxy in Python 3, and so far I'm not using prebuilt classes like http.server.

I'm just opening a socket which accepts connections:

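import socket

# (inside a method of the proxy class)
self.listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.listen_socket.bind((socket.gethostname(), 4321))
self.listen_socket.listen(5)
conn, addr = self.listen_socket.accept()  # connected socket, client address
content = conn.recv(100000)                # bytes; may hold only part of the request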
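A single recv() call is not guaranteed to return the whole request. A minimal sketch of reading until the blank line that ends the HTTP header block; `recv_headers` is a hypothetical helper name, and any body bytes that arrive in the same chunk are simply left at the end of the returned buffer:

```python
import socket


def recv_headers(conn: socket.socket, limit: int = 65536) -> bytes:
    """Hypothetical helper: read from conn until the header block is complete.

    Stops when b"\r\n\r\n" has been received, when the peer closes the
    connection, or raises if more than `limit` bytes arrive first.
    """
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = conn.recv(4096)
        if not chunk:  # peer closed the connection
            break
        buf += chunk
        if len(buf) > limit:
            raise ValueError("header block too large")
    return buf


if __name__ == "__main__":
    # Demo with a connected socket pair instead of a real browser.
    client, server = socket.socketpair()
    client.sendall(b"GET http://www.google.com/firefox HTTP/1.1\r\n")
    client.sendall(b"Host: www.google.com\r\n\r\n")
    print(recv_headers(server))
    client.close()
    server.close()
```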

Now content stores data like:

b'GET http://www.google.com/firefox HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2) Gecko/20100207 Namoroka/3.6\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 115\r\nProxy-Connection: keep-alive\r\nCookie: PREF=ID=1ac935f4d893f655:U=73a4849dc5fc23a4:TM=1266851688:LM=1267023171:S=Log1PmXRMlNjX3Of; NID=32=EnrZjTqILuW2_aMLtgsJ96FdEMF3s5FoMJSVq9GMr9dhLhTAd3F5RcQ3ImyVBiO2eYNKKMhzlGg7r8zXmeSq50EigS5sdKtCL9BMHpgCxZazA2NiyB0bTRWhp8-0BObn\r\n\r\n'

How can I run a regular expression on it? Converting it to a string does not work for me.

Ultimately, I need to find out the requested address, like http://www.google.com/firefox in this case. Is there a parser I don't know about? How can I achieve this?

Thanks in advance.


You need to pass an encoding when converting to a string, for example:

>>> str(b'GET http://...', 'UTF-8')
'GET http://...'

If you don't pass an encoding then, as you've discovered, you get the repr of the bytes object, which is much less helpful:

>>> str(b'GET http://...')
"b'GET http://...'"

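Applied to the request above, decoding first lets an ordinary str regex pull out the request line. A minimal sketch, assuming the buffer starts with a request line of the form METHOD TARGET HTTP/x.y; it decodes with 'iso-8859-1' rather than 'UTF-8' because that codec maps every byte to a character and therefore never raises on unexpected header bytes:

```python
import re

raw = (b"GET http://www.google.com/firefox HTTP/1.1\r\n"
       b"Host: www.google.com\r\n\r\n")

# 'iso-8859-1' maps every byte value to a character, so unlike 'UTF-8'
# this conversion can never raise UnicodeDecodeError.
text = str(raw, "iso-8859-1")

# Assumed request-line shape: METHOD SP request-target SP HTTP-version CRLF
match = re.match(r"(\S+) (\S+) (HTTP/\d\.\d)\r\n", text)
if match:
    method, target, version = match.groups()
    print(method)  # GET
    print(target)  # http://www.google.com/firefox
```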

Also, you might want to look at the *HTTPServer classes (in Python 3 these live in the http.server module). They provide a ready-made HTTP server and will also parse the headers for you.

If you can't use them, at the very least their source code shows how to do it!
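If you only want the parsing part, the standard library's http.server handler relies on http.client.parse_headers() internally, and urllib.parse.urlsplit() can break the requested address into parts. A minimal sketch, assuming the buffer holds a complete header block; note that parse_headers() expects the request line to be consumed already, and that urlsplit() only yields a hostname when the target is an absolute URL, as in proxy requests (otherwise use the Host header):

```python
import http.client
import io
from urllib.parse import urlsplit

raw = (b"GET http://www.google.com/firefox HTTP/1.1\r\n"
       b"Host: www.google.com\r\n"
       b"Proxy-Connection: keep-alive\r\n\r\n")

fp = io.BytesIO(raw)
# Consume the request line first; parse_headers() starts at the header fields.
request_line = fp.readline().decode("iso-8859-1").rstrip("\r\n")
method, target, version = request_line.split(" ", 2)

headers = http.client.parse_headers(fp)  # an http.client.HTTPMessage
url = urlsplit(target)

print(method, url.hostname, url.path)  # GET www.google.com /firefox
print(headers["Host"])                 # www.google.com
```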


Methods are provided to convert between bytes and strings: try str.encode() and bytes.decode().
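Conversion is also optional for the regex itself: Python's re module accepts bytes patterns, which match bytes objects and return bytes groups, while a str pattern applied to bytes raises TypeError. A minimal sketch combining both ideas, assuming the extracted URL is plain ASCII:

```python
import re

raw = b"GET http://www.google.com/firefox HTTP/1.1\r\nHost: www.google.com\r\n\r\n"

# A bytes pattern (rb"...") searches the bytes directly; groups are bytes too.
m = re.match(rb"(\S+) (\S+) HTTP/", raw)
target = m.group(2)
print(target)  # b'http://www.google.com/firefox'

# bytes.decode() and str.encode() convert between the two types
# (assumes the URL is ASCII).
text = target.decode("ascii")
print(text)  # http://www.google.com/firefox
assert text.encode("ascii") == target

# Mixing types fails: a str pattern cannot be used on a bytes object.
try:
    re.match(r"(\S+)", raw)
except TypeError as exc:
    print("TypeError:", exc)
```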

http://python.about.com/od/python30/ss/30_strings_3.htm
