Why is my Python script failing to download webpages through a proxy?
I am new to Python and am trying my luck with sockets, so I wrote a simple HTTP client. To my surprise, it fails to access webpages that Firefox can access, even though it sends the same headers:
import socket
clientsocket= socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientsocket.connect(("213.229.83.205",80))#connect to proxy at given address
print "connected to 213.229.83.205"
sdata= """GET http://google.co.ug/ HTTP/1.1
Host: google.co.ug
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Proxy-Connection: keep-alive
Cookie: cookie <-- Real cookie deleted
"""
print "sending request"
clientsocket.send(sdata)
rdata=clientsocket.recv(10240)
if not rdata: print "no data found"
else:
    print "receiving data !"
    myfile=open("c:/users/markdenis/desktop/google.html","w")
    myfile.write(str(rdata))
    myfile.close()
    print "data written to file on desktop"
clientsocket.close()
raw_input()#system(pause)
When I run it, it shows:
connected to 213.229.83.205
sending request
no data found
The HTTP protocol requires \r\n at the end of each header line, plus one more \r\n on its own (a blank line) to mark the end of the headers. You aren't explicit about the line endings in your sdata buffer, so it ends up with bare \n line endings.
Tested on Windows, Linux and OS X, to be sure:
>>> x = """a
... b
... c
... """
>>> x
'a\nb\nc\n'
What you need is:
>>> x = "a\r\nb\r\nc\r\n"
>>> x
'a\r\nb\r\nc\r\n'
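A minimal sketch of that framing, assuming you would rather build the request from a list of headers than from a triple-quoted string: every line is joined with \r\n, and one more \r\n (the empty line) closes the header block. build_request is a hypothetical helper, not part of the original code, and only two of the question's headers are used.

```python
def build_request(request_line, headers):
    """Join a request line and (name, value) headers with CRLF, ending with a blank line."""
    lines = [request_line]
    lines.extend("%s: %s" % (name, value) for name, value in headers)
    # Each line ends in \r\n; the final extra \r\n is the empty line that
    # tells the server (or proxy) the headers are finished.
    return "\r\n".join(lines) + "\r\n\r\n"


req = build_request(
    "GET http://google.co.ug/ HTTP/1.1",
    [("Host", "google.co.ug"), ("Proxy-Connection", "keep-alive")],
)
print(repr(req))
```

This runs unchanged on Python 2 and 3; on Python 3 the resulting str still has to be encoded to bytes before it is handed to a socket.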
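Add the \r\ns and give it a shot. Typing \r\n at the end of each line inside the triple-quoted string would leave you with an extra \n on every line, because the literal line breaks are still there, so build the string up piece by piece instead: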
sdata = "GET http://google.co.ug/ HTTP/1.1\r\n"
sdata += "Host: google.co.ug\r\n"
sdata += "User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20100101 Firefox/6.0\r\n"
sdata += "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
sdata += "Accept-Language: en-us,en;q=0.5\r\n"
sdata += "Accept-Encoding: gzip, deflate\r\n"
sdata += "Proxy-Connection: keep-alive\r\n"
sdata += "\r\n"
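The original script also calls recv() only once, which can return just part of the response. Below is a hedged Python 3 sketch (not the answerer's code) that sends a CRLF-terminated request and keeps reading until the other side closes the connection. Assumptions: recv_all and fetch_via_proxy are hypothetical helper names; the request sends Connection: close instead of Proxy-Connection: keep-alive, so the server closes the socket after responding and the read loop can end; Accept-Encoding is left out so the body does not come back gzip-compressed; and sendall() is used because send() may transmit only part of the buffer.

```python
import socket


def recv_all(sock, bufsize=4096):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(bufsize)
        if not data:  # b"" means the other side closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch_via_proxy(proxy_host, proxy_port, url, host):
    """Send an absolute-URL GET to an HTTP proxy and return the raw response bytes."""
    request = (
        "GET %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Connection: close\r\n"  # ask the server to close the socket after responding
        "\r\n"                   # blank line ends the headers
    ) % (url, host)
    with socket.create_connection((proxy_host, proxy_port), timeout=10) as sock:
        sock.sendall(request.encode("ascii"))  # Python 3 sockets take bytes
        return recv_all(sock)
```

Usage would be along the lines of fetch_via_proxy("213.229.83.205", 80, "http://google.co.ug/", "google.co.ug"), assuming that proxy is still reachable, with the result written to a file opened in "wb" mode. The bytes are the raw response, so the status line and headers come before the HTML, and because this is HTTP/1.1 the body may use chunked transfer encoding.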