what's the best way to disguise a urllib2 query as a human request (beyond user-agent)?
What's the best way to disguise a Python program using urllib2? I know how to set the user-agent which is a good start. But what about other items like the referring URL? Any way to set that? Any other suggestions?
Here's what I'm using to add the user-agent:
opener = urllib2.build_opener()
opener.addheaders = [('User-agent','Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20110开发者_JAVA百科506 Firefox/4.0.1')]
f = opener.open("http://www.domain.com")
There are a lot of specifics that you haven't mentioned. To find your answers, simply go to www.domain.com
in your favorite browser (with good dev tools), and examine the network traffic.
Chrome has built in tools. Firebug is for firefox.
Look at all of the headers that were sent, and replicate as per your specific need.
You could point a real browser to a tool like this one or this one, see the exact contents of all fields that your real browser sends, and mimic that in your Python script.
精彩评论