Python WebKit with proxy support
I am writing a Python script to scrape a webpage. I have created a WebKit WebView object and used the open method to load the URL, but I want to load the URL through a proxy. How can I do this? How do I integrate WebKit with a proxy, and which WebKit class supports a proxy?
Try the code snippet below. (reference from url)
import ctypes

import gtk
import webkit

# PyWebKitGTK does not expose the underlying SoupSession, so the proxy
# has to be set on the C objects directly through ctypes.
libgobject = ctypes.CDLL('/usr/lib/libgobject-2.0.so.0')
libsoup = ctypes.CDLL('/usr/lib/libsoup-2.4.so.1')
libwebkit = ctypes.CDLL('/usr/lib/libwebkit-1.0.so')

# Build a SoupURI for the proxy and attach it to WebKit's default session.
proxy_uri = libsoup.soup_uri_new('http://127.0.0.1:8000')  # set your proxy URL
session = libwebkit.webkit_get_default_session()
libgobject.g_object_set(session, "proxy-uri", proxy_uri, None)

w = gtk.Window()
w.connect('destroy', gtk.main_quit)
s = gtk.ScrolledWindow()
v = webkit.WebView()
s.add(v)
w.add(s)
w.show_all()
v.open('http://www.google.com')
gtk.main()
Hope it helps.
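As a side note, you can sanity-check the proxy itself with plain urllib before wiring it into WebKit. A minimal sketch, using Python 3's urllib.request and the same hypothetical proxy address as above:

```python
import urllib.request

# Register the hypothetical local proxy for HTTP traffic.
proxy_handler = urllib.request.ProxyHandler({"http": "http://127.0.0.1:8000"})
opener = urllib.request.build_opener(proxy_handler)

# The opener now routes opener.open(...) calls for http:// URLs through
# the proxy; inspecting its handlers confirms the registration took.
handler_names = [type(h).__name__ for h in opener.handlers]
```

If `opener.open('http://example.com')` works here but the WebKit view still bypasses the proxy, the problem is in the session wiring, not in the proxy itself.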
You can use QNetworkProxy.setApplicationProxy if you're on PyQt, or this snippet if you're using PyGI:
from gi.repository import WebKit
from gi.repository import Soup
proxy_uri = Soup.URI.new("http://127.0.0.1:8080")
session = WebKit.get_default_session()
session.set_property("proxy-uri", proxy_uri)
References:
PyGI
PyQt
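For the PyQt route, a minimal sketch: QNetworkProxy.setApplicationProxy sets the proxy application-wide, so any QWebView created afterwards picks it up. The try/except keeps the snippet importable where PyQt4 is absent, and the host/port are hypothetical:

```python
try:
    from PyQt4.QtNetwork import QNetworkProxy

    # Application-wide HTTP proxy; every Qt network request,
    # including QtWebKit page loads, will go through it.
    proxy = QNetworkProxy(QNetworkProxy.HttpProxy, "127.0.0.1", 8080)
    QNetworkProxy.setApplicationProxy(proxy)
    have_pyqt = True
except ImportError:
    # PyQt4 not installed; nothing to configure.
    have_pyqt = False
```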
How about a solution that's already made?
PyPhantomJS is a minimalistic, headless, WebKit-based, JavaScript-driven tool, written in Python on top of PyQt4. It runs on Linux, Windows, and Mac OS X.
It gives you access to a full headless WebKit browser, controllable via scripts written in JavaScript, with support for, among other things, screen scraping* and proxies. It is driven from the command line.
You can see the API here.
* When I say screen scraping, I mean you can either scrape page content, or even save page renders to a file. There's even a screen scraping JS library already written here.