How can Python work with javascript
I am working on a scrapy app to scrapte some data on a web page
But there is some data loaded by ajax, and thus python just cannot execute that to get the data.
Is there any lib that simulate the behavior of a browse开发者_开发百科r?
For that you'd have to use a full-blown Javascript engine (like Google V8 in Chrome), to get the real functionality of the browser and how it interacts. However, you could possibly get some information by looking up all URLs in the source and doing a request to each, hoping for some valid data. But in overall, you're stuck without a full Javascript engine.
Something like python-spidermonkey. A wrapper to the Javascript engine of Mozilla. However using it might be rather complicated, but that's dependant on your specific application.
You'd basically have to build a browser, but seems Python-people have made it simple. With PyWebkitGtk you'd get the dom and using either python-spidermonkey mentioned before or PyV8 mentioned by Duncan you'd theoretically get the full functionality needed for a browser/webscraper.
The problem is that you don't just have to be able to execute some Javascript (that's easy), you also have to emulate the browser DOM and that's a lot of work.
If you want to be able to run Javascript then you can use PyV8. Install it with easy_install PyV8
and then you can execute any standalone javascript:
>>> import PyV8
>>> ctxt = PyV8.JSContext()
>>> ctxt.enter()
>>> ctxt.eval("(function(a,b) { return [a+b, a*b, a/b, a-b] })(13,29)")
<_PyV8.JSArray object at 0x01F26A30>
>>> list(_)
[42, 377, 0.4482758620689655, -16]
You can also pass in classes defined in Python, so in principle might be able could emulate enough of the DOM for your purposes.
An AJAX request is a normal web request which is executed asynchronously. All you need is the URL which the JavaScript code sends to the server. Use that URL with urllib to get at the same data.
The simplest way to get work done is by using 'PyExecJS'. https://pypi.python.org/pypi/PyExecJS
PyExecJS is a porting of ExecJS from Ruby. PyExecJS automatically picks the best runtime available to evaluate your JavaScript program, then returns the result to you as a Python object.
I use Macbook and installed node.js so pyexecjs can use node.js javascript runtime.
pip install PyExecJS
Test code:
import execjs execjs.eval("'red yellow blue'.split(' ')")
Good luck!
A little update for 2020
i reviewed the results recommended by Google Search. Turns out the best choice is #4 and then #1 #2 are deprecated!
- #4 https://github.com/sqreen/PyMiniRacer is actually the most straight forward installation.
- #3 https://github.com/kovidgoyal/dukpy is based lightweight JS engine. I could not find any limitations compared with v8. No significant benefit in terms of performance either.
Deprecated
- #1
https://github.com/sony/v8evalhas received very little maintenance for 2 years. takes 20 minutes to build and fails... (filed a bug report for that https://github.com/sony/v8eval/issues/34 - #2
https://github.com/doloopwhile/PyExecJSis discontinued by the owner
精彩评论