Can you suggest an embeddable JavaScript engine?
I want to develop an application that will automatically (based on some logic) crawl web pages, clicking and posting on the pages to test them. Think of it like Selenium.
For simple web pages this can be done easily by scraping the HTML and then making a new request to the server for the next page. The issue is handling Ajax pages: how do I handle the JS code in the HTML?
Broken down into parts, the JS engine must:
- parse the HTML and make server requests to fetch the external JS files it references. It could provide a hook that lets the user code fetch them for the engine.
- create a DOM tree of the HTML elements, as browsers do, and let the user code access and manipulate it.
- let the user code hook into JS events.
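For the first requirement, a minimal stdlib-only Java sketch of locating external scripts might look like the following. It assumes a naive regular expression is acceptable for illustration (a real engine needs a proper HTML parser), and the class name `ScriptRefs` and the example URLs are hypothetical. It only finds the `src` URLs and resolves relative ones against the page URL; fetching and executing the scripts is left to the engine or to a user-supplied hook.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: finds external script references in an HTML string and
 * resolves them against the page URL so they can be fetched separately.
 * A regex is NOT a real HTML parser; this is for illustration only.
 */
final class ScriptRefs {
    // Matches <script ... src="..."> or src='...', case-insensitively.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    static List<URI> externalScripts(String html, URI pageUrl) {
        List<URI> result = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            // Relative paths such as "js/app.js" are resolved against the page URL.
            result.add(pageUrl.resolve(m.group(1).trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head>"
                + "<script src=\"js/app.js\"></script>"
                + "<SCRIPT type='text/javascript' src='/lib/ajax.js'></SCRIPT>"
                + "<script>inline();</script>" // inline script: no src, not reported
                + "</head></html>";
        URI page = URI.create("http://example.com/shop/index.html");
        for (URI u : externalScripts(html, page)) {
            System.out.println(u);
        }
    }
}
```

Running `main` prints `http://example.com/shop/js/app.js` and `http://example.com/lib/ajax.js`. Inline scripts are skipped because they have nothing to fetch, although an engine would still have to execute them.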
Typical JS code does the following:
- Access DOM elements.
- Manipulate existing DOM elements. This can be:
  a. Cosmetic (e.g., changing an element's height). The user code has no interest in this, and supporting it would be very difficult because it would require a layout engine.
  b. Manipulation of attributes. The user code would be interested in this.
- Add new DOM elements.
- Make HTTP requests for Ajax.
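The non-cosmetic operations above (changing attributes and adding elements) map onto the standard W3C DOM interfaces. As a rough sketch, assuming the JDK's built-in `org.w3c.dom` implementation as a stand-in for an engine's HTML DOM, this shows what such a script effectively does to the tree. `DomSketch`, the form, and the token value are hypothetical, and nothing here parses HTML, runs JavaScript, or performs layout.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical illustration of attribute changes and new elements on a DOM tree. */
final class DomSketch {
    /** Builds a tiny html > body > form tree by hand (no HTML parsing). */
    static Document buildPage() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            // The default factory configuration should not fail; rethrow unchecked.
            throw new IllegalStateException(e);
        }
        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element form = doc.createElement("form");
        form.setAttribute("action", "/login");
        doc.appendChild(html);
        html.appendChild(body);
        body.appendChild(form);
        return doc;
    }

    /** What a page script might do: change an attribute and add a new element. */
    static void simulateScript(Document doc) {
        Element form = (Element) doc.getElementsByTagName("form").item(0);
        form.setAttribute("action", "/ajax/login");   // attribute manipulation
        Element input = doc.createElement("input");   // new DOM element
        input.setAttribute("type", "hidden");
        input.setAttribute("name", "token");
        input.setAttribute("value", "abc123");
        form.appendChild(input);
    }

    public static void main(String[] args) {
        Document doc = buildPage();
        simulateScript(doc);
        // User (test) code inspects the resulting tree, ignoring cosmetic state.
        Element form = (Element) doc.getElementsByTagName("form").item(0);
        NodeList inputs = form.getElementsByTagName("input");
        System.out.println(form.getAttribute("action")); // prints /ajax/login
        System.out.println(inputs.getLength());          // prints 1
    }
}
```

The point of the sketch is that the user code only needs the tree and its attributes, not computed styles, which is why the cosmetic case can be ignored.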
Can you suggest an embeddable JS engine that I can use to achieve all of this? My preferred language is Java, but C/C++ or Python would also work. I'm not sure, but would Mozilla Rhino fit the bill?
Take a look at HtmlUnit.
We used the Cobra project for some work where we needed to retrieve web pages and execute the JavaScript in them. I don't know whether you could adapt the project to your needs.