Programmatically analyze CSS layout
I would like to spider a few blogs and programmatically analyze their HTML- and CSS-based layouts to see, for example, whether the sidebar is to the left or right of the main content, how many columns there are, and how wide they are.
What is the best way to do this? Are there any tools or libraries I can use?
(I would prefer a solution in Python or PHP.)
This sounds like an extremely hard task to do using pure server-side CSS and HTML parsing: you would effectively have to recreate the browser's rendering engine to get reliable results.
Depending on what you need this for, I can think of an approach along these lines:
Fetch pages and style sheets using something like wget with --page-requisites
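The fetch step could be scripted from Python roughly as follows. This is only a sketch: the helper names build_wget_command and fetch_page are made up for illustration, and it assumes GNU wget is installed and on the PATH. Only --page-requisites comes from the suggestion above; the other flags are optional extras.

```python
import subprocess
from typing import List


def build_wget_command(url: str, dest_dir: str) -> List[str]:
    """Build a wget command line that downloads a page plus its assets.

    Hypothetical helper; the flags are standard GNU wget options.
    """
    return [
        "wget",
        "--page-requisites",   # also fetch style sheets, images and scripts
        "--convert-links",     # rewrite links so the local copy is browsable
        "--span-hosts",        # allow requisites hosted on other domains (CDNs)
        "--directory-prefix=" + dest_dir,  # where to store the download
        url,
    ]


def fetch_page(url: str, dest_dir: str) -> int:
    """Run wget and return its exit code (0 means success)."""
    completed = subprocess.run(build_wget_command(url, dest_dir), check=False)
    return completed.returncode
```

Calling fetch_page("http://example.com/", "pages") would then leave the page and its style sheets under the pages directory, ready to be opened in a browser.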
Then either:
Walk through each downloaded page using a tool like Selenium, search for element names, and output their positions (if that is possible in Selenium; I assume it is, but I do not know for sure).
Create a piece of jQuery that you inject into each of the downloaded pages. The script searches for elements named "sidebar", "toolbar", etc., gets their positions, sends the results to a local script via AJAX, and continues to the next downloaded page. You only need to open the first page in the browser; the rest will happen automatically. Not trivial to implement, but possible.
If you can use a client-side application platform like .NET, it may be easier to build a custom application that embeds a browser control, whose DOM you can access more freely than with jQuery alone.
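Whichever approach produces the element positions, deciding which side the sidebar is on comes down to comparing two bounding boxes. A minimal Python sketch, assuming each box is a dictionary with x, y, width and height in pixels (as far as I know, this is the shape Selenium's WebElement.rect returns); the function name sidebar_side and the 50% overlap threshold are arbitrary choices for illustration:

```python
from typing import Dict

Rect = Dict[str, float]  # expected keys: "x", "y", "width", "height"


def sidebar_side(main: Rect, sidebar: Rect) -> str:
    """Classify the sidebar as 'left', 'right' or 'stacked' relative to main.

    'stacked' means the two boxes overlap horizontally so much that they
    are most likely placed above/below each other, not side by side.
    """
    main_left, main_right = main["x"], main["x"] + main["width"]
    side_left, side_right = sidebar["x"], sidebar["x"] + sidebar["width"]

    # Width of the horizontal overlap (negative if there is a gap).
    overlap = min(main_right, side_right) - max(main_left, side_left)
    if overlap > 0.5 * min(main["width"], sidebar["width"]):
        return "stacked"

    # Otherwise compare the horizontal centres of the two boxes.
    main_center = (main_left + main_right) / 2
    side_center = (side_left + side_right) / 2
    return "left" if side_center < main_center else "right"
```

For example, a 640px-wide main column starting at x=0 and a 300px-wide sidebar starting at x=660 would be classified as a right-hand sidebar.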
Are you looking for this?
http://cthedot.de/cssutils/
It was the first hit on a Google search. There were at least four others that looked promising. Perhaps you should try Google, list what you found, and ask for specific advice on specific packages.
It seems like this could be achieved with PhantomJS, using JavaScript along these lines:
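var page = require('webpage').create();
// In the WebPage API, viewportSize is set on the page object,
// not on the global phantom object.
page.viewportSize = { width: 1024, height: 768 };
page.open("http://mashable.com/", function(status) {
    if (status !== "success") {
        // Exit on failure as well, otherwise the script never terminates.
        console.log("Failed to load page");
        phantom.exit(1);
        return;
    }
    page.includeJs("https://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js", function() {
        var position = page.evaluate(function() {
            return jQuery('#sidebar').position();
        });
        // Now position.left and position.top contain the position of the
        // #sidebar element relative to its offset parent. Use other jQuery
        // functions (e.g. offset()) to calculate the relative position.
        console.log(JSON.stringify(position));
        phantom.exit();
    });
});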
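Once left offsets and widths have been collected for several top-level blocks (for instance by printing them from a script like the one above and parsing the output), the number of columns and their widths could be estimated in Python by grouping blocks that share roughly the same left edge. This is a naive heuristic, not a general solution: the function name estimate_columns and the 10px tolerance are assumptions for illustration, and full-width elements such as headers or footers should be filtered out first.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (left offset, width), both in pixels


def estimate_columns(boxes: List[Box], tolerance: float = 10.0) -> List[Box]:
    """Group boxes by left edge and return one (left, width) per column.

    Boxes whose left edges lie within `tolerance` pixels of the current
    column's left edge are merged; the column width is the widest box seen.
    The result is sorted from left to right.
    """
    columns: List[Box] = []
    for left, width in sorted(boxes):
        if columns and abs(left - columns[-1][0]) <= tolerance:
            col_left, col_width = columns[-1]
            columns[-1] = (col_left, max(col_width, width))
        else:
            columns.append((left, width))
    return columns
```

The length of the returned list is the estimated column count, and each entry gives that column's left edge and width.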