Detect the size of an image in HTML, using Python
I'm trying to implement functionality similar to Facebook's thumbnail preview. The idea is that a user enters the URL of a product and selects the best image of that product.
To filter out images that obviously aren't of a product, I want to keep only images whose height and width are both greater than 150px.
I'm using Python and BeautifulSoup to download the HTML and extract images, but I can't find a way to get the height or width when it is specified in CSS.
GD is a library that's been around for quite some time, and it has a pretty easy interface to work with. Here's a link to GD.
See the "size" method to get width and height.
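If pulling in an image library feels too heavy, here is a rough sketch of a different route (not GD): reading the width and height straight from the header of a PNG file using only the standard library. It assumes the image is a PNG and that you already have its first 24 bytes; `png_size` is a made-up helper name, and other formats (JPEG, GIF) would need their own parsing.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) from the leading bytes of a PNG, or None.

    Hypothetical helper: it only understands PNG and needs at least the
    first 24 bytes of the file.
    """
    # Bytes 8-11 are the chunk length, bytes 12-15 the chunk type (IHDR).
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    # Width and height are two big-endian unsigned 32-bit integers.
    return struct.unpack(">II", data[16:24])
```

Since only the first few bytes matter, you would not need to download the whole image to check it against the 150px threshold.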
EDIT
Ah, how about this?
- Parse the HTML content and retrieve URLs to the CSS file(s) and inline styles
- Download the CSS file(s) and parse CSS files, in order, building a rule-set of the CSS rules.
- Next, parse the rest of the HTML from Step 1, gathering IMG tags, and if the IMG tag has a class name, look up the class name in your CSS rules and check for width or height.
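The steps above could be sketched roughly like this, using only the standard library. This is a minimal sketch under some big assumptions: the CSS has already been downloaded into a string, only simple `.classname { ... }` rules with pixel values are handled (no cascade, specificity, or inline styles), and the helper names `parse_class_sizes`, `ImgCollector`, and `images_larger_than` are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Matches simple class rules such as ".thumb { width: 200px; height: 180px }".
RULE_RE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")
# Matches "width: 200px" / "height: 180px" but not "max-width" or "line-height".
DECL_RE = re.compile(r"(?<![\w-])(width|height)\s*:\s*(\d+)px")


def parse_class_sizes(css_text):
    """Return {class_name: {"width": int, "height": int}} for simple rules."""
    sizes = {}
    for class_name, body in RULE_RE.findall(css_text):
        for prop, value in DECL_RE.findall(body):
            # Later rules overwrite earlier ones, a crude stand-in for the cascade.
            sizes.setdefault(class_name, {})[prop] = int(value)
    return sizes


class ImgCollector(HTMLParser):
    """Collects the src and class names of every IMG tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr = dict(attrs)
            classes = (attr.get("class") or "").split()
            self.images.append({"src": attr.get("src"), "classes": classes})


def images_larger_than(html, css_text, min_px=150):
    """Return the src of images whose CSS width and height both exceed min_px."""
    sizes = parse_class_sizes(css_text)
    collector = ImgCollector()
    collector.feed(html)
    keep = []
    for img in collector.images:
        dims = {}
        for cls in img["classes"]:
            dims.update(sizes.get(cls, {}))
        # Images with no size found in the CSS are skipped in this sketch.
        if dims.get("width", 0) > min_px and dims.get("height", 0) > min_px:
            keep.append(img["src"])
    return keep
```

For example, `images_larger_than(page_html, css_text)` would give you the candidate image URLs to show the user. Images sized only by their intrinsic dimensions would be missed by this, so you might still need to fall back to inspecting the image itself for those.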
It might sound a little complicated, but I bet downloading a few CSS stylesheets is much lighter than downloading the images and having to use an image library on the server side.