
Detect the size of an image in HTML, using Python

I'm trying to implement functionality similar to Facebook's thumbnail preview. The idea is that a user enters the URL of a product and then selects the best image of that product.

To filter out images that obviously aren't the product, I want to keep only those whose height and width are both greater than 150px.

I'm using Python and BeautifulSoup to download the HTML and extract the images, but I can't find a way to get the height or width when it is specified in CSS.
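For reference, the part that already works can be sketched like this: collect the `<img>` tags and read their `width`/`height` HTML attributes. With BeautifulSoup this is roughly `soup.find_all("img")` followed by `img.get("width")`; the sketch below swaps in the standard library's `html.parser` so it runs without extra packages, and the names `ImgCollector`, `collect_images` and `_to_px` are hypothetical. It also illustrates the problem: an image sized only through CSS comes back with `None`.

```python
from html.parser import HTMLParser


def _to_px(value):
    """Convert an attribute like '200' or '200px' to an int, else None."""
    if value is None:
        return None
    value = value.strip().lower()
    if value.endswith("px"):
        value = value[:-2]
    return int(value) if value.isdigit() else None


class ImgCollector(HTMLParser):
    """Collect <img> tags with their src and any width/height HTML attributes."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        self.images.append({
            "src": a.get("src"),
            # Only HTML attributes are visible here; sizes set in CSS are not.
            "width": _to_px(a.get("width")),
            "height": _to_px(a.get("height")),
        })


def collect_images(html):
    """Parse an HTML string and return one dict per <img> tag."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.images
```

For `<img src="b.jpg" class="hero">`, where `.hero` sets the size in a stylesheet, both `width` and `height` are `None`, which is exactly the gap described above.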


GD is a library that has been around for quite some time, and it has a fairly easy interface to work with. Here's a link to GD.

See the "size" method to get width and height.
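GD itself isn't a Python library; in Python the closest everyday equivalent is Pillow, where `Image.open(f).size` gives `(width, height)`. As a dependency-free sketch of the same idea, the functions below read the dimensions straight from the header bytes of PNG and GIF files, which is enough to apply the 150px filter. JPEG is deliberately left out, because its size is stored in a frame segment that requires scanning markers. The names `image_size` and `is_big_enough` are hypothetical.

```python
import struct


def image_size(data):
    """Return (width, height) from the header bytes of a PNG or GIF, else None."""
    # PNG: 8-byte signature, then the IHDR chunk (4-byte length, b"IHDR",
    # then width and height as big-endian uint32).
    if len(data) >= 24 and data[:8] == b"\x89PNG\r\n\x1a\n" and data[12:16] == b"IHDR":
        return struct.unpack(">II", data[16:24])
    # GIF: 6-byte signature, then the logical screen width/height as
    # little-endian uint16.
    if len(data) >= 10 and data[:6] in (b"GIF87a", b"GIF89a"):
        return struct.unpack("<HH", data[6:10])
    # JPEG and other formats need more parsing; not handled in this sketch.
    return None


def is_big_enough(data, min_px=150):
    """True if both dimensions are known and strictly greater than min_px."""
    size = image_size(data)
    return size is not None and size[0] > min_px and size[1] > min_px
```

Because only the first few bytes are needed, you could pass something like `urllib.request.urlopen(url).read(32)` instead of reading the whole image body.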

EDIT

Ah, how about this?

  1. Parse the HTML content and retrieve the URLs of the CSS file(s), as well as any inline styles.
  2. Download the CSS file(s) and parse them in order, building a rule set from the CSS rules.
  3. Next, parse the rest of the HTML from Step 1, gathering the IMG tags. If an IMG tag has a class name, look up that class in your CSS rules and check for a width or height.
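The three steps can be sketched roughly as follows. This is a deliberately naive sketch, assuming the stylesheets have already been downloaded as strings: it uses regular expressions instead of a real CSS parser, understands only simple class selectors (`.thumb`, `img.thumb`) with pixel values, ignores specificity and `@media`, and applies rules in stylesheet order. All function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Naive "selector { declarations }" matcher: no nesting, @media is not understood.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")
# width/height in px, but not max-width, min-height, etc.
SIZE_RE = re.compile(r"(?<![\w-])(width|height)\s*:\s*(\d+)px", re.I)
# Only simple class selectors such as ".thumb" or "img.thumb".
CLASS_SEL_RE = re.compile(r"(?:img)?\.([\w-]+)")


def parse_sizes(declarations):
    """Turn 'width: 200px; height: 90px' into {'width': 200, 'height': 90}."""
    return {prop.lower(): int(val) for prop, val in SIZE_RE.findall(declarations)}


def build_class_rules(css_texts):
    """Step 2: parse stylesheets in order into a list of (class_name, sizes)."""
    rules = []
    for css in css_texts:
        css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop comments
        for selectors, body in RULE_RE.findall(css):
            sizes = parse_sizes(body)
            if not sizes:
                continue
            for sel in selectors.split(","):
                m = CLASS_SEL_RE.fullmatch(sel.strip())
                if m:
                    rules.append((m.group(1), sizes))
    return rules


class SizedImgFinder(HTMLParser):
    """Step 3: resolve a (width, height) for each <img> from attributes and CSS."""

    def __init__(self, class_rules):
        super().__init__()
        self.class_rules = class_rules
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        size = {}
        # Lowest priority: plain HTML width/height attributes.
        for key in ("width", "height"):
            if (a.get(key) or "").isdigit():
                size[key] = int(a[key])
        # Then matching class rules, in stylesheet order (specificity ignored).
        classes = set((a.get("class") or "").split())
        for cls, sizes in self.class_rules:
            if cls in classes:
                size.update(sizes)
        # Highest priority: the inline style attribute.
        size.update(parse_sizes(a.get("style") or ""))
        self.images.append((a.get("src"), size.get("width"), size.get("height")))


def product_candidates(html, css_texts, min_px=150):
    """Return the src of images whose resolved width and height both exceed min_px."""
    finder = SizedImgFinder(build_class_rules(css_texts))
    finder.feed(html)
    finder.close()
    return [src for src, w, h in finder.images
            if w is not None and h is not None and w > min_px and h > min_px]
```

Images whose size can't be resolved this way (for example, set by a descendant selector like `div.gallery img`) are simply dropped; a real CSS parser and cascade would be needed to handle those.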

It might sound a little complicated, but I bet downloading a few CSS stylesheets is much lighter than downloading the images and processing them with an image library on the server side.

