ASP.NET: After scraping image URLs from a remote site, how to show only the bigger images?

I need help accomplishing the following:

In my web app, users should be able to submit products, including a product image, from a certain product site. They do this by first entering a product URL, for example www.amazon.com/product1... What I then want to do is give the user an easy way to select the product image, by showing him some images from the product URL that could be the right one. So far I have managed to scrape all image URLs from the product URL (using the WebClient class and the "Html Agility Pack"), but then ALL images from that product page are shown, which usually includes many small images. I only want to show the user SOME images that could be the product image, and then he selects the right one. The only way I can think of to narrow down the candidates is by their size or width/height, since the right product image is usually a bigger picture. (Or does somebody have a better way to determine possible product images from a product site?)

Oh that was a lot of explanation, here is my actual question:

After scraping all the image URLs from a site, how can I get the sizes of the images, so that I only show the bigger ones to the user?

Ideally I would get the sizes before downloading every single image and only download the bigger ones. If that is not possible, I guess the only option is to fetch every picture, determine its size, and then show only the bigger ones to the user. But how would you do that?

Thanks a lot for answers


You can't do that without getting the images themselves. Any width/height attributes or CSS style rules are not representative of the actual dimensions of the image files, because they only define the box in which the image is displayed, and the file itself could be a different size.
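As a rough illustration of reading the real dimensions from the image bytes rather than from the markup, here is a minimal sketch in Python (chosen only for brevity; the function name `image_size_from_header` is made up for this example). It assumes you already have the first bytes of the file, and it only handles PNG and GIF, where the size sits at a fixed offset near the start. JPEG stores its size in a later segment and needs more parsing, so it is left out.

```python
import struct

def image_size_from_header(data: bytes):
    """Return (width, height) for PNG or GIF header bytes, or None if unrecognized."""
    # PNG: 8-byte signature, 4-byte chunk length, "IHDR", then width and
    # height as big-endian 32-bit integers at byte offsets 16 and 20.
    if len(data) >= 24 and data[:8] == b"\x89PNG\r\n\x1a\n" and data[12:16] == b"IHDR":
        return struct.unpack(">II", data[16:24])
    # GIF: 6-byte signature, then width and height as little-endian
    # 16-bit integers.
    if len(data) >= 10 and data[:6] in (b"GIF87a", b"GIF89a"):
        return struct.unpack("<HH", data[6:10])
    # Anything else (JPEG, truncated data, non-images) is not handled here.
    return None
```

Since only the leading bytes are inspected, a partial fetch may be enough for these two formats, although whether a given server honours a partial request is not something you can rely on.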

There are really two ways of doing this:

  1. Try to identify a particular element in the HTML which represents the large image, and extract the image displayed there, e.g. <div class="bigImage"><img src="..." /></div>. Not great.

  2. Download all of the images, and work out their size. Out of all the possible images on the website (inc. logos, buttons, adverts etc.), how do you know which are product images?
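Option 1 could look something like the sketch below, written in Python with only the standard library. The class name `bigImage` comes from the example above; `ContainerImageParser` and `images_in_container` are hypothetical names, and the real container on any given site will be different, which is exactly why this approach is fragile.

```python
from html.parser import HTMLParser

class ContainerImageParser(HTMLParser):
    """Collect src values of <img> tags nested inside a <div> with a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # > 0 while we are inside the target div
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Start counting at the target div, and keep counting nested divs
            # so we know when the target div itself is closed.
            if self.depth or self.target_class in classes:
                self.depth += 1
        elif tag == "img" and self.depth and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

def images_in_container(html, target_class="bigImage"):
    """Return the src of every <img> found inside <div class="target_class">."""
    parser = ContainerImageParser(target_class)
    parser.feed(html)
    return parser.sources
```

In the question's setup the same selection would presumably be done with the Html Agility Pack instead; the point here is only the idea of restricting the search to one known container.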

It's really quite open-ended, and there is no guaranteed solution...


The src attribute of the images you find on this product site might have a clue in it for you. That is, these images may have the product name or ID in the URL, or come from a different "folder"/path, which would indicate to you that they are different.

If it's only one site you're scraping, you may have a usable solution doing something like this. But if that site changes anything, your app will break.
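A minimal sketch of that idea in Python, assuming you know the product ID and that the site happens to embed it in the image path. The function name `urls_matching_product` and the URLs in the usage note are invented for illustration.

```python
from urllib.parse import urlparse

def urls_matching_product(image_urls, product_id):
    """Keep image URLs whose path contains the product id (case-insensitive)."""
    needle = product_id.lower()
    # Only the path is checked, so a match in the host name or query string
    # does not count.
    return [u for u in image_urls if needle in urlparse(u).path.lower()]
```

For example, given `http://example.com/img/B000123_main.jpg` and `http://example.com/static/logo.png` with the product ID `B000123`, only the first URL is kept.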


Use the attributes and the image sizes. Assume that images that are too small are icons, images that are too wide are banners or headers, and images that are too tall are skyscraper banners. You're looking for rectangular images whose height and width are similar.
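That heuristic can be sketched as a simple filter. This is Python for brevity, and the thresholds (`min_side=150`, `max_ratio=2.0`) are arbitrary assumptions you would have to tune against real pages; `looks_like_product_image` is a made-up name.

```python
def looks_like_product_image(width, height, min_side=150, max_ratio=2.0):
    """Guess whether an image of this size could be a product picture."""
    # Too small in either direction: probably an icon, button or thumbnail.
    if width < min_side or height < min_side:
        return False
    # Very wide or very tall: probably a banner, header or skyscraper ad.
    ratio = max(width, height) / min(width, height)
    return ratio <= max_ratio
```

With these defaults a 500x400 image passes, while a 16x16 icon, a 728x90 banner and a 160x600 skyscraper are all rejected.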
