How Does Facebook Know What Image To Parse Out of An Article?

First off I want to say that I wasn't really sure where to post this, but it is very much programming related. If it is in the wrong spot, I apologize; please let me know where I should post it instead.

When sharing an article on a friend's wall, Facebook will grab a thumbnail of the article. How do they always get the right thumbnail from articles?

It doesn't grab the logo img element of http://www.nytimes.com/2010/06/07/world/asia/07convoys.html?hp for example, but rather grabs the correct image element that corresponds with the article.

I'm looking to do something similar and was wondering about a good way to parse the HTML to find the image, given this example. Thanks.


Actually, Facebook's way of finding thumbnails isn't so magical. It searches for a set of <meta> and <link> tags which specify which title, description, and image to use.

If it cannot find any of the <meta> and <link> tags it is looking for, it basically asks the user to choose whichever <img> tag fits.

In the case of the NY Times, it uses the following:

<meta name="thumbnail" content="whatever.jpg" />

Facebook recommends you use a <link> tag instead for the thumbnail.

<meta name="title" content="title" />
<meta name="description" content="description " />
<link rel="image_src" href="thumbnail_image" />

Source: Facebook Share/Specifying Meta Tags
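
If you want to do the same lookup yourself, here is a minimal sketch in Python using only the standard library's html.parser. The names ShareTagParser and find_share_image are made up for illustration, and preferring the <link rel="image_src"> tag over <meta name="thumbnail"> is my assumption, not documented Facebook behavior.

```python
from html.parser import HTMLParser


class ShareTagParser(HTMLParser):
    """Collect the share-related <meta> and <link> tags from a page.

    Hypothetical helper: it only looks at the tag names mentioned above.
    """

    META_NAMES = {"title", "description", "thumbnail"}

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; value may be None.
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in self.META_NAMES and attrs.get("content"):
                # Keep the first occurrence of each tag.
                self.found.setdefault(name, attrs["content"].strip())
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            if "image_src" in rel and attrs.get("href"):
                self.found.setdefault("image_src", attrs["href"].strip())


def find_share_image(html):
    """Return the declared thumbnail URL, or None if the page declares none."""
    parser = ShareTagParser()
    parser.feed(html)
    # Assumed priority: <link rel="image_src"> first, then <meta name="thumbnail">.
    return parser.found.get("image_src") or parser.found.get("thumbnail")
```

When find_share_image returns None, you are in the fallback case described above, where the candidate <img> tags have to be offered to the user or ranked some other way.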


They don't always grab the correct image, even though there's certainly some good logic in place.

In many cases, I've seen a list of thumbnails to choose from, meaning Facebook's parser considered them equally relevant.

I would guess they (probably among other things) look at the dom structure and find images close to content that looks "shareable".

UPDATE:

After some empirical testing, it seems that image dimensions play a big role. Images that are too small or too wide are not considered thumbnails. If your logo is the right size, though, expect it to show up as one of the thumbnails. Try sharing something on http://www.e24.se for example.
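
A dimension check like that could be sketched as follows in Python. The function name and both thresholds (min_side, max_aspect) are placeholders I picked for illustration; I don't know the actual limits Facebook uses.

```python
def looks_like_thumbnail(width, height, min_side=50, max_aspect=3.0):
    """Guess whether an image of the given pixel size is a thumbnail candidate.

    min_side and max_aspect are hypothetical thresholds, not Facebook's values.
    """
    # Reject tiny images such as icons, spacers, and tracking pixels.
    if width < min_side or height < min_side:
        return False
    # Reject very wide images such as banners and header strips.
    return width / height <= max_aspect
```

Note that a logo of reasonable size passes this check, which matches the observation that logos sometimes show up among the suggested thumbnails.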


These are just guesses as I don't have any knowledge of Facebook's internal operations, but if I were parsing thumbnails from a page I would consider several things:

  • Size of the image, as previously stated
  • Relevant keywords in the src or alt attributes
  • Location of the <img> tag on the page: the closer to relevant content the better, though this may not always work for complicated layouts
  • Absence of ad-related keywords in the <img> tag or nearby tags (doubleclick comes to mind)
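
To make those criteria concrete, here is a rough scoring sketch in Python. Everything in it is an assumption for illustration: the ImageCandidate fields, the AD_HINTS list, the weights, and the distance_to_content measure (taken here as a precomputed count of DOM steps between the image and the main article node). It shows the guesses above, not Facebook's actual logic.

```python
from dataclasses import dataclass

# Hypothetical substrings that suggest an ad image.
AD_HINTS = ("doubleclick", "adserver", "banner")


@dataclass
class ImageCandidate:
    src: str
    alt: str
    width: int
    height: int
    distance_to_content: int  # assumed: DOM steps from the main article node


def score(candidate, keywords):
    """Combine size, keyword matches, and position into one rough score."""
    text = f"{candidate.src} {candidate.alt}".lower()
    # Ad-related images are ruled out entirely.
    if any(hint in text for hint in AD_HINTS):
        return 0.0
    total = 0.0
    # Size: larger area scores higher, capped so huge images do not dominate.
    total += min(candidate.width * candidate.height, 250_000) / 250_000
    # Keywords: one point per article keyword found in the URL or alt text.
    total += sum(1 for word in keywords if word.lower() in text)
    # Position: images closer to the article content score higher.
    total += 1.0 / (1 + candidate.distance_to_content)
    return total


def best_image(candidates, keywords):
    """Return the highest-scoring non-ad candidate, or None if there is none."""
    scored = [(score(c, keywords), c) for c in candidates]
    scored = [pair for pair in scored if pair[0] > 0]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]
```

The keywords would typically come from the page title or URL slug, and the weights would need tuning against real pages before the ranking could be trusted.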

Also, as far as I know, the Facebook meta tags are fairly new, so my guess is that the link page scraper is still grabbing images the hard way ;) However, if you're running a site and want Facebook to grab the right information when it scrapes your pages, I highly suggest implementing them.
