Create summary from link

2023-03-30 09:46

Many sites (Facebook, Google+, etc.) have a feature that creates a summary with a header, an image and some text from a link. I have tried to find out whether there are any libraries or guidelines for building this kind of feature, but my search results haven't been helpful at all.

I know that I can parse the HTML of a page and extract the elements I'd like, but I think there should be some kind of standard for how to do this (and perhaps also for how to create pages that are friendly to this kind of functionality).

Does anyone have a good link that will point me in the right direction? JavaScript or .NET is my preferred choice, but I can implement it myself too.

For the "perhaps also how to create pages that are friendly to this kind of functionality" part:
You are probably looking for the Open Graph protocol:

<html xmlns:og="http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
...
</head>
...
</html>

I think this is the first place Facebook will look. However, Facebook seems to have its own algorithms for detecting the most relevant part of the page when these tags are missing.
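To make the consumer side concrete, here is a minimal sketch of reading those tags back out of a page. It uses only Python's standard html.parser module; the names OpenGraphParser and read_open_graph are made up for this example, and keeping only the first value per property is an assumption of the sketch, not something the protocol requires.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> tags into a dict (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        content = attributes.get("content")
        # Assumption: keep only the first value seen for each og: property.
        if name.startswith("og:") and content is not None:
            self.properties.setdefault(name[3:], content)


def read_open_graph(html):
    """Return a dict such as {"title": ..., "image": ...} for the og: tags found."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties
```

Calling read_open_graph on the markup above should give a dict with the keys title, type, url and image. An empty dict means the page has no Open Graph tags and you need some other heuristic.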

"Many sites (Facebook, Google+, etc.) have a feature that creates a summary with a header, an image and some text from a link. I have tried to find out whether there are any libraries or guidelines for building this kind of feature, but my search results haven't been helpful at all."

Such a feature is usually built using some kind of "crawling", meaning your script opens the link and looks at its data, just as you suggest yourself.

"I know that I can parse the HTML of a page and extract the elements I'd like, but I think there should be some kind of standard for how to do this (and perhaps also for how to create pages that are friendly to this kind of functionality)."

The standard way is the way most search engines, such as Google, do it. You get the title from the page's title element and the description from the description meta tag, if there is one. Most search engines nowadays ignore the description metadata and instead try to build their own summary.

This is usually done by looking for headers (h1, h2, etc.) and then the paragraphs.
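As a rough illustration of that fallback, the sketch below takes the title element (or the first h1/h2 if there is no title) and the description meta tag (or the first non-empty paragraph if there is no description). The names FallbackSummaryParser and summarize, and this particular order of preference, are assumptions of the example; real search engines use far more elaborate heuristics.

```python
from html.parser import HTMLParser


class FallbackSummaryParser(HTMLParser):
    """Remembers the first non-empty title, h1, h2 and p text (hypothetical helper)."""

    CAPTURED = {"title", "h1", "h2", "p"}

    def __init__(self):
        super().__init__()
        self.found = {}          # tag name -> first non-empty text
        self.description = None  # content of <meta name="description">
        self._current = None     # tag currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            is_description = (attributes.get("name") or "").lower() == "description"
            if is_description and self.description is None:
                self.description = attributes.get("content")
        elif tag in self.CAPTURED and self._current is None and tag not in self.found:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (for example <b>) is collected as well.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.found[tag] = text
            self._current = None


def summarize(html):
    """Return {"title": ..., "description": ...}; a value is None if nothing was found."""
    parser = FallbackSummaryParser()
    parser.feed(html)
    parser.close()
    found = parser.found
    return {
        "title": found.get("title") or found.get("h1") or found.get("h2"),
        "description": parser.description or found.get("p"),
    }
```

This only handles reasonably well-formed markup. For messy real-world pages you would probably want a more forgiving HTML parser.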

And to make a website "friendly" to this kind of crawl, you build it according to web standards (W3C).

"Does anyone have a good link that will point me in the right direction? JavaScript or .NET is my preferred choice, but I can implement it myself too."

The language really doesn't matter as long as it is capable of doing a basic HTTP GET.
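For example, the fetching step could look roughly like this in Python, using only the standard library. The function name fetch_html, the User-Agent string, the timeout and the size limit are all arbitrary choices made for this sketch; the same idea carries over to JavaScript or .NET.

```python
import urllib.request


def fetch_html(url, timeout=10.0, max_bytes=1_000_000):
    """Download a page with a plain GET request and return it as text."""
    request = urllib.request.Request(
        url,
        # Made-up User-Agent for this sketch; some sites reject requests without one.
        headers={"User-Agent": "link-summary-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        # Read at most max_bytes so a huge page cannot exhaust memory.
        body = response.read(max_bytes)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The server announced a charset Python does not know.
        return body.decode("utf-8", errors="replace")
```

The returned string can then be handed to whatever parsing step you use to pick out the title, image and text.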
