How to retrieve the title and summary of a web page programmatically?

Like what Digg does: when you submit a news story, the title and summary are retrieved automatically. How is this done?


Retrieve the HTML and parse it.

The title comes from the <title> tag. The summary can come from either:

  • The first couple of hundred characters of visible text from inside the <body> tag.
  • The description <meta> tag.
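As a rough illustration of the approach above, here is a minimal Python sketch that uses only the standard library. The names `fetch_html`, `TitleSummaryParser`, and `extract_title_and_summary` are hypothetical, and preferring the description <meta> tag over the body text is an assumption of this sketch, not a rule; a real page may need a more forgiving parser.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and decode it (falls back to UTF-8 if no charset is declared)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TitleSummaryParser(HTMLParser):
    """Collects the <title> text, the description <meta> tag, and visible body text."""

    SKIPPED = {"script", "style", "noscript"}  # content that is not visible text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.body_parts = []
        self.description = None
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_body and self._skip_depth == 0:
            self.body_parts.append(data)


def extract_title_and_summary(html, max_chars=200):
    """Return (title, summary); summary is the meta description or leading body text."""
    parser = TitleSummaryParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    body_text = " ".join(" ".join(parser.body_parts).split())
    # Assumption of this sketch: prefer the meta description when it is present.
    summary = parser.description or body_text[:max_chars]
    return title, summary
```

Something like `extract_title_and_summary(fetch_html(url))` would then give a starting title and summary that the user can edit.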

If the site provides an RSS feed (which you'll find in the <link rel="alternate" type="application/rss+xml"> tag), use the structured fields from the feed instead, such as each item's title and description.
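Discovering the feed could look something like the following sketch. `FeedLinkParser` and `find_rss_feeds` are hypothetical names, and the code only locates the feed URL; fetching the feed and matching the submitted page to one of its items is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate" type="application/rss+xml"> tags."""

    def __init__(self):
        super().__init__()
        self.feed_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        link_type = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rel and link_type == "application/rss+xml" and href:
            self.feed_hrefs.append(href)


def find_rss_feeds(html, base_url):
    """Return absolute URLs of the RSS feeds advertised by the page."""
    parser = FeedLinkParser()
    parser.feed(html)
    parser.close()
    # Feed links are often relative, so resolve them against the page URL.
    return [urljoin(base_url, href) for href in parser.feed_hrefs]
```

If the returned list is empty, the page advertises no RSS feed and the <title> and <meta> approach is the fallback.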

There is no one right answer to this question. There are probably other strategies possible. But this should get you started.


The title is easy: it is just the <title> tag of the HTML. The summary is a bit harder. If you are retrieving the page in response to a search, or within some other context, try to generate the summary from the position of the search term, or from something relevant to the context in which you are showing it. For example, if you are showing the page because I clicked an "AI" tag, show me some of the page that is about AI.
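One simple way to build such a context-based summary is to cut a window of text around the first occurrence of the term. This is only a sketch: `snippet_around` is a hypothetical helper, it expects the page's visible text as a plain string, and it matches substrings, so "AI" would also match inside "said".

```python
import re


def snippet_around(text, term, width=200):
    """Return about `width` characters of `text` centred on the first match of `term`."""
    match = re.search(re.escape(term), text, re.IGNORECASE)
    if match is None:
        # Term not found: fall back to the start of the text.
        return text[:width]
    margin = max(0, width - len(term)) // 2
    start = max(0, match.start() - margin)
    end = min(len(text), start + width)
    # Mark the places where the text was cut.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix
```

A more careful version would match whole words and cut at sentence boundaries rather than at fixed character offsets.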

In the case of Digg, the title and description can be edited by the poster before the story is pushed out to everyone. But if the page has a description meta tag, its content pre-populates the description field. They use the following meta tag: <meta name="description" content="blah blah blah"/>
