How do I find whether a document on the web is semantically related to some other document?

My question is this: given a document d1 on the web and another document d2, how can I tell whether d1 and d2 are semantically related? Are there any APIs that do some natural language processing and could give me a hint that d1 is probably connected to d2? I need this badly and urgently. Please help!


You can use special microformats. See more at http://microformats.org/

Simple example:

<a href="http://creativecommons.org/licenses/by/2.0/" rel="license">cc by 2.0</a>

Rel-License is one of several microformats. By adding rel="license" to a hyperlink, a page indicates that the destination of that hyperlink is a license for the current page.
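
To show how a program could read such a hint, here is a minimal sketch using only Python's standard library. The class and function names (RelLinkParser, find_rel_links) are made up for this illustration and are not part of any microformats tooling. It collects the targets of links whose rel attribute contains a given token, so two pages could then be compared on the links they declare.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collects href values of <a> and <link> tags that carry a given rel token."""

    def __init__(self, rel_value):
        super().__init__()
        self.rel_value = rel_value.lower()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, e.g. rel="license nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if self.rel_value in rel_tokens and href:
            self.hrefs.append(href)


def find_rel_links(html, rel_value="license"):
    """Return the hrefs of all links in `html` marked with rel=`rel_value`."""
    parser = RelLinkParser(rel_value)
    parser.feed(html)
    parser.close()
    return parser.hrefs


page = '<a href="http://creativecommons.org/licenses/by/2.0/" rel="license">cc by 2.0</a>'
licenses = find_rel_links(page)
```

Two pages that declare the same rel targets share only a weak, explicit signal. This does not say anything about whether their text is about the same subject.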


To relate documents semantically, you can use special vocabularies like SKOS and relate the documents in an ontology. Or, as silex mentioned, you can use microformats directly in your documents.

For natural language processing, there are tools such as GATE that can extract information from text, but this is not a trivial task.

Perhaps you can refine what you want to do. Do you want to define yourself which documents are related, or do you want software to find out which documents may be related?


You need to look into "named entity extraction", i.e. using natural language processing to extract likely entities that are common to both documents. These are generally people, places, events, times, and organisations.

Take a look at OpenCalais http://www.opencalais.com/ for some real-world applications of this type of technology.
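
To make the idea of "entities common to both documents" concrete, here is a rough sketch in Python. It does not call OpenCalais or any real entity extractor. As a loudly labelled stand-in, it treats runs of capitalised words as entity candidates, which is a crude heuristic. The function names and the small stopword list are assumptions made for this example. The overlap is scored with the Jaccard index (shared entities divided by all entities).

```python
import re

# Crude stand-in for a real named entity extractor: runs of capitalised words.
CAPITALIZED_RUN = re.compile(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")

# Capitalised function words that often start a sentence (illustrative, not complete).
STOPWORDS = {"The", "A", "An", "In", "On", "At", "It", "This", "That"}


def naive_entities(text):
    """Return a set of entity candidates found in `text`."""
    entities = set()
    for candidate in CAPITALIZED_RUN.findall(text):
        words = candidate.split()
        # Drop leading stopwords so "In June" becomes "June".
        while words and words[0] in STOPWORDS:
            words.pop(0)
        if words:
            entities.add(" ".join(words))
    return entities


def entity_overlap(text_a, text_b):
    """Jaccard index of the two entity sets, between 0.0 and 1.0."""
    a, b = naive_entities(text_a), naive_entities(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


d1 = "Barack Obama visited Paris in June."
d2 = "In June the Paris summit hosted Barack Obama."
d3 = "Tokyo hosted the games."
score_related = entity_overlap(d1, d2)
score_unrelated = entity_overlap(d1, d3)
```

With a real extractor you would replace naive_entities with the entities that service returns and keep the comparison step the same. A high score only suggests the documents mention the same things, not that they say the same thing about them.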
