
Unsupervised named entity recognition (NER) with a custom controlled vocabulary for crosslink suggestions in Java

I'm looking for a Java library that can do named entity recognition (NER) against a custom controlled vocabulary, without needing labeled training data first. I searched SE, but most of the existing questions are rather unspecific.

Consider the following use-case:

  • An editor is entering articles (about 500 words each) in a CMS.
  • The text may contain plain-text references to entities of a specific domain, e.g.:
    • names of points of interest, such as bars and restaurants, as well as neighborhoods, etc.
  • A controlled vocabulary of these entities exists (about 5,000 entities).
    • I imagine an entity to be a -tuple in the vocabulary
  • After finishing the text, the user should be able to save the document.
  • Saving triggers a workflow that scans the text against the vocabulary by comparing it with the name of each entity. An exact match is not required: a similarity of, say, 97% on Jaro-Winkler (or whatever measure NER tools typically use; I'm not familiar with the algorithms) may be enough. The threshold needs to be configurable.
  • Hits are returned to the server-side controller, which in turn returns JSON to the client containing the matched entities. These are presented to the editor as suggested crosslinks.
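
To make the scanning step concrete, here is a minimal sketch of what is described above, using only the JDK. Everything in it is an assumption for illustration: the names `CrosslinkSuggester`, `Hit`, and `suggest` are hypothetical, an entity is reduced to just its name, and the Jaro-Winkler implementation is hand-written rather than taken from any NER library. It slides a window of as many tokens as the entity name has over the text and keeps windows whose similarity reaches a configurable threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of any existing library.
class CrosslinkSuggester {

    /** One suggested crosslink: vocabulary name, matched text span, similarity score. */
    static final class Hit {
        final String entityName;
        final String matchedText;
        final double score;
        Hit(String entityName, String matchedText, double score) {
            this.entityName = entityName;
            this.matchedText = matchedText;
            this.score = score;
        }
    }

    /** Jaro similarity in [0, 1]. */
    static double jaro(String a, String b) {
        if (a.equals(b)) return 1.0;
        int la = a.length(), lb = b.length();
        if (la == 0 || lb == 0) return 0.0;
        int window = Math.max(0, Math.max(la, lb) / 2 - 1);
        boolean[] ma = new boolean[la], mb = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int lo = Math.max(0, i - window), hi = Math.min(lb - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!mb[j] && a.charAt(i) == b.charAt(j)) {
                    ma[i] = true;
                    mb[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        for (int i = 0, j = 0; i < la; i++) {
            if (!ma[i]) continue;
            while (!mb[j]) j++;
            if (a.charAt(i) != b.charAt(j)) outOfOrder++;
            j++;
        }
        double m = matches;
        return (m / la + m / lb + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    /** Jaro-Winkler: Jaro plus a bonus for a shared prefix of up to 4 characters. */
    static double jaroWinkler(String a, String b) {
        double j = jaro(a, b);
        int max = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < max && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    /** Lower-cases and drops everything except letters, digits, and spaces. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N} ]", "");
    }

    /** Returns every token window whose similarity to an entity name is >= threshold. */
    static List<Hit> suggest(String text, List<String> vocabulary, double threshold) {
        String[] tokens = text.trim().split("\\s+");
        List<Hit> hits = new ArrayList<>();
        for (String name : vocabulary) {
            String target = normalize(name);
            if (target.isEmpty()) continue;
            int n = name.trim().split("\\s+").length;
            for (int i = 0; i + n <= tokens.length; i++) {
                String span = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
                double score = jaroWinkler(normalize(span), target);
                if (score >= threshold) hits.add(new Hit(name, span, score));
            }
        }
        return hits;
    }
}
```

Calling `CrosslinkSuggester.suggest(articleText, entityNames, 0.97)` from the save handler would give the controller a list of hits to serialize as JSON. With roughly 5,000 entities and 500 tokens this brute-force scan is a few million string comparisons per save, which is probably acceptable, but a dictionary-based chunker from an NLP library would likely scale better and handle tokenization more carefully.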

Ideally, I'm looking for a project that uses NER to suggest crosslinks within a CMS environment, so that I can piggyback on it. (I'm sure plugins like this exist for WordPress, for example; I'm not so sure whether something similar exists in Java.)

Any other, more general pointers to NER libraries that work with custom controlled vocabularies are welcome as well.


For people looking this up in the future:

"Approximate Dictionary-Based Chunking" see: http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html

(URL edited.)


Unsure if these might be helpful: http://www-nlp.stanford.edu/software/CRF-NER.shtml http://cogcomp.cs.illinois.edu/page/software
