Get search word hits (number of occurrences) per document in Lucene

Can anyone suggest the best way to get the hits (number of occurrences) of a word per document in Lucene?


Lucene uses a field-based, rather than document-based, index. To get term counts per document:

  1. Iterate over the documents using IndexReader.document(), skipping any for which isDeleted() returns true.
  2. In document d, iterate over its fields using Document.getFields().
  3. For each field f, get its terms using IndexReader.getTermFreqVector() (this returns null if term vectors were not stored for that field).
  4. Go over each term vector and sum the frequencies per term.
  5. The per-term sums across all fields give you the document's term frequency vector.
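
The summing in steps 4 and 5 can be sketched in plain Java, without any Lucene dependency. This is only an illustration under assumptions: each `Map<String, Integer>` stands in for the terms and frequencies you would read out of one field's term vector, and `TermCountSketch` and `sumFieldVectors` are hypothetical names, not part of Lucene.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TermCountSketch {

    // Sums per-field term frequencies into one document-level map.
    // Assumption: each input map stands in for one field's term vector
    // (term -> frequency); a null entry models a field with no stored vector.
    static Map<String, Integer> sumFieldVectors(List<Map<String, Integer>> fieldVectors) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> vector : fieldVectors) {
            if (vector == null) {
                continue; // nothing to add for this field
            }
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                // Add this field's count to whatever earlier fields contributed.
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> title = Map.of("lucene", 1, "index", 1);
        Map<String, Integer> body = Map.of("lucene", 3, "term", 2);
        System.out.println(sumFieldVectors(List.of(title, body)));
    }
}
```

If you only care about a single word, look up that one key in the resulting map instead of keeping the whole vector.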


SpanTermQuery.getSpans will give you an enumeration of documents and the positions where the term appears. The documents are sorted, so you can just count the number of times each document appears, ignoring the position information.
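
The counting itself is a simple run-length pass over a sorted sequence. Here is a rough sketch in plain Java, assuming you have already collected the document id reported for each span match into an array, in the order the enumeration returned them; `SpanDocCounter` and `countPerDoc` are made-up names for illustration, and the array stands in for the real span enumeration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SpanDocCounter {

    // Counts how many times each document id occurs in a sorted sequence.
    // Assumption: docIds holds one entry per span match, in ascending
    // document order, so equal ids are always adjacent.
    static Map<Integer, Integer> countPerDoc(int[] docIds) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        int i = 0;
        while (i < docIds.length) {
            int doc = docIds[i];
            int run = 0;
            // Consume the whole run of matches that belong to this document.
            while (i < docIds.length && docIds[i] == doc) {
                run++;
                i++;
            }
            counts.put(doc, run);
        }
        return counts;
    }
}
```

Because the input is sorted, no lookup of earlier documents is needed; each document's count is final as soon as the id changes.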
