Lucene TermPositionVector and retrieving terms at index locations

I've been searching hard for an answer to this, but I'm still in the dark.

I am using

int[] getTermPositions(int index)

on the TermPositionVector of a field (configured to store both offsets and positions) to get the positions of the terms I want to highlight as keywords in context.

The question: What do these positions correspond to? Obviously not the

String[] getTerms()

that is returned by the TermFreqVector interface, as that holds only the distinct terms of the field, not the terms in document order.

What I'm looking for is a way to get the "tokenized" array of my field, so I can then pull out the terms surrounding the index values returned by getTermPositions(int index).

Help? Thanks a bunch.


int[] getTermPositions(int index)

returns an array of the positions of the term at the given index. You can get that index using the

int indexOf(String term)

method of TermFreqVector. Positions are counted in terms (tokens), not characters: each entry is a position in the field's token stream at which the given term occurs. For example,

// source text:
// term position 0   1     2     3   4     5    6   7    8
//               the quick brown fox jumps over the lazy dog

// terms:
// term index 0     1   2   3     4    5    6     7
//            brown dog fox jumps lazy over quick the

// Suppose we want to find the positions where "the" occurs

int index = termPositionVector.indexOf("the"); // 7
int[] positions = termPositionVector.getTermPositions(index); // {0, 6}
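
Since the positions index into the token stream, the "tokenized" array asked about can be rebuilt by writing each term into the slots listed for it. The following is only a minimal sketch of that idea: it uses plain arrays in place of Lucene, assuming terms has the shape of getTerms() and positions[i] the shape of getTermPositions(i). The class and method names (PositionVectorSketch, rebuildTokens, context) are hypothetical and not part of Lucene.

```java
import java.util.Arrays;

class PositionVectorSketch {

    // Rebuilds the token sequence from two parallel arrays:
    // terms[i] is the i-th distinct term, positions[i] lists where it occurs.
    static String[] rebuildTokens(String[] terms, int[][] positions) {
        int max = -1;
        for (int[] termPositions : positions) {
            for (int pos : termPositions) {
                max = Math.max(max, pos);
            }
        }
        // Slots with no term (for example, removed stop words) stay null.
        String[] tokens = new String[max + 1];
        for (int i = 0; i < terms.length; i++) {
            for (int pos : positions[i]) {
                tokens[pos] = terms[i];
            }
        }
        return tokens;
    }

    // Returns up to `window` tokens on each side of `position`, inclusive of it.
    static String[] context(String[] tokens, int position, int window) {
        int from = Math.max(0, position - window);
        int to = Math.min(tokens.length, position + window + 1);
        return Arrays.copyOfRange(tokens, from, to);
    }

    public static void main(String[] args) {
        // Stand-ins for getTerms() and getTermPositions(i), i = 0..7.
        String[] terms = {"brown", "dog", "fox", "jumps", "lazy", "over", "quick", "the"};
        int[][] positions = {{2}, {8}, {3}, {4}, {7}, {5}, {1}, {0, 6}};

        String[] tokens = rebuildTokens(terms, positions);
        System.out.println(String.join(" ", tokens));
        // prints: the quick brown fox jumps over the lazy dog

        System.out.println(String.join(" ", context(tokens, 6, 2)));
        // prints: jumps over the lazy dog
    }
}
```

With a real TermPositionVector, the two input arrays would presumably be filled by looping over getTerms() and calling getTermPositions(i) for each index i.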


As it turns out, the Highlighter in Lucene's contrib package does what I wanted:

http://lucene.apache.org/java/3_0_2/lucene-contrib/index.html#highlighter
