Lucene TermPositionVector and retrieving terms at index locations
I've been searching like mad for an answer to this, but I'm still in the dark.
I am using
int[] getTermPositions(int index)
of a TermPositionVector I have for a field (which has been set to store both offsets and positions) to get the term positions of the terms I'm interested in highlighting as keyword in context.
The question: What do these positions correspond to? Obviously not the
String[] getTerms()
that is returned by the TermFreqVector interface, as that contains just raw counts of my terms.
What I'm looking for is a way to get the "tokenized" array of my field so I can then pull out the terms surrounding the positions returned by getTermPositions(int index).
Help? Thanks a bunch.
int[] getTermPositions(int index)
returns an array of the positions of the term at the given index. You can get that index using the
int indexOf(String term)
method of TermFreqVector. The term positions are the positions, counted in terms rather than characters, at which the given term occurs in the field. For example,
// source text:
// term position   0    1      2      3    4      5     6    7     8
//                 the  quick  brown  fox  jumps  over  the  lazy  dog
// terms:
// term index      0      1    2    3     4     5     6      7
//                 brown  dog  fox  jump  lazy  over  quick  the
// Suppose we want to find the positions where "the" occurs
int index = termPositionVector.indexOf("the"); // 7
int[] positions = termPositionVector.getTermPositions(index); // {0, 6}
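Since each term carries its own list of positions, the "tokenized" array the question asks for can be rebuilt by inverting those lists: put every term into a slot at each of its positions. The following is only a rough sketch of that idea and does not call Lucene itself. The class PositionIndex and its methods rebuildTokens and context are hypothetical names made up for illustration. The terms array stands in for what getTerms() returns, and positions[i] stands in for getTermPositions(i).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: works on plain arrays that
// mirror the data exposed by getTerms() and getTermPositions(i).
final class PositionIndex {

    // Invert per-term position lists into an array indexed by term position.
    // Slots stay null where no term was indexed (e.g. a removed stop word).
    // Assumption: at most one term per position; if several terms share a
    // position (e.g. synonyms), the last one written wins.
    static String[] rebuildTokens(String[] terms, int[][] positions) {
        int max = -1;
        for (int[] termPositions : positions) {
            for (int p : termPositions) {
                max = Math.max(max, p);
            }
        }
        String[] tokens = new String[max + 1];
        for (int i = 0; i < terms.length; i++) {
            for (int p : positions[i]) {
                tokens[p] = terms[i];
            }
        }
        return tokens;
    }

    // Collect the terms within `window` positions on either side of `position`,
    // including the term at `position` itself and skipping empty slots.
    static List<String> context(String[] tokens, int position, int window) {
        int from = Math.max(0, position - window);
        int to = Math.min(tokens.length - 1, position + window);
        List<String> result = new ArrayList<>();
        for (int p = from; p <= to; p++) {
            if (tokens[p] != null) {
                result.add(tokens[p]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Unstemmed terms in sorted order, with the positions from the example text.
        String[] terms = {"brown", "dog", "fox", "jumps", "lazy", "over", "quick", "the"};
        int[][] positions = {{2}, {8}, {3}, {4}, {7}, {5}, {1}, {0, 6}};

        String[] tokens = rebuildTokens(terms, positions);
        // prints [the, quick, brown, fox, jumps, over, the, lazy, dog]
        System.out.println(Arrays.toString(tokens));
        // Two terms of context around the second "the" (position 6):
        // prints [jumps, over, the, lazy, dog]
        System.out.println(context(tokens, 6, 2));
    }
}
```

Note that the rebuilt array holds the indexed terms (after analysis such as lowercasing or stemming), not the original text. To show the original wording, the stored offsets would be needed as well.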
Well, this will accomplish what I wanted:
http://lucene.apache.org/java/3_0_2/lucene-contrib/index.html#highlighter