Lucene: searching/filtering by field's value length

I need some help with a search. Say I have a really simple document structure: just one field, named name. I need to retrieve all the names whose length is greater than or less than a specified value. By length I mean String.length(). A range filter seems close in concept, but I couldn't find a good example that fits my specific case. Thanks for the help.


Add a NumericField that holds the length of the name, then search it with a range query (NumericRangeQuery). See the NumericField Javadoc for an example.
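As a rough illustration of that suggestion, here is a minimal sketch assuming the Lucene 3.x Java API (later releases replaced NumericField with other field types). The field name nameLength and the class and method names are hypothetical, chosen only for this example.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.NumericRangeQuery;

public class NameLengthIndexing {
    // Hypothetical field name for the indexed length; not part of the original schema.
    static final String LENGTH_FIELD = "nameLength";

    /** Builds a document holding the name plus its length as an indexed numeric field. */
    public static Document toDocument(String name) {
        Document doc = new Document();
        doc.add(new Field("name", name, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new NumericField(LENGTH_FIELD).setIntValue(name.length()));
        return doc;
    }

    /** Matches names whose length lies in [min, max]; pass null to leave a bound open. */
    public static NumericRangeQuery<Integer> lengthBetween(Integer min, Integer max) {
        return NumericRangeQuery.newIntRange(LENGTH_FIELD, min, max, true, true);
    }
}
```

With this in place, lengthBetween(5, null) would match names of five or more characters, and lengthBetween(null, 4) names of four or fewer. The cost is one extra field that has to be written at index time.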


This is a classic example of a MultiTermQuery. It's not in the box, but it's easy to implement. Take a look at WildcardQuery, which extends MultiTermQuery and does something very similar. Just use a different FilteredTermEnum, like the one below, which filters the terms by the length of term.Text() rather than by the term text itself.

The magic happens here (this code is in the custom term enumerator at the bottom of my post):

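protected internal override bool TermCompare(Term term)
{
  if (field != term.Field())
  {
    // Past the last term of this field: stop enumerating.
    endEnum = true;
    return false;
  }
  // Terms are ordered by text, not by length, so a short term is
  // skipped rather than treated as the end of the enumeration.
  return term.Text().Length >= text.Length;
}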

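The above code walks the terms of the field and compares the length of each one with the length of the text of the term passed to the constructor. It returns true for any term that is at least that long. The text of the query term therefore acts only as a length template: new Term("name", "xxxxx") matches names of five or more characters. To select names up to a maximum length instead, flip the comparison to <=.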
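The filtering idea can be modeled without Lucene at all. The following is only a sketch in plain Java, with a sorted set standing in for the field's term dictionary; the class and method names are hypothetical. It shows why a term that is too short has to be skipped rather than taken as the end of the enumeration: terms come back in lexicographic order, so longer terms can follow a short one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class MinLengthFilterSketch {
    /** Returns every term with at least minLength characters, in iteration order. */
    public static List<String> termsAtLeast(Iterable<String> sortedTerms, int minLength) {
        List<String> matches = new ArrayList<String>();
        for (String term : sortedTerms) {
            // Skip short terms but keep going: the order is by text, not by length.
            if (term.length() >= minLength) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // TreeSet iterates in sorted order, like a term enumeration over one field.
        TreeSet<String> terms = new TreeSet<String>(
                Arrays.asList("al", "alexander", "bo", "charlotte", "dan"));
        System.out.println(termsAtLeast(terms, 4)); // prints [alexander, charlotte]
    }
}
```

Note that this visits every term of the field, so the cost grows with the number of distinct names.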

public class MinLengthQuery : MultiTermQuery
{
  public MinLengthQuery(Term term) : base(term)
  {
  }

  protected internal override FilteredTermEnum GetEnum(IndexReader reader)
  {
    return new MinLengthTermEnum(reader, GetTerm());
  }
}

This class does all the work:

public class MinLengthTermEnum : FilteredTermEnum
{
  internal Term searchTerm;
  internal System.String field = "";
  // Only the length of this text is used; it sets the minimum term length.
  internal System.String text = "";
  internal bool endEnum = false;

  public MinLengthTermEnum(IndexReader reader, Term term) : base()
  {
    searchTerm = term;
    field = searchTerm.Field();
    text = searchTerm.Text();
    // Start at the first term of the field.
    SetEnum(reader.Terms(new Term(searchTerm.Field(), "")));
  }

  protected internal override bool TermCompare(Term term)
  {
    if (field != term.Field())
    {
      // Past the last term of this field: stop enumerating.
      endEnum = true;
      return false;
    }
    // Terms are ordered by text, not by length, so a short term is
    // skipped rather than treated as the end of the enumeration.
    return term.Text().Length >= text.Length;
  }

  public override float Difference()
  {
    return 1.0f;
  }

  public override bool EndEnum()
  {
    return endEnum;
  }

  public override void Close()
  {
    base.Close();
    searchTerm = null;
    field = null;
    text = null;
  }
}

(I'm a Lucene.NET guy, but the translation ought to be easy enough. It would probably be easier to start with your version of Lucene's source code for WildcardQuery and TermEnum and work from it.)
