Java, Lucene: searcher/indexer problem, how to do it?

2023-04-01 12:36 Asked by:

I have to build something with Lucene and Java, but I don't know how to start. I have to write a servlet that receives a query from the browser, runs the search, and then generates a page with the results found. The user should be able to choose between searching in file names only or in both file names and page contents. The search should cover the HTML files in the directory /var/www/manual/. As a starting point I already have two files: Indexer.java and Searcher.java.

Indexer

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Indexer {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new Exception("Usage: java " + Indexer.class.getName()
        + " <index dir> <data dir>");
    }
    File indexDir = new File(args[0]);
    File dataDir = new File(args[1]);

    long start = new Date().getTime();
    int numIndexed = index(indexDir, dataDir);
    long end = new Date().getTime();

    System.out.println("Indexing " + numIndexed + " files took "
      + (end - start) + " milliseconds");
  }

  public static int index(File indexDir, File dataDir)
    throws IOException {

    if (!dataDir.exists() || !dataDir.isDirectory()) {
      throw new IOException(dataDir
        + " does not exist or is not a directory");
    }

    IndexWriter writer = new IndexWriter(indexDir,
      new StandardAnalyzer(), true);
    writer.setUseCompoundFile(false);

    indexDirectory(writer, dataDir);

    int numIndexed = writer.docCount();
    writer.optimize();
    writer.close();
    return numIndexed;
  }

  private static void indexDirectory(IndexWriter writer, File dir)
    throws IOException {

    File[] files = dir.listFiles();
    if (files == null) {
      return;  // not a directory, or an I/O error occurred
    }

    for (int i = 0; i < files.length; i++) {
      File f = files[i];
      if (f.isDirectory()) {
        indexDirectory(writer, f);  // recurse
      } else if (f.getName().endsWith(".txt")) {
//      } else if (f.getName().endsWith(".html.en")) {
        indexFile(writer, f);
      }
    }
  }

  private static void indexFile(IndexWriter writer, File f)
    throws IOException {

    if (f.isHidden() || !f.exists() || !f.canRead()) {
      return;
    }

    System.out.println("Indexing " + f.getCanonicalPath());

    Document doc = new Document();
    doc.add(new Field("contents", new FileReader(f)));
    doc.add(new Field("filename", f.getCanonicalPath(), Field.Store.YES, Field.Index.UN_TOKENIZED));
    //doc.add(new Field("filename", new StringReader(f.getCanonicalPath())));
    writer.addDocument(doc);
  }


}

Searcher

import java.io.File;
import java.io.FileReader;
import java.io.StringReader;
import java.util.Date;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Searcher {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new Exception("Usage: java " + Searcher.class.getName()
        + " <index dir> <query>");
    }

    File indexDir = new File(args[0]);
    String q = args[1];

    if (!indexDir.exists() || !indexDir.isDirectory()) {
      throw new Exception(indexDir +
        " does not exist or is not a directory.");
    }

    search(indexDir, q);
  }

  public static void search(File indexDir, String q)
    throws Exception {
    Directory fsDir = FSDirectory.getDirectory(indexDir, false);
    IndexSearcher is = new IndexSearcher(fsDir);

//    Query query = QueryParser.parse(q, "contents", new StandardAnalyzer());   DEPRECATED
    QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
    Query query = qp.parse(q);
    long start = new Date().getTime();
    Hits hits = is.search(query);
    long end = new Date().getTime();

    System.err.println("Found " + hits.length() +
      " document(s) (in " + (end - start) +
      " milliseconds) that matched query '" +
        q + "':");

    for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      System.out.println(doc.get("filename"));
    }
    is.close();  // release the index files once the hits have been read
  }
}

One of the suggestions is to use HTMLDocument.java from lucene-demos to index HTML documents.
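To make clearer what indexing HTML involves, here is a minimal sketch of the kind of extraction an HTML-aware indexer has to do: pull out the page title and reduce the markup to plain text. It is a rough stand-in written with the JDK's regex classes only, not the HTMLDocument class from the demos. The class name HtmlText and its methods are hypothetical, and it does not decode entities or cope with badly broken markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Lucene: pulls a title and plain text out of simple HTML. */
class HtmlText {
  private static final Pattern TITLE =
      Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  private static final Pattern SCRIPT_OR_STYLE =
      Pattern.compile("<(script|style)[^>]*>.*?</\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
  private static final Pattern TAG = Pattern.compile("<[^>]+>");

  /** Returns the text of the first title element, or "" if there is none. */
  static String title(String html) {
    Matcher m = TITLE.matcher(html);
    return m.find() ? m.group(1).trim() : "";
  }

  /** Drops script/style blocks and all tags, then collapses runs of whitespace. */
  static String text(String html) {
    String noScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
    String noTags = TAG.matcher(noScripts).replaceAll(" ");
    return noTags.replaceAll("\\s+", " ").trim();
  }
}
```

In indexFile, the extracted text could then go into the "contents" field instead of the raw FileReader, so that tag names and attributes do not end up as searchable terms.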

Could someone help me with this problem? Thank you for any advice.

I don't know whether Lucene is a requirement for your project, but if you are interested in Lucene's full-text search capabilities, you may find it easier to start with Solr (http://lucene.apache.org/solr/), a search engine built on Lucene. Solr is developed by the same people as Lucene, so it follows Lucene's recommended practices and is likely to be faster than code you would write yourself.

Otherwise, there is a nice "Getting Started" guide on Lucene's website that will help you understand how to use Lucene (what a Directory is, how to read and write the index) and the best practices (reuse IndexWriter instances, etc.):

http://lucene.apache.org/java/3_3_0/gettingstarted.html#Getting Started
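For the servlet side of the question, the parts that do not depend on Lucene can be sketched separately: deciding which index fields to search from the user's choice, and turning the hits into a results page. The sketch below is only an illustration. The class SearchPage, its methods, and the scope values "names" and "all" are assumptions; the field names "filename" and "contents" are the ones used in the posted Indexer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helpers for the servlet; no Lucene or servlet classes are used here. */
class SearchPage {

  /** Maps the form's scope value to index fields; "names" and "all" are assumed values. */
  static List<String> fieldsFor(String scope) {
    if ("all".equals(scope)) {
      return Arrays.asList("filename", "contents");
    }
    return Collections.singletonList("filename"); // default: names only
  }

  /** Escapes the characters that are special in HTML text and attribute values. */
  static String escape(String s) {
    StringBuilder out = new StringBuilder();
    for (char c : s.toCharArray()) {
      switch (c) {
        case '&': out.append("&amp;"); break;
        case '<': out.append("&lt;"); break;
        case '>': out.append("&gt;"); break;
        case '"': out.append("&quot;"); break;
        default: out.append(c);
      }
    }
    return out.toString();
  }

  /** Builds the results page from the query string and the stored "filename" values. */
  static String render(String query, List<String> filenames) {
    StringBuilder page = new StringBuilder();
    page.append("<html><body>");
    page.append("<h1>Results for ").append(escape(query)).append("</h1><ul>");
    for (String f : filenames) {
      page.append("<li>").append(escape(f)).append("</li>");
    }
    page.append("</ul></body></html>");
    return page.toString();
  }
}
```

In the servlet, the query and scope would come from request parameters, and the list passed to render would be filled from doc.get("filename") for each hit. Note that the posted Indexer stores "filename" as UN_TOKENIZED, so it only matches the exact full path; searching by name would probably need an additional tokenized field. For querying several fields at once, MultiFieldQueryParser is one option worth looking at.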
