Problem with incremental update in Lucene
I am creating a program that can index the text files in many different folders: every folder that contains text files gets indexed, and the index is stored in a separate folder, so that this separate folder acts as a universal index of all files on my computer. I am using Lucene for this because Lucene fully supports incremental updates. This is the source code I use for indexing:
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SimpleFileIndexer {

    public static void main(String[] args) throws Exception {
        int i = 0;
        while (i < 2) {
            File indexDir = new File("C:/Users/Raden/Documents/myindex");
            File dataDir = new File("C:/Users/Raden/Documents/indexthis");
            String suffix = "txt";

            SimpleFileIndexer indexer = new SimpleFileIndexer();
            int numIndex = indexer.index(indexDir, dataDir, suffix);
            System.out.println("Total files indexed " + numIndex);

            i++;
            Thread.sleep(1000);
        }
    }

    private int index(File indexDir, File dataDir, String suffix) throws Exception {
        RAMDirectory ramDir = new RAMDirectory(); // build the index in memory first
        @SuppressWarnings("deprecation")
        IndexWriter indexWriter = new IndexWriter(
                ramDir, // write into the in-memory directory
                new StandardAnalyzer(Version.LUCENE_CURRENT),
                true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        indexWriter.setUseCompoundFile(false);
        indexDirectory(indexWriter, dataDir, suffix);

        int numIndexed = indexWriter.maxDoc();
        indexWriter.optimize();
        indexWriter.close();

        // copy the in-memory index to the on-disk index folder
        Directory.copy(ramDir, FSDirectory.open(indexDir), false);
        return numIndexed;
    }

    private void indexDirectory(IndexWriter indexWriter, File dataDir, String suffix) throws IOException {
        File[] files = dataDir.listFiles();
        for (int i = 0; i < files.length; i++) {
            File f = files[i];
            if (f.isDirectory()) {
                indexDirectory(indexWriter, f, suffix);
            } else {
                indexFileWithIndexWriter(indexWriter, f, suffix);
            }
        }
    }

    private void indexFileWithIndexWriter(IndexWriter indexWriter, File f, String suffix) throws IOException {
        if (f.isHidden() || f.isDirectory() || !f.canRead() || !f.exists()) {
            return;
        }
        if (suffix != null && !f.getName().endsWith(suffix)) {
            return;
        }
        System.out.println("Indexing file " + f.getCanonicalPath());
        Document doc = new Document();
        doc.add(new Field("contents", new FileReader(f)));
        doc.add(new Field("filename", f.getCanonicalPath(), Field.Store.YES, Field.Index.ANALYZED));
        indexWriter.addDocument(doc);
    }
}
And this is the source code I use for searching the Lucene-created index:
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SimpleSearcher {

    public static void main(String[] args) throws Exception {
        File indexDir = new File("C:/Users/Raden/Documents/myindex");
        String query = "revolution";
        int hits = 100;

        SimpleSearcher searcher = new SimpleSearcher();
        searcher.searchIndex(indexDir, query, hits);
    }

    private void searchIndex(File indexDir, String queryStr, int maxHits) throws Exception {
        Directory directory = FSDirectory.open(indexDir);
        IndexSearcher searcher = new IndexSearcher(directory);
        @SuppressWarnings("deprecation")
        QueryParser parser = new QueryParser(Version.LUCENE_30, "contents",
                new StandardAnalyzer(Version.LUCENE_CURRENT));
        Query query = parser.parse(queryStr);

        TopDocs topDocs = searcher.search(query, maxHits);
        ScoreDoc[] hits = topDocs.scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println(d.get("filename"));
        }
        System.out.println("Found " + hits.length);
    }
}
The problem I am having is that the indexing program above doesn't seem to do incremental updates. I can search for text files, but only the files from the last folder I indexed show up; files from previously indexed folders are missing from the search results. Can you tell me what went wrong in my code? I just want incremental updates to work. In essence, my program seems to be overwriting the existing index with the new one instead of merging them.
Thanks.
Directory.copy() overwrites the destination directory; you need to use IndexWriter.addIndexes() to merge the new directory's index into the main one.

You can also just re-open the main index and add documents to it directly. A RAMDirectory isn't necessarily more efficient than properly tuned buffer and merge-factor settings (see the IndexWriter docs).
Update: instead of Directory.copy(), you need to open ramDir for reading and indexDir for writing, then call addIndexes on the indexDir writer and pass it the ramDir reader. Alternatively, you can use addIndexesNoOptimize and pass it ramDir directly (without opening a reader), then optimize the index before closing.

But really, it's probably easier to just skip the RAMDirectory and open a writer on indexDir in the first place. That will also make it easier to update changed files.
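For the first variant, the merge step could look like the following minimal sketch (assuming Lucene 3.x; ramReader and fsWriter are just illustrative names):

// open the in-memory index for reading
IndexReader ramReader = IndexReader.open(ramDir);
// open the on-disk index for writing; this constructor appends to an
// existing index and only creates a new one if none exists yet
IndexWriter fsWriter = new IndexWriter(FSDirectory.open(indexDir),
        new StandardAnalyzer(Version.LUCENE_CURRENT),
        IndexWriter.MaxFieldLength.UNLIMITED);
fsWriter.addIndexes(ramReader); // merge rather than overwrite
fsWriter.optimize();
fsWriter.close();
ramReader.close();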
Example
private int index(File indexDir, File dataDir, String suffix) throws Exception {
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter indexWriter = new IndexWriter(ramDir,
            new StandardAnalyzer(Version.LUCENE_CURRENT), true,
            IndexWriter.MaxFieldLength.UNLIMITED);
    indexWriter.setUseCompoundFile(false);
    indexDirectory(indexWriter, dataDir, suffix);
    int numIndexed = indexWriter.maxDoc();
    indexWriter.optimize();
    indexWriter.close();

    // no create flag here: append to the existing on-disk index
    IndexWriter index = new IndexWriter(FSDirectory.open(indexDir),
            new StandardAnalyzer(Version.LUCENE_CURRENT),
            IndexWriter.MaxFieldLength.UNLIMITED);
    index.addIndexesNoOptimize(ramDir);
    index.optimize();
    index.close();
    return numIndexed;
}
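The important detail is that the writer on indexDir is opened without the create flag: that constructor appends to an existing index (creating it only if it doesn't exist yet), whereas passing true as the create argument would wipe the index on every run, which is exactly the overwriting behavior you're seeing.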
But just doing this is fine too:
private int index(File indexDir, File dataDir, String suffix) throws Exception {
    // again, no create flag: appends to the existing index
    IndexWriter index = new IndexWriter(FSDirectory.open(indexDir),
            new StandardAnalyzer(Version.LUCENE_CURRENT),
            IndexWriter.MaxFieldLength.UNLIMITED);
    // tweak the settings for your hardware
    index.setUseCompoundFile(false);
    index.setRAMBufferSizeMB(256);
    index.setMergeFactor(30);
    indexDirectory(index, dataDir, suffix);
    index.optimize();
    int numIndexed = index.maxDoc();
    index.close();
    // you'll need to update indexDirectory() to keep track of indexed files
    return numIndexed;
}
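For the "keep track of indexed files" part, one option is to key each document on its canonical path and use IndexWriter.updateDocument(), which first deletes any existing document matching the given term and then adds the new one, so re-indexing a file replaces its old entry instead of duplicating it. A minimal sketch of a reworked indexFileWithIndexWriter() under that assumption (the filename field switches to NOT_ANALYZED so the full path is stored as a single exact-match term; you'll also need to import org.apache.lucene.index.Term):

private void indexFileWithIndexWriter(IndexWriter indexWriter, File f, String suffix) throws IOException {
    if (f.isHidden() || f.isDirectory() || !f.canRead() || !f.exists()) {
        return;
    }
    if (suffix != null && !f.getName().endsWith(suffix)) {
        return;
    }
    String path = f.getCanonicalPath();
    Document doc = new Document();
    doc.add(new Field("contents", new FileReader(f)));
    // NOT_ANALYZED: the path is indexed as one term, usable as a unique key
    doc.add(new Field("filename", path, Field.Store.YES, Field.Index.NOT_ANALYZED));
    // replaces any existing document for this path instead of adding a duplicate
    indexWriter.updateDocument(new Term("filename", path), doc);
}

The trade-off is that the filename field is no longer tokenized, so word-level searches against it won't match anymore; searches on contents are unaffected.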