How to index almost 3 million XML files with Lucene Solr
I am trying to index almost 3 million XML files with Lucene Solr. When I run "java -jar post.jar *.xml" from the command line, there is no response from the machine. How can I do the indexing? Many thanks.
Break it into smaller batches. For example, assuming your XML files are named aaa.xml to zzz.xml and are fairly evenly distributed, first run "java -jar post.jar a*.xml", then "java -jar post.jar b*.xml", and so on.
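If the file names are not spread evenly across the alphabet, fixed-size batches may be easier to manage. Here is a minimal sketch of that idea in Python. It assumes post.jar sits in the current directory and accepts several file names in one call; the batch size of 1000 and the function names batches and post_in_batches are my own choices for illustration, not anything Solr provides.

```python
import subprocess
from pathlib import Path
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def post_in_batches(directory: str, size: int = 1000, jar: str = "post.jar") -> None:
    """Send the *.xml files in `directory` to Solr, `size` files per invocation.

    Assumption: `jar` is the post.jar from the question and is given the
    file names as command-line arguments, as in "java -jar post.jar a.xml b.xml".
    """
    # Build the file list here instead of relying on the shell's wildcard expansion.
    files = sorted(str(p) for p in Path(directory).glob("*.xml"))
    for batch in batches(files, size):
        # check=True stops at the first failing batch so it can be inspected and retried.
        subprocess.run(["java", "-jar", jar, *batch], check=True)
```

Calling post_in_batches(".") from the directory that holds the XML files would then post them 1000 at a time. Try a small directory first to see how long one batch takes before running it against everything.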
A while ago, the Open Library project loaded a large number of books into Solr for its search. There's a blog post about it here which might be useful to you.
Have you tried loading 3000 documents? Were you successful, and how long did it take? You haven't said how big the files are, so it's impossible to give estimates, but I've seen database loading (not Lucene, but similar) run at 100,000 documents per hour. At that rate, 3 million documents would take about 30 hours.