What are the challenges in embedding text search (Lucene/Solr/Hibernate Search) in applications that are hosted at client sites
We have a enterprise java web-app that our customers (external) deploy on their intranets. I am exploring different full text search options: Lucene/Solr/Hibernate Search and one common concern is deployment/administration/tuning overhead for this.
This is particularly challenging in our case, since we do not host these applications. From what I have seen, most uses of these technologies have been in hosted applications. Our customers t开发者_JAVA百科ypically deploy our application in a clustered environment and do not have any experience with Lucene/Solr.
Does anyone have any experience with this? What challenges have you encountered with this approach? How did you overcome them? At this point I am trying to determine if this is feasible.
Thank you
It is very feasible to deploy applications onto clients sites that use Lucene (or Solr).
Some things to keep in mind: Administration
- you need a way to version your index,
so if/when you change the document
structure in the index, it can be
upgraded. - you therefore need a good way to force a re-index of all existing data. Probably also a good idea to provide an Admin option to allow an Admin to trigger a re-index as well.
- you could also provide an Admin option to allow optimize() be called on your index, or have this scheduled. Best to test the actual impact this will have first, since it may not be needed depending on the shape of your index
Deployement If you are deploying into a clustered environment, the simplest (and fastest in terms of dev speed and runtime speed) solution could be to create the index on each node.
Tuning * Do you have a reasonable approximation of the dataset you will be indexing? You will need to ensure you understand how your index scales (in both speed and size), since what you consider a reasonable dataset size, may not be the same as your clients... Therefore, you at least need to be able to let clients know what factors will lead to overly large index size, and possibly slower performance.
There are two advantages to embedding lucene in your app over sending the queries to a separate Solr cluster, performance and ease of deployment/installation. Embedding lucene means to run lucene in the same JVM which means no additional server round trips. Commits should be batched in a separate thread. Embedding lucene also means including some more JAR files in your class path so no separate install for Solr.
If your app is cluster aware, then the embedded lucene option becomes highly problematic. An update to one node in the cluster needs to be searchable from any node in the cluster. Synchronizing the lucene index on all nodes yields no better performance than using Solr. With Solr 4, you may find the administration to be less of a barrier to entry for your customers. Check out the literature of the grossly misnamed Solr Cloud.
精彩评论