How can I schedule data imports in Solr?

The wiki page http://wiki.apache.org/solr/DataImportHandler explains how to index data using DataImportHandler, but the example uses a command to initiate the import operation. How can I schedule a job to do this on a regular basis?


On UNIX/Linux, cron jobs are your friends! On Windows, there is Task Scheduler.

UPDATE
To do it from Java code, since this is a simple GET request, you can use the HTTP Client library. See this tutorial on using the GetMethod.
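As a rough sketch of that GET request, the following uses the JDK's own HttpURLConnection instead of the HTTP Client library mentioned above, only to stay dependency-free. The class name DataImportTrigger, the helper method names, the timeout values, and the localhost base URL are assumptions for illustration, not part of Solr.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class DataImportTrigger {

    // Builds the DataImportHandler URL from a Solr base URL and a command
    // such as "full-import" (hypothetical helper, not a Solr API).
    static String buildImportUrl(String baseUrl, String command) {
        return baseUrl + "/dataimport?command=" + command;
    }

    // Sends the GET request and returns the HTTP status code.
    static int trigger(String baseUrl, String command) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(buildImportUrl(baseUrl, command)).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);   // assumed values; tune for your setup
        conn.setReadTimeout(30000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        try {
            int status = trigger("http://localhost:8983/solr", "full-import");
            System.out.println("Solr responded with HTTP " + status);
        } catch (IOException e) {
            System.err.println("Could not reach Solr: " + e.getMessage());
        }
    }
}
```

The request returns as soon as Solr has accepted the command, so a successful status code only means the import was started, not that it finished.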

If you need to send other requests to Solr programmatically, you should probably use the SolrJ library. It can send all the basic commands to Solr, and it can be configured to access any Solr handler:

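import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

// Point SolrJ at the Solr base URL
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "full-import");
// Send the request to the DataImportHandler path rather than the default handler
QueryRequest request = new QueryRequest(params);
request.setPath("/dataimport");
server.request(request);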


I was able to make it work by following these steps:

  1. Create classes ApplicationListener, HTTPPostScheduler and SolrDataImportProperties (source code listed on http://wiki.apache.org/solr/DataImportHandler#Scheduling). I believe these classes haven't been committed yet.

  2. Add the following listener to Solr web.xml file:

    <listener>
       <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
    </listener>
    
  3. Configure dataimport.properties as per instructions in the wiki page.


Simply add this line to your crontab using the crontab -e command:

0,30 * * * * /usr/bin/wget -q -O /dev/null "http://<solr_host>:8983/solr/<core_name>/dataimport?command=full-import"

This runs a full import every 30 minutes (at minutes 0 and 30 of each hour). Replace <solr_host> and <core_name> with the values for your setup.


There's a fresh patch by Esteve Fernandez that makes the whole thing work on Unix/Linux: https://issues.apache.org/jira/browse/SOLR-2305

@Eldo If you're going to need more help in building your own JAR just drop a question here...


This question is a bit old, but I created a Windows WPF application and service to deal with this, since cron jobs and Task Scheduler are hard to maintain if you have a lot of cores or environments.

https://github.com/systemidx/SolrScheduler

You basically just drop in a JSON file in a specified folder and it will use a REST client to issue the commands to Solr.


You can use Quartz for this; it works much like crontab on Linux. For a simple periodic job, though, the TimerTask class built into the JDK is enough.
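A minimal sketch of the TimerTask approach might look like the following. The class name ImportTimer, the schedule helper, and the 30-minute period are assumptions for illustration; the job here only prints a message where a real one would send the dataimport request to Solr.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.TimeUnit;

class ImportTimer {

    // Runs `job` immediately and then every `periodMillis` milliseconds
    // on a single daemon timer thread. Returns the Timer so it can be cancelled.
    static Timer schedule(final Runnable job, long periodMillis) {
        Timer timer = new Timer("solr-import-timer", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    job.run();
                } catch (RuntimeException e) {
                    // An uncaught exception would terminate the timer thread,
                    // so report the failure and wait for the next run.
                    System.err.println("Import trigger failed: " + e.getMessage());
                }
            }
        }, 0L, periodMillis);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder job: a real one would request /dataimport?command=full-import.
        Runnable job = () -> System.out.println("Triggering full-import");
        Timer timer = schedule(job, TimeUnit.MINUTES.toMillis(30));
        Thread.sleep(1000);   // keep the demo alive long enough for the first run
        timer.cancel();
    }
}
```

Because the timer thread is a daemon, it stops when the hosting JVM or web application shuts down; inside a servlet container you would typically start it from a context listener and call cancel() on shutdown.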
