Incremental update of docs in Solr
I have a table called EMP with the fields ID and Name, and it has 2 records. I have formed the XML doc as
<add>
<doc>
<field name="ID">1</field>
<field name="Name">Name1</field>
</doc>
<doc>
<field name="ID">2</field>
<field name="Name">Name2</field>
</doc>
</add>
I have indexed the above XML file into Solr for the first time. But when a 3rd record gets created in the table, should I create an XML file that has only the third record, i.e.
<add>
<doc>
<field name="ID">3</field>
<field name="Name">Name3</field>
</doc>
</add>
and index this XML file separately?
Or should I add the new 3rd record to the original XML file, i.e.
<add>
<doc>
<field name="ID">1</field>
<field name="Name">Name1</field>
</doc>
<doc>
<field name="ID">2</field>
<field name="Name">Name2</field>
</doc>
<doc>
<field name="ID">3</field>
<field name="Name">Name3</field>
</doc>
</add>
and index this newly created XML file again? With millions of records, though, generating the XML file will take time, and so will re-indexing all the docs.
So how should I handle the incremental indexing/update scenario?
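Creating a new XML file that contains only the delta (i.e. new or changed rows) solves your problem, and you are correct that it is more efficient than doing a full export/import. If ID is declared as the uniqueKey in your schema, posting a document whose ID already exists replaces the old version, so changed rows can go into the same delta file as new ones.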
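As a rough illustration of the delta approach, here is a minimal Python sketch. It assumes that ID only ever grows, so "new" rows are simply those with an ID above the last one you indexed; catching changed rows as well would need something like a modification timestamp, which the question does not mention. The function names, the `last_indexed_id` parameter, and the core name `emp` in the URL are all made up for the example; only the `/update` endpoint and the `<add>` message format come from Solr itself.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_delta_xml(rows, last_indexed_id):
    """Return a Solr <add> message holding only rows with ID > last_indexed_id.

    `rows` is an iterable of (id, name) tuples, e.g. fetched from the EMP table.
    Assumption: IDs are increasing, so a higher ID means a newer row.
    """
    add = ET.Element("add")
    for row_id, name in rows:
        if row_id <= last_indexed_id:
            continue  # already in the index, leave it out of the delta
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="ID").text = str(row_id)
        ET.SubElement(doc, "field", name="Name").text = name
    return ET.tostring(add, encoding="unicode")


def build_update_request(xml_body, solr_url):
    """Prepare (but do not send) an HTTP POST of the XML to Solr's update handler."""
    return urllib.request.Request(
        solr_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Rows 1 and 2 were indexed earlier; only row 3 ends up in the delta.
rows = [(1, "Name1"), (2, "Name2"), (3, "Name3")]
delta = build_delta_xml(rows, last_indexed_id=2)
print(delta)

# Hypothetical core name "emp"; adjust host, port and core to your setup.
request = build_update_request(
    delta, "http://localhost:8983/solr/emp/update?commit=true"
)
# To actually send it: urllib.request.urlopen(request)
```

You would store the highest ID you have indexed after each run and pass it in as `last_indexed_id` next time, so every run only ships the rows added since the previous one.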
If you want to avoid creating XML files altogether, you should look into the DataImportHandler, which lets you import data directly from (for instance) a JDBC source. It's included as a contrib to Solr.