How to optimize indexing of a large number of DB records using Zend_Lucene and Zend_Paginator
So I have this script that is deployed on a host and run via cron. It indexes all the records in a database table, and the index is later used both by the front end of the site and by back-end operations.
After the operation, the index is about 3-4 MB.
The problem is that it consumes a lot of resources (a CPU load of 30+ and a good chunk of memory) and slows the machine down. My question is how to optimize the operation described below:
First, a select query is built using the Zend Framework API. This query is passed to a Paginator factory, which returns a paginator that I use to limit the number of items being indexed at once, so the script never iterates over too many items at a time. The script iterates over the current page of items in the paginator object with a foreach loop, and when it reaches the end of a page it fetches the next page and starts again.
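The batching pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual script: the `$fetchPage` and `$indexRecord` callables stand in for Zend_Paginator's page retrieval and `Zend_Search_Lucene::addDocument()` respectively.

```php
<?php
// Sketch of page-by-page indexing. In the real script, Zend_Paginator
// supplies each page (backed by a SELECT ... LIMIT ... OFFSET query) and
// $indexRecord would build a Zend_Search_Lucene_Document and add it to
// the index.
function indexInBatches(callable $fetchPage, callable $indexRecord, int $pageSize): int
{
    $page = 1;
    $indexed = 0;
    while (true) {
        $rows = $fetchPage($page, $pageSize); // fetch one page of records
        if (count($rows) === 0) {
            break; // past the last page, we are done
        }
        foreach ($rows as $row) {
            $indexRecord($row);
            $indexed++;
        }
        // Dropping the reference between pages keeps PHP's memory usage flat.
        unset($rows);
        $page++;
    }
    return $indexed;
}
```

Keeping the page size modest bounds peak memory, but note that each page still costs a separate query, and the Lucene side of the work dominates the CPU usage either way.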
I suspect this overhead is caused by Zend_Lucene, but I have no idea how it could be improved.
See my answer to Can I predict how large my Zend Framework index will be?
I tested Zend_Search_Lucene versus Apache Lucene (the Java version). In my test, the Java product indexed 1.5 million documents about 300x faster than the PHP product.
You'll be much happier using Apache Solr, an HTTP search server built on Apache Lucene that typically runs in a servlet container such as Tomcat. Solr includes a tool called DataImportHandler that pulls data directly from a JDBC data source.
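A minimal DataImportHandler configuration looks roughly like the sketch below. The data source URL, table, and column names are assumptions for illustration; the real values depend on your schema.

```xml
<!-- data-config.xml: hypothetical DIH config mapping a DB table to Solr fields -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="dbuser" password="dbpass"/>
  <document>
    <entity name="record" query="SELECT id, title, body FROM records">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="body" name="body"/>
    </entity>
  </document>
</dataConfig>
```

With this in place, a full import is triggered over HTTP via the handler's `/dataimport?command=full-import` endpoint, so the PHP cron script no longer has to do the indexing itself.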
Use the PECL Solr extension to communicate with Solr from PHP. If you can't install that PHP extension, use curl, which should be available in default installations of PHP.
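The curl route can be sketched as below. This is a hedged example, not the extension's API: the host, port, and field name are assumptions, and the query-building helper is hypothetical.

```php
<?php
// Hypothetical helper: build a Solr select URL. The base URL and field
// names are illustrative; adjust to your Solr installation.
function buildSolrSelectUrl(string $base, string $q, int $rows = 10): string
{
    return $base . '/select?' . http_build_query([
        'q'    => $q,
        'rows' => $rows,
        'wt'   => 'json', // ask Solr to return JSON instead of XML
    ]);
}

// Example usage (requires a running Solr instance):
// $url = buildSolrSelectUrl('http://localhost:8983/solr', 'title:lucene');
// $ch  = curl_init($url);
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $results = json_decode(curl_exec($ch), true);
// curl_close($ch);
```

The same approach works for posting documents to Solr's update handler; the PECL extension just wraps these HTTP calls in a nicer object API.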