Hibernate and Batch Update
Although I can see a lot of discussion around Hibernate and batch updates, here is my specific scenario, on which I hope to get comments from the experts. I am iterating through a list of Document objects, and for each document I need to update a DOCUMENT_METADATA table with a property from the Document object.
I could just use standard JDBC 2.0 batch updates to do this. However, using raw JDBC would go against the software standard at my workplace, which is to use Hibernate throughout the application, and I hate to make an exception.
In order to use Hibernate, I first have to fetch the DocumentMetadata object given the document ID from the Document object I am iterating over, set the DocumentMetadata property, and then update the table.
I can do something like
int count = 0;
for (Document document : documents) {
    // fetch the DocumentMetadata object given the id from the Document
    DocumentMetadata metadata = em.find(DocumentMetadata.class, document.getId());
    // invoke the setter on the DocumentMetadata object
    metadata.setProperty(document.getProperty());
    em.persist(metadata);
    if (++count % 50 == 0) {
        em.flush(); // flush a batch of updates and release memory
        em.clear();
    }
}
For n records (I will be running around 10,000 records at a time), isn't the best case with the Hibernate approach above n selects plus the batched updates? Given the size of my table (the DOCUMENT_METADATA table has more than 100 columns and 1 million records), I fear I will run into performance issues compared to the JDBC approach.
Any suggestions?
Thanks, KOB
JPA/Hibernate absolutely supports batch operations. Do whatever you would do with JDBC, but do it through Hibernate.
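One detail worth checking: for Hibernate to actually send the statements as JDBC batches, the batch size has to be configured. A minimal sketch of the usual settings (the values are illustrative, not required):

# hibernate.properties (or equivalent <property> entries in hibernate.cfg.xml)
hibernate.jdbc.batch_size=50
# recommended when batching entities that carry a version column
hibernate.jdbc.batch_versioned_data=true
# ordering updates helps the driver group identical statements into one batch
hibernate.order_updates=true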
This is my code for batch save/update. I am doing it this way because we have a limit of 1,000 records per batch insert: if the collection has 25k entries, they will be inserted in batches of 1,000.
private static final int BATCH_SIZE = 1000; // current batch limit

private void saveBulkEntries(
        final Collection<? extends MyObject> entries,
        final String insertSql) {
    if (entries.isEmpty()) {
        return;
    }
    // Open a stateless session, independent of the current Hibernate session.
    // This avoids problems with the job transactions.
    StatelessSession session = this.hibernateTemplate.getSessionFactory()
            .openStatelessSession();
    Transaction transaction = null;
    long entryCounter = 0;
    PreparedStatement batchUpdate = null;
    try {
        transaction = session.beginTransaction();
        batchUpdate = session.connection().prepareStatement(insertSql);
        for (MyObject entry : entries) {
            entry.addEntry(batchUpdate);
            batchUpdate.addBatch();
            if (++entryCounter % BATCH_SIZE == 0) {
                // Reached the batch limit, so send this batch to the database
                batchUpdate.executeBatch();
            }
        }
        // Execute whatever is left in the final, partial batch
        batchUpdate.executeBatch();
        batchUpdate.close();
        batchUpdate = null;
    }
    catch (HibernateException ex) {
        transaction.rollback();
        transaction = null;
    }
    catch (SQLException ex) {
        transaction.rollback();
        transaction = null;
    }
    finally {
        // Commit only if no exception rolled the transaction back
        if (transaction != null) {
            transaction.commit();
        }
        if (batchUpdate != null) {
            try {
                batchUpdate.cancel();
                batchUpdate.close();
            }
            catch (SQLException ex) {
                // ignore failures while cleaning up the statement
            }
        }
        session.close();
    }
}
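For context, here is a sketch of what the entry.addEntry(batchUpdate) call above might look like. MyObject, its fields, and the table in the comment are hypothetical, not part of the original code; each entity simply binds its own fields to the shared PreparedStatement:

import java.sql.PreparedStatement;
import java.sql.SQLException;

public class MyObject {
    private long id;
    private String name;

    // Binds this entry's fields to the statement prepared from insertSql,
    // e.g. "insert into MY_TABLE (id, name) values (?, ?)"
    public void addEntry(PreparedStatement stmt) throws SQLException {
        stmt.setLong(1, this.id);
        stmt.setString(2, this.name);
    }
}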
In addition to the excellent advice from hvgotcodes (which points to the Hibernate documentation on batch updates and on handling thousands of entities in a single transaction without memory problems), another optimization you could use is to fetch the DocumentMetaData in chunks rather than one by one.
The chunk size should also be the same as the JDBC batch size. You would just have to split your N document IDs into chunks of (for example) 20, and use a where id in (:idsOfChunk) clause.
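A minimal sketch of that chunked fetch, assuming a JPA EntityManager; the names DocumentMetadata.documentId, getDocumentId(), setProperty(), and Document.getId()/getProperty() are assumptions about the poster's model, not known API:

// Assumes java.util.* imports and an injected EntityManager em
private void updateMetadataInChunks(EntityManager em, List<Document> documents) {
    final int chunkSize = 20; // keep equal to hibernate.jdbc.batch_size
    for (int i = 0; i < documents.size(); i += chunkSize) {
        List<Document> chunk =
                documents.subList(i, Math.min(i + chunkSize, documents.size()));
        Map<Long, Document> byId = new HashMap<Long, Document>();
        for (Document d : chunk) {
            byId.put(d.getId(), d);
        }
        // One select per chunk instead of one select per document
        List<DocumentMetadata> metadataChunk = em.createQuery(
                "select m from DocumentMetadata m where m.documentId in (:idsOfChunk)",
                DocumentMetadata.class)
            .setParameter("idsOfChunk", byId.keySet())
            .getResultList();
        for (DocumentMetadata metadata : metadataChunk) {
            // the entities are attached, so the setter alone marks them dirty
            metadata.setProperty(byId.get(metadata.getDocumentId()).getProperty());
        }
        em.flush();  // writes the pending updates as one JDBC batch
        em.clear();  // detaches the chunk to keep the persistence context small
    }
}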
Finally, I would like to point out that the call to persist is not needed: an entity that has been loaded from the database through the entity manager is attached, and any change made to it is automatically written to the database at flush time.