Automatic Transactions in OpenJPA
I have a class, Location. Location contains a List of BorderPoint objects, but it can be a HUGE list (20,000 is not impossible). The tables for this are LOCATION and BORDERPOINT.
I initially populate Location via an import from an ESRI Shapefile. Here's a code snippet:
try {
    while (featureIterator.hasNext()) {
        Location location = new Location();
        SimpleFeatureImpl feature = (SimpleFeatureImpl) featureIterator.next();

        // set the information in location based on the feature;
        // pulled out into its own method to keep this one short
        setLocationInfo(location, feature);

        List<BorderPoint> borderPointList = getBorderPoints(feature, location);
        //saveBorderPoints(location, feature);
        location.setBorderPointList(borderPointList);

        try {
            locationRepository.persist(location);
        } catch (RepositoryException e) {
            throw new ServiceException("processShapefile() threw RepositoryException", e);
        }
    }
} finally {
    featureIterator.close();
}
Since there are so many BorderPoint objects in the list, and I only save them by calling persist on the Location object, can I automatically set some sort of batch size for persisting the BorderPoints?
I don't know OpenJPA, but I have used Hibernate a lot. You probably have to control the transaction size yourself. If you change the code a little, this should be easy:
- Create and persist the Location. You should probably also commit the database transaction.
- Persist the BorderPoints into the database, making sure each one has its parent Location set. This implies that the parent Location is mapped on BorderPoint. You probably want to commit every 100 BorderPoints or so (see the sketch after this list).
- Query the Location from the database and access its BorderPoints. All persisted BorderPoints should be there.
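As a rough illustration, here is a minimal sketch of steps 1 and 2. It assumes a resource-local EntityManager and a BorderPoint.setLocation(...) back-reference to the parent; those names (and the importer class itself) are made up for illustration, not taken from the original code:

import java.util.List;
import javax.persistence.EntityManager;

public class BorderPointImporter {

    private static final int BATCH_SIZE = 100; // commit every 100 points

    public void importLocation(EntityManager em, Location location, List<BorderPoint> points) {
        // Step 1: persist and commit the parent Location first
        em.getTransaction().begin();
        em.persist(location);
        em.getTransaction().commit();

        // Step 2: persist the points in batches, committing as we go
        em.getTransaction().begin();
        int count = 0;
        for (BorderPoint point : points) {
            point.setLocation(location); // parent must be mapped on BorderPoint
            em.persist(point);
            if (++count % BATCH_SIZE == 0) {
                em.getTransaction().commit(); // flush this batch to the database
                em.clear();                   // detach to keep the persistence context small
                em.getTransaction().begin();
            }
        }
        em.getTransaction().commit(); // commit the final partial batch
    }
}

The em.clear() after each commit matters as much as the commit itself: without it, all 20,000 points stay managed in the persistence context and memory use grows for the whole import.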
If you use JTA, you might have to break the import into batches yourself. However, you might want to check whether you really have to store each point as a row at all.
My colleagues tried to save a graph with many points; after getting bad performance, they analyzed the usage and realized the points were always loaded all together. So they ended up serializing all of the points into a blob, and the performance improvement was huge.
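For the record, a sketch of what such a blob mapping could look like (not the exact mapping my colleagues used; the BORDER_POINTS column name and the accessor methods are made up for illustration). It assumes BorderPoint implements Serializable:

import java.io.*;
import java.util.ArrayList;
import java.util.List;
import javax.persistence.*;

@Entity
public class Location implements Serializable {

    @Id
    @GeneratedValue
    private Long id;

    // All border points serialized into a single BLOB column instead of
    // one BORDERPOINT row per point
    @Lob
    @Column(name = "BORDER_POINTS")
    private byte[] borderPointsBlob;

    public void setBorderPoints(List<BorderPoint> points) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            // copy into an ArrayList, which is itself Serializable
            out.writeObject(new ArrayList<BorderPoint>(points));
        }
        this.borderPointsBlob = buffer.toByteArray();
    }

    @SuppressWarnings("unchecked")
    public List<BorderPoint> getBorderPoints() throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(borderPointsBlob))) {
            return (List<BorderPoint>) in.readObject();
        }
    }
}

The trade-off is that you lose the ability to query individual points with SQL, so this only pays off when the points are always read and written as one unit, as in our case.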