Avoiding N+One selects and Invalid results from eclipselink with batch read
I'm trying to cut down the number of n+1 selects incurred by my application, the application uses EclipseLink as an ORM and in as many places as possible I've tried to add the batch read hint to queries. In a large number of places in the app I don't always know exactly what relationships I'll be traversing (My view displays fields based on user preferences). At that point I'd like to run one query to populate all of those relationships for my objects.
My dream is to call something like ReadAllRelationshipsQuery(Collection,R开发者_运维技巧elationshipName) and populate all of these items so that later calls to:
Collection.get(0).getMyStuff will already be populated and not cause a db query. How can I accomplish this? I'm willing to write any code I need to but I can't find a way that work with the eclipselink framework?
Why don't I just batch read all of the possible fields and let them load lazily? What I've found is that the batch value holders that implement batch reads don't behave well with the eclipselink cache. If a batch read value holder isn't "evaluated" and ends up in the eclipse link cache it can become stale and return incorrect data (This behavior was logged as an eclipselink bug but rejected...) edit: I found the link to the bug here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=326197
How do I avoid N+1 selects for objects I already have a reference to?
You have three basic ways to load data into objects from a JPA-based solution. These are:
- Load dynamically by object traversal (e.g. myObject.getMyCollection().get()).
- Load graphs of objects by prefetching dynamically using JPA QL (e.g. FETCH JOINs as described at the Oracle JPA tutorial )
- Load by setting the fetch mode ( Is there a way to change the JPA fetch type on a method? )
Each of these has pros and cons.
- Loading dynamically by object transversal will generate more (highly targeted queries). These queries are usually small (not large SQL statements, but may load lots of data) and tend to play nicely with a second level cache, but you can get lots and lots of little queries.
- Prefetching with JPA QL will give you exactly what you want, but that assumes that you know what you want.
- Setting the fetch mode to EAGER will load lots and lots of data for you automatically, but depending on the configuration and usage this may not actually help much (or may make things a lot worse) as you may wind up dragging a LOT of data from the DB into your app that you didn't expect.
Regardless, I highly recommend using p6spy ( http://sourceforge.net/projects/p6spy/ ) in conjunction with any JPA-based application to understand the effects of your tuning.
Unfortunately, JPA makes some things easy and some things hard - mainly, side-effects of your usage. For example, you might fix one problem by setting the fetch mode to eager, and then create another problem where the eager fetch pulls in too much data. EclipseLink does provide tooling to help sort this out ( EclipseLink Performance Tools )
In theory, if you wanted to you could write a generic JavaBean property walker by using something like Apache BeanUtils. Usually just calling a method like size() on a collection is enough to force it to load (although using a collection batch fetch size might complicate things a bit).
One thing to pay particular attention to is the scope of your session and your use of caches (EclipseLink cache).
Something not clear from your post is the scope of a session. Is a session a one shot affair (e.g. like a web page request) or is it a long running thing (e.g. like a classic client/server GUI app)?
It is very difficult to optimize the retrieval of relationships if you do not know what relationships you require.
If you application is requesting what relationships it wants, then you must know at some level which relationships you require, and should be able to optimize these in your query for the objects.
For an overview of relationship optimization techniques see,
http://java-persistence-performance.blogspot.com/2010/08/batch-fetching-optimizing-object-graph.html
For Batch Fetching, there are three types, JOIN, EXISTS, and IN. The problem you outlined of changes to data affecting the original query for cache batched relationships only applies to JOIN and EXISTS, and only when you have a selection criteria based on updateale fields, (if the query you are optimizing is on id, or all instances you are ok). IN batch fetching does not have this issue, so you can use IN batch fetching for all the relationships and not have this issue.
ReadAllRelationshipsQuery(Collection,RelationshipName)
How about,
Query query = em.createQuery("Select o from MyObject o where o.id in :ids");
query.setParameter(ids, ids);
query.setHint("eclipselink.batch", relationship);
If you know all possible relations and the user preferences, why don't you just dynamically build the JPQL string (or Criteria) before executing it?
Like:
String sql = "SELECT u FROM User u"; //use a StringBuilder, this is just for simplity's sake
if(loadAdress)
{
sql += " LEFT OUTER JOIN u.address as a"; //fetch join and left outer join have the same result in many cases, except that with left outer join you could load associations of address as well
}
...
Edit: Since the result would be a cross product, you should then iterate over the entities and remove duplicates.
In the query, use FETCH JOIN to prefetch relationships.
Keep in mind that the resulting rows will be the cross product of all rows selected, which can easily be more work than the N+1 queries.
精彩评论