Struggling to optimize an N+1 query in Hibernate
I’m struggling to improve an N+1 query in a project I’m working on. I use Hibernate with the model shown below, and I want to write a query that retrieves all items related to a portfolio, including the last two prices for each item (the price on a given date and the previous price).
Example API:
List<Items> items = findItemsWithLatestTwoPrices(portfolio, latestPriceDate);
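To pin down what "the price on a given date and the previous price" means, here is a minimal in-memory sketch. It assumes each item's prices sit in a date-sorted map, and that a missing price on the exact date falls back to the most recent earlier one. `PriceHistory`, `priceOn` and `previousPrice` are hypothetical names, not part of the original model; `TreeMap.floorEntry` and `lowerEntry` do the date lookups.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical per-item price history, keyed and sorted by date.
final class PriceHistory {
    private final NavigableMap<LocalDate, BigDecimal> pricesByDate = new TreeMap<>();

    void addPrice(LocalDate date, BigDecimal price) {
        pricesByDate.put(date, price);
    }

    /** Latest price on or before the given date (assumption: no exact-date match required). */
    Optional<Map.Entry<LocalDate, BigDecimal>> priceOn(LocalDate date) {
        return Optional.ofNullable(pricesByDate.floorEntry(date));
    }

    /** The price immediately before the one returned by priceOn(date), if any. */
    Optional<Map.Entry<LocalDate, BigDecimal>> previousPrice(LocalDate date) {
        // map() turns a null lowerEntry into an empty Optional.
        return priceOn(date).map(current -> pricesByDate.lowerEntry(current.getKey()));
    }
}
```

A sorted map makes each lookup logarithmic in the number of prices, which is the kind of date-indexed access a database index would need to provide.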
Currently I use one query to fetch all items related to the portfolio, and then I iterate over those items and run a separate query per item for its two latest prices (hence N+1).
I tried expressing this in native SQL using a correlated subquery, but the performance was terrible. That, and the fact that new prices arrive every day (so the query keeps getting slower), has led me to think I need a different model, but I’m struggling to come up with one that is reasonably efficient and whose performance stays constant as the number of prices grows.
I’ve been considering different solutions, including representing prices as linked lists or using some sort of tree, but I believe there are better alternatives. Am I missing something obvious? Has anyone working on a similar problem come up with a good solution?
I don't really care whether I use HQL or native SQL as long as the performance is decent. I'm also open to making changes to the model.
Thanks!
[Edit]
Since I have over two years of price data, and there can be 1000+ items per portfolio, retrieving the entire graph is probably not a good idea. Also, I need random access by date, so storing the two prices as fields on the item is unfortunately not an option.
Not sure I'm catching all your concerns, but as you've probably figured out, there's no easy solution to this with Hibernate. It will come down to how you model the domain. I think you're best off separating the normal case from the special case. You can model both in your normal domain, or use special representations for the special cases.
For fetching the n latest prices, have you tried setting the batch size on the relation? Make the relation ordered (latest first), and then set the batch size to something like 10. Hibernate will then initialize the price collections of up to 10 items per query instead of issuing one query per item, and with indexes on the foreign key and the order column it should perform OK in most cases.
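A rough way to see what batch fetching buys you: the simulation below only counts round trips, with no Hibernate or database involved, and it assumes one select per batch of item ids. `BatchFetchSimulation` and its methods are hypothetical; in a real mapping the batch size would come from something like Hibernate's `@BatchSize(size = 10)` on the prices collection, combined with `@OrderBy` for the ordering.

```java
import java.util.ArrayList;
import java.util.List;

// Counts SQL round trips for loading price collections; no real database is used.
final class BatchFetchSimulation {

    /** N+1: one query for the items, then one query per item for its prices. */
    static int queriesWithoutBatching(int itemCount) {
        return 1 + itemCount;
    }

    /** Splits item ids into groups that would each be loaded by one "where item_id in (...)" select. */
    static List<List<Long>> batches(List<Long> itemIds, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> result = new ArrayList<>();
        for (int from = 0; from < itemIds.size(); from += batchSize) {
            result.add(itemIds.subList(from, Math.min(from + batchSize, itemIds.size())));
        }
        return result;
    }

    /** One query for the items plus one per batch. */
    static int queriesWithBatching(List<Long> itemIds, int batchSize) {
        return 1 + batches(itemIds, batchSize).size();
    }
}
```

For the 1000+ items per portfolio mentioned in the question, that is about 101 round trips instead of 1001 (Hibernate's exact batch splitting can vary by version). Each batch still loads the full price collection of its items, so batching reduces round trips, not the number of price rows read.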
It also seems to me you could keep extra relations alongside the full set. Don't be afraid to explicitly model important relationships like "last month's prices", even if that duplicates data in the domain model. In most cases it should be possible to avoid duplication in the DB.
For random access by date, it sounds like you're best served by a custom query instead of going through the domain model. If those queries are too slow, consider second-level caching, although I'm guessing your access pattern won't benefit much from it.
You should try to retrieve the items AND the prices in one query. If you do, you can iterate over your items and their prices without issuing a select for every item, and your N+1 problem should be gone.
For example, you could use eager fetching within your query or on the definition of your association.
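In HQL this would be something along the lines of `select distinct i from Item i left join fetch i.prices where i.portfolio = :portfolio` (entity and property names are assumed). Hibernate then turns the flat joined rows into items with populated collections. The hypothetical sketch below shows roughly that grouping step in plain Java (Java 16+ for records), to make clear why a single query is enough; `JoinRow`, `ItemWithPrices` and `JoinRowAssembler` are illustrative names only.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One row of a hypothetical "items left join prices" result set.
record JoinRow(long itemId, String itemName, BigDecimal price) {}

// Hypothetical item with its prices attached.
record ItemWithPrices(long id, String name, List<BigDecimal> prices) {}

final class JoinRowAssembler {
    /** Groups the flat rows of one join query into items, in first-seen order. */
    static List<ItemWithPrices> assemble(List<JoinRow> rows) {
        Map<Long, ItemWithPrices> byId = new LinkedHashMap<>();
        for (JoinRow row : rows) {
            ItemWithPrices item = byId.computeIfAbsent(
                    row.itemId(), id -> new ItemWithPrices(id, row.itemName(), new ArrayList<>()));
            // Left join: an item without prices produces one row with a null price.
            if (row.price() != null) {
                item.prices().add(row.price());
            }
        }
        return new ArrayList<>(byId.values());
    }
}
```

The trade-off is that this returns every price row for every item, which is exactly the volume concern raised in the question's edit.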
Regarding your performance concern about the growing number of price objects: maybe you can store the two latest prices in one or two extra fields of your item class. Then you could always eager-fetch those extra fields and lazily fetch the older prices in your collection when you need them.
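A minimal sketch of that idea, assuming a hypothetical `ItemWithLatestPrices` class and `recordPrice` method (with JPA these would simply be extra mapped columns on the item). As the question's edit points out, this only covers the "current" pair of prices, not random access by date.

```java
import java.math.BigDecimal;
import java.time.LocalDate;

// Hypothetical item that keeps its two most recent prices in plain fields,
// so they load with the item itself while older prices stay in a lazy collection.
final class ItemWithLatestPrices {
    private LocalDate latestDate;
    private BigDecimal latestPrice;
    private LocalDate previousDate;
    private BigDecimal previousPrice;

    /** Updates the two denormalized slots when a price arrives (possibly out of order). */
    void recordPrice(LocalDate date, BigDecimal price) {
        if (latestDate == null || date.isAfter(latestDate)) {
            // New latest price: the old latest becomes the previous one.
            previousDate = latestDate;
            previousPrice = latestPrice;
            latestDate = date;
            latestPrice = price;
        } else if (date.equals(latestDate)) {
            latestPrice = price; // same-day correction
        } else if (previousDate == null || !date.isBefore(previousDate)) {
            // Falls between previous and latest, or corrects the previous day.
            previousDate = date;
            previousPrice = price;
        }
        // Anything older than both slots only belongs in the full price history.
    }

    BigDecimal latestPrice() { return latestPrice; }

    BigDecimal previousPrice() { return previousPrice; }
}
```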
You can try a couple of options:
- Since your prices are date-based, you can look at partitioning your price data in the DB by month. This can considerably help your queries, since a price lookup would only need to scan the relevant partitions instead of the entire two years of prices. Then try the SQL query again. Also, run EXPLAIN to make sure you are hitting the right indexes, etc.
- Have you considered caching (e.g. Memcache)? You can pre-load the current and previous price of each item into the cache. Then you can fetch the portfolio and its items, and look up the prices in the cache, which should be pretty quick.
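The cache idea could look roughly like this. A `ConcurrentHashMap` stands in for memcached so the sketch is self-contained; `PriceCache`, `CachedPrices`, `preload` and `lookup` are hypothetical names, and a real cache client would have its own API, expiry and serialization concerns.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Current and previous price of one item, as stored in the cache.
record CachedPrices(BigDecimal current, BigDecimal previous) {}

// In-process stand-in for an external cache such as memcached.
final class PriceCache {
    private final Map<Long, CachedPrices> byItemId = new ConcurrentHashMap<>();

    /** Pre-load step, e.g. run after each day's prices have been imported. */
    void preload(Map<Long, CachedPrices> pricesByItemId) {
        byItemId.putAll(pricesByItemId);
    }

    /** Cache lookup; an empty result means the caller falls back to the database. */
    Optional<CachedPrices> lookup(long itemId) {
        return Optional.ofNullable(byItemId.get(itemId));
    }
}
```

Keyed by item id alone, this only serves the latest pair of prices; the random access by date mentioned in the question would need the date as part of the key.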
If you're using PostgreSQL or Oracle, you could easily use an analytic/window function on those prices when you join them, retrieving the first two values per item. As long as the column in the ORDER BY is indexed, that should give good enough performance.
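A sketch of the windowing approach, assuming hypothetical `item` and `price` tables with `id`, `portfolio_id`, `item_id`, `price_date` and `price` columns: `ROW_NUMBER()` numbers each item's prices newest first, and the outer query keeps the first two. Since the SQL can only run against a real database, it is held in a Java string (Java 16+ for the record and text block), and the method below mirrors the same filter, partition and ordering logic in memory to make the semantics concrete.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One price row (hypothetical columns item_id, price_date, price).
record PriceRow(long itemId, LocalDate date, BigDecimal price) {}

final class LatestTwoPrices {
    // Native SQL for PostgreSQL or Oracle; table and column names are assumptions.
    static final String SQL = """
        select item_id, price_date, price
        from (
            select p.item_id, p.price_date, p.price,
                   row_number() over (partition by p.item_id
                                      order by p.price_date desc) as rn
            from price p
            join item i on i.id = p.item_id
            where i.portfolio_id = ? and p.price_date <= ?
        ) ranked
        where rn <= 2
        """;

    /** In-memory equivalent of the query (portfolio filter assumed already applied). */
    static Map<Long, List<PriceRow>> latestTwoPerItem(List<PriceRow> rows, LocalDate asOf) {
        Map<Long, List<PriceRow>> result = new TreeMap<>();
        rows.stream()
            .filter(r -> !r.date().isAfter(asOf))                     // p.price_date <= ?
            .sorted(Comparator.comparing(PriceRow::date).reversed())  // order by price_date desc
            .forEachOrdered(r -> {
                // partition by item_id, keep rn <= 2
                List<PriceRow> top = result.computeIfAbsent(r.itemId(), id -> new ArrayList<>());
                if (top.size() < 2) {
                    top.add(r);
                }
            });
        return result;
    }
}
```

In the database, an index covering `(item_id, price_date)` on the price table would likely be what lets the window function avoid sorting every row.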
P.S. Next time, if you say you're considering native SQL, add the DB vendor/version.