Query with JOIN FETCH performance problem
I have a Hibernate query performance problem that I can't figure out. In the code snippet below I need to select entities that have at least one mapping, with the mappings filtered. I'm using JOIN FETCH for this so that only the filtered mappings are loaded. But in that case the query performs badly. Hibernate logs this warning:
org.hibernate.hql.ast.QueryTranslatorImpl - firstResult/maxResults specified with collection fetch; applying in memory!
When I omit FETCH and leave only JOIN, the query is nice and fast. But then the entity ends up with all mappings loaded, which is not acceptable for me. Is there a way to boost query performance? There are a lot of rows in the mapping table.
HQL query:
    select distinct e from Entity e
    join fetch e.mappings as mapping
    where e.deleted = 0 and e.mappings is not empty
    and e = mapping.entity and mapping.approval in (:approvals)
Entities:
    @Entity
    @Table(name="entity")
    class Entity {
        ...
        @OneToMany(mappedBy="entity", cascade=CascadeType.REMOVE, fetch=FetchType.LAZY)
        @OrderBy("created")
        private List<Mapping> mappings = new ArrayList<Mapping>();
        ...
    }

    @Entity
    @Table(name="mapping")
    class Mapping {
        public static enum MappingApproval {
            WAITING, // mapping is waiting for approval
            APPROVED, // mapping was approved
            DECLINED; // mapping was declined
        }
        ...
        @ManyToOne(fetch=FetchType.EAGER)
        @JoinColumn(name="entity_id", nullable=false)
        private Entity entity;
        @Enumerated(EnumType.STRING)
        @Column(name="approval", length=20)
        private MappingApproval approval;
        ...
    }
Thanks
From the JPA specification:
The effect of applying setMaxResults or setFirstResult to a query involving fetch joins over collections is undefined. (JPA "Enterprise JavaBeans 3.0, Final Release", Chapter 3.6.1 Query Interface)
Hibernate does the right thing, but executes part of the query in memory, which is tremendously slower. In my case the difference is between 3-5 ms and 400-500 ms.
My solution was to implement the paging within the query itself. That works fast with JOIN FETCH.
If you need firstResult/maxResults together with "fetch", you can split your query into 2 queries:
First, query your entity ids with firstResult/maxResults but without the "fetch" on sub-tables:
    select entity.id from entity (without fetch) where .... (with firstResult/maxResults)
Then query your entities with the "fetch", restricted to the ids returned by your first query:
    select entity from entity fetch ... where id in <previous ids>
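One detail the two queries leave open is ordering: a "where id in (...)" query is not required to return rows in the order of the id list, so the page order from the first query can get lost. Here is a minimal plain-Java sketch of putting the fetched entities back into page order. PageOrder, inPageOrder and the idOf accessor are names I made up for illustration, they are not part of Hibernate or JPA.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not a Hibernate/JPA class.
final class PageOrder {

    private PageOrder() {
    }

    // Returns the fetched entities in the order given by pageIds.
    // pageIds: ids from the first (paginated, no fetch) query.
    // fetched: result of the second (join fetch ... where id in ...) query.
    // Duplicate entities (possible when the fetch query has no "distinct")
    // are collapsed, and ids without a matching entity are skipped.
    static <T, ID> List<T> inPageOrder(
            List<ID> pageIds, List<T> fetched, Function<T, ID> idOf) {
        Map<ID, T> byId = new HashMap<>();
        for (T entity : fetched) {
            byId.putIfAbsent(idOf.apply(entity), entity);
        }
        List<T> ordered = new ArrayList<>(pageIds.size());
        for (ID id : pageIds) {
            T entity = byId.get(id);
            if (entity != null) {
                ordered.add(entity);
            }
        }
        return ordered;
    }
}
```

With the entities from the question this would be called roughly as PageOrder.inPageOrder(ids, entities, e -> e.getId()), assuming the entity has a getId() accessor. Adding the same "order by" to the second query also works, the helper is just for the case where you can't or don't want to.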
The reason it is slow is that Hibernate executes the SQL query with no pagination at all and the restriction is done in memory.
However, if the join has to scan and fetch 100k records, while you are interested in just 100 results, then 99.9% of the work being done by the Extractor and all the I/O done over networking is just waste.
You can easily turn a JPQL query that uses both JOIN FETCH and pagination:
    List<Post> posts = entityManager.createQuery("""
        select p
        from Post p
        left join fetch p.comments
        where p.title like :title
        order by p.id
        """, Post.class)
    .setParameter("title", titlePattern)
    .setMaxResults(maxResults)
    .getResultList();
into an SQL query that limits the result using DENSE_RANK by the parent identifier:
    @NamedNativeQuery(
        name = "PostWithCommentByRank",
        query =
            "SELECT * " +
            "FROM ( " +
            "    SELECT *, dense_rank() OVER (ORDER BY \"p.created_on\", \"p.id\") rank " +
            "    FROM ( " +
            "        SELECT p.id AS \"p.id\", " +
            "               p.created_on AS \"p.created_on\", " +
            "               p.title AS \"p.title\", " +
            "               pc.id as \"pc.id\", " +
            "               pc.created_on AS \"pc.created_on\", " +
            "               pc.review AS \"pc.review\", " +
            "               pc.post_id AS \"pc.post_id\" " +
            "        FROM post p " +
            "        LEFT JOIN post_comment pc ON p.id = pc.post_id " +
            "        WHERE p.title LIKE :titlePattern " +
            "        ORDER BY p.created_on " +
            "    ) p_pc " +
            ") p_pc_r " +
            "WHERE p_pc_r.rank <= :rank ",
        resultSetMapping = "PostWithCommentByRankMapping"
    )
    @SqlResultSetMapping(
        name = "PostWithCommentByRankMapping",
        entities = {
            @EntityResult(
                entityClass = Post.class,
                fields = {
                    @FieldResult(name = "id", column = "p.id"),
                    @FieldResult(name = "createdOn", column = "p.created_on"),
                    @FieldResult(name = "title", column = "p.title"),
                }
            ),
            @EntityResult(
                entityClass = PostComment.class,
                fields = {
                    @FieldResult(name = "id", column = "pc.id"),
                    @FieldResult(name = "createdOn", column = "pc.created_on"),
                    @FieldResult(name = "review", column = "pc.review"),
                    @FieldResult(name = "post", column = "pc.post_id"),
                }
            )
        }
    )
The query can be executed like this:
    List<Post> posts = entityManager
        .createNamedQuery("PostWithCommentByRank")
        .setParameter(
            "titlePattern",
            "High-Performance Java Persistence %"
        )
        .setParameter(
            "rank",
            5
        )
        .unwrap(NativeQuery.class)
        .setResultTransformer(
            new DistinctPostResultTransformer(entityManager)
        )
        .getResultList();
To transform the tabular result set back into an entity graph, you need a ResultTransformer which looks as follows:
    public class DistinctPostResultTransformer
            extends BasicTransformerAdapter {

        private final EntityManager entityManager;

        public DistinctPostResultTransformer(
                EntityManager entityManager) {
            this.entityManager = entityManager;
        }

        @Override
        public List transformList(
                List list) {
            Map<Serializable, Identifiable> identifiableMap =
                new LinkedHashMap<>(list.size());
            for (Object entityArray : list) {
                if (Object[].class.isAssignableFrom(entityArray.getClass())) {
                    Post post = null;
                    PostComment comment = null;
                    Object[] tuples = (Object[]) entityArray;
                    for (Object tuple : tuples) {
                        if(tuple instanceof Identifiable) {
                            entityManager.detach(tuple);
                            if (tuple instanceof Post) {
                                post = (Post) tuple;
                            }
                            else if (tuple instanceof PostComment) {
                                comment = (PostComment) tuple;
                            }
                            else {
                                throw new UnsupportedOperationException(
                                    "Tuple " + tuple.getClass() + " is not supported!"
                                );
                            }
                        }
                    }
                    if (post != null) {
                        if (!identifiableMap.containsKey(post.getId())) {
                            identifiableMap.put(post.getId(), post);
                            post.setComments(new ArrayList<>());
                        }
                        if (comment != null) {
                            post.addComment(comment);
                        }
                    }
                }
            }
            return new ArrayList<>(identifiableMap.values());
        }
    }
That's it!
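If it is not obvious why filtering on the rank gives "first N posts" rather than "first N rows", here is a small plain-Java sketch that mimics what the rank filter does on already-joined rows. It is only an illustration: DenseRankPaging, Row and firstParents are made-up names, and the sketch assumes the rows are sorted by the post id alone, whereas the SQL above ranks by created_on and then id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "dense_rank() ... WHERE rank <= :rank", not Hibernate code.
final class DenseRankPaging {

    // One row of the joined result: the parent id plus the child id
    // (commentId is null when the LEFT JOIN found no comment).
    static final class Row {
        final long postId;
        final Long commentId;

        Row(long postId, Long commentId) {
            this.postId = postId;
            this.commentId = commentId;
        }
    }

    // Keeps every row that belongs to one of the first maxParents distinct posts.
    // Assumes sortedRows is ordered by postId, so rows of one post are adjacent.
    static List<Row> firstParents(List<Row> sortedRows, int maxParents) {
        List<Row> page = new ArrayList<>();
        int rank = 0;
        Long previousPostId = null;
        for (Row row : sortedRows) {
            // The rank only grows when the parent changes, so all child rows
            // of the same post share one rank.
            if (previousPostId == null || previousPostId.longValue() != row.postId) {
                rank++;
                previousPostId = row.postId;
            }
            if (rank > maxParents) {
                break;
            }
            page.add(row);
        }
        return page;
    }
}
```

A plain row limit of 2 on rows for posts 1, 1, 2, 3 would cut the result off in the middle of the join and return only post 1 with its two comments. Limiting by rank keeps all rows of the first two posts, so the collections stay complete.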
After increasing the memory for the JVM, things go much better. In the end I stopped using FETCH in queries.