Handling very large amounts of data in MyBatis
My goal is actually to dump all the data of a database to an XML file. The database is not terribly big, it's about 300MB. The problem is that I have a memory limitation of only 256MB (in the JVM). So obviously I cannot just read everything into memory.
I managed to solve this problem using iBatis (yes I mean iBatis, not MyBatis) by calling its getList(... int skip, int max) multiple times, with an incremented skip. That does solve my memory problem, but I'm not impressed with the speed. The variable names suggest that what the method does under the hood is read the entire result set and then skip the specified number of records. That sounds quite redundant to me (I'm not saying that's what the method is doing, I'm just guessing based on the variable names).
Now, I have switched to MyBatis 3 for the next version of my application. My question is: is there a better way to handle large amounts of data chunk by chunk in MyBatis? Is there any way to make MyBatis process the first N records, return them to the caller, and keep the result set connection open, so that the next time the caller invokes getList(...) it starts reading from record N+1 without doing any "skipping"?
MyBatis CAN stream results. What you need is a custom result handler. With it you can take each row separately and write it to your XML file. The overall scheme looks like this:
session.select(
    "mappedStatementThatFindsYourObjects",
    parametersForStatement,
    resultHandler);
Here resultHandler is an instance of a class implementing the ResultHandler interface. This interface has just one method, handleResult, which provides you with a ResultContext object. From that context you can retrieve the row currently being read and do something with it:
public void handleResult(ResultContext context) {
    Object result = context.getResultObject();
    doSomething(result);
}
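As a concrete illustration of the dump-to-XML use case, here is a minimal sketch of such a handler. It is an assumption, not code from the original answer: the User row type is borrowed from a later answer in this thread, the writeAsXml body is a placeholder, and the generic ResultHandler signature requires a recent MyBatis 3.x version.

import java.io.Writer;
import org.apache.ibatis.session.ResultContext;
import org.apache.ibatis.session.ResultHandler;

// Streams each row straight to a Writer instead of collecting a List,
// so memory use stays roughly constant regardless of result set size.
public class XmlDumpHandler implements ResultHandler<User> {
    private final Writer out;

    public XmlDumpHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void handleResult(ResultContext<? extends User> context) {
        writeAsXml(context.getResultObject());
    }

    private void writeAsXml(User user) {
        // Placeholder: serialize one row and append it to `out`.
    }
}

An instance of this class would then be passed as the resultHandler argument to session.select(...) above.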
No, MyBatis does not have the full capability to stream results yet.
EDIT 1: If you don't need nested result mappings, then you can implement a custom result handler to stream results on currently released versions of MyBatis (3.1.1). The limitation applies when you need to do complex result mapping: the NestedResultSetHandler does not allow custom result handlers. A fix is available, and it looks like it is currently targeted for 3.2. See Issue 577.
In summary, to stream large result sets using MyBatis you'll need to:
- Implement your own ResultHandler.
- Increase the fetch size (as noted by Guillaume Perrot below); see the sketch after this list.
- For nested result maps, use the fix discussed in Issue 577. That fix also resolves some memory issues with large result sets.
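For illustration, the fetch size can be raised per statement. This is a minimal sketch using MyBatis annotations; the interface name, the query, and the value 500 are assumptions for illustration, not from the original answer:

import org.apache.ibatis.annotations.Options;
import org.apache.ibatis.annotations.ResultType;
import org.apache.ibatis.annotations.Select;
import org.apache.ibatis.mapping.ResultSetType;
import org.apache.ibatis.session.ResultHandler;

public interface UserDumpMapper {
    // FORWARD_ONLY plus an explicit fetchSize hints the JDBC driver to
    // stream rows in batches instead of buffering the whole result set.
    @Select("SELECT * FROM users")
    @Options(fetchSize = 500, resultSetType = ResultSetType.FORWARD_ONLY)
    @ResultType(User.class)
    void streamAll(ResultHandler<User> handler);
}

Note that drivers differ here; MySQL's Connector/J, for example, only streams results when the fetch size is set to Integer.MIN_VALUE.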
I have successfully used MyBatis streaming with a Cursor. Cursor support was added to MyBatis in this PR.
From the documentation it is described as
A Cursor offers the same results as a List, except it fetches data lazily using an Iterator.
Besides, the code documentation says
Cursors are a perfect fit to handle millions of items queries that would not normally fit in memory.
Here is an example implementation which I was able to use successfully:
import org.apache.ibatis.session.SqlSessionFactory;
import org.mybatis.spring.SqlSessionFactoryBean;
// You need a SqlSessionFactory somehow; with Spring, a SqlSessionFactoryBean
// configured with your DataSource produces it:
SqlSessionFactory sqlSessionFactory = sqlSessionFactoryBean.getObject();
Then you define your mapper, e.g. UserMapper, with the SQL query returning a Cursor of your target object rather than a List. The whole idea is to not store all the elements in memory:
import org.apache.ibatis.annotations.Select;
import org.apache.ibatis.cursor.Cursor;

public interface UserMapper {
    @Select("SELECT * FROM users")
    Cursor<User> getAll();
}
Then you write the code that will use an open SQL session from the factory and query using your mapper:
import java.util.Iterator;
import org.apache.ibatis.session.SqlSession;

try (SqlSession sqlSession = sqlSessionFactory.openSession()) {
    Iterator<User> iterator = sqlSession.getMapper(UserMapper.class)
        .getAll()
        .iterator();
    while (iterator.hasNext()) {
        doSomethingWithUser(iterator.next());
    }
}
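Note that the cursor fetches rows lazily from the open connection, so all iteration has to happen before the enclosing SqlSession is closed.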
handleResult receives as many records as the query fetches, with no pause.
When there were too many records to process this way, I used sqlSessionFactory.openSession().getConnection(). Then, as with normal JDBC, get a Statement, get the ResultSet, and process the records one by one. Don't forget to close the session.
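A minimal sketch of that approach, assuming the query, the fetch size of 500, and the processRow helper, none of which are from the original answer:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.ibatis.session.SqlSession;

// Drop down to plain JDBC through an open MyBatis session; the connection
// is returned to MyBatis when the session closes.
try (SqlSession session = sqlSessionFactory.openSession()) {
    Connection connection = session.getConnection();
    try (Statement statement = connection.createStatement(
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
        statement.setFetchSize(500); // stream in chunks instead of buffering
        try (ResultSet rs = statement.executeQuery("SELECT * FROM users")) {
            while (rs.next()) {
                processRow(rs); // your per-record handling, e.g. write XML
            }
        }
    }
}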
If you are just dumping all the data without any ordering requirement across tables, why not do the pagination directly in SQL? Set a limit on the query statement, specifying a different record id as the offset each time, to split the whole table into chunks, each of which can be read directly into memory if the row limit is a reasonable number.
The SQL could be something like:
SELECT * FROM resource
WHERE "ID" >= continuation_id
ORDER BY "ID"
LIMIT 300;
I think this can be viewed as an alternative way to dump all the data in chunks, sidestepping the varying levels of feature support in MyBatis, or in any persistence layer.
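A sketch of the driving loop; the findChunk mapper method and its SQL are assumptions used for illustration:

// Keyset pagination: each query resumes after the last id already processed,
// so the database never has to scan and discard skipped rows.
long lastId = 0;
List<User> chunk;
do {
    // e.g. SELECT * FROM users WHERE id > #{lastId} ORDER BY id LIMIT #{limit}
    chunk = mapper.findChunk(lastId, 300);
    for (User user : chunk) {
        doSomethingWithUser(user);
        lastId = user.getId();
    }
} while (!chunk.isEmpty());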