Handling very large amounts of data in MyBatis
My goal is actually to dump all the data of a database to an XML file. The database is not terribly big, it's about 300MB. The problem is that I have a memory limitation of only 256MB (in the JVM). So obviously I cannot just read everything into memory.
I managed to solve this problem using iBatis (yes I mean iBatis, not MyBatis) by calling its getList(... int skip, int max) multiple times, with an incremented skip. That does solve my memory problem, but I'm not impressed with the speed. The variable names suggest that what the method does under the hood is read the entire result set and then skip the specified number of records. That sounds quite redundant to me (I'm not saying that's what the method is doing, I'm just guessing based on the variable names).
Now, I have switched to MyBatis 3 for the next version of my application. My question is: is there a better way to handle large amounts of data chunk by chunk in MyBatis? Is there any way to make MyBatis process the first N records, return them to the caller, and keep the result set connection open, so that the next time the caller invokes getList(...) it starts reading from record N+1 without doing any "skipping"?
MyBatis CAN stream results. What you need is a custom result handler. With it you can take each row separately and write it to your XML file. The overall scheme looks like this:
session.select(
    "mappedStatementThatFindsYourObjects",
    parametersForStatement,
    resultHandler);
Here resultHandler is an instance of a class implementing the ResultHandler interface. This interface has just one method, handleResult, which provides you with a ResultContext object. From that context you can retrieve the row currently being read and do something with it:
public void handleResult(ResultContext context) {
    Object result = context.getResultObject();
    doSomething(result);
}
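As a concrete illustration of the dump-to-XML use case, here is a minimal sketch of such a handler. It is an assumption, not code from the original answer: the User row type is borrowed from a later answer in this thread, the writeAsXml body is a placeholder, and the generic ResultHandler signature requires a recent MyBatis 3.x version.

import java.io.Writer;
import org.apache.ibatis.session.ResultContext;
import org.apache.ibatis.session.ResultHandler;

// Streams each row straight to a Writer instead of collecting a List,
// so memory use stays roughly constant regardless of result set size.
public class XmlDumpHandler implements ResultHandler<User> {
    private final Writer out;

    public XmlDumpHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void handleResult(ResultContext<? extends User> context) {
        writeAsXml(context.getResultObject());
    }

    private void writeAsXml(User user) {
        // Placeholder: serialize one row and append it to `out`.
    }
}

An instance of this class would then be passed as the resultHandler argument to session.select(...) above.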
No, MyBatis does not have the full capability to stream results yet.
EDIT 1: If you don't need nested result mappings, then you can implement a custom result handler to stream results on currently released versions of MyBatis (3.1.1). The limitation applies when you need to do complex result mapping: the NestedResultSetHandler does not allow custom result handlers. A fix is available, and it looks like it is currently targeted for 3.2. See Issue 577.
In summary, to stream large result sets using MyBatis you'll need to:
- Implement your own ResultHandler.
- Increase the fetch size (as noted by Guillaume Perrot below); see the sketch after this list.
- For nested result maps, use the fix discussed in Issue 577. That fix also resolves some memory issues with large result sets.
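For illustration, the fetch size can be raised per statement. This is a minimal sketch using MyBatis annotations; the interface name, the query, and the value 500 are assumptions for illustration, not from the original answer:

import org.apache.ibatis.annotations.Options;
import org.apache.ibatis.annotations.ResultType;
import org.apache.ibatis.annotations.Select;
import org.apache.ibatis.mapping.ResultSetType;
import org.apache.ibatis.session.ResultHandler;

public interface UserDumpMapper {
    // FORWARD_ONLY plus an explicit fetchSize hints the JDBC driver to
    // stream rows in batches instead of buffering the whole result set.
    @Select("SELECT * FROM users")
    @Options(fetchSize = 500, resultSetType = ResultSetType.FORWARD_ONLY)
    @ResultType(User.class)
    void streamAll(ResultHandler<User> handler);
}

Note that drivers differ here; MySQL's Connector/J, for example, only streams results when the fetch size is set to Integer.MIN_VALUE.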
I have successfully used MyBatis streaming with a Cursor. Cursor support was added to MyBatis in this PR.
From the documentation it is described as
A Cursor offers the same results as a List, except it fetches data lazily using an Iterator.
Besides, the code documentation says
Cursors are a perfect fit to handle millions of items queries that would not normally fit in memory.
Here is an example implementation which I was able to use successfully:
import org.apache.ibatis.session.SqlSessionFactory;
import org.mybatis.spring.SqlSessionFactoryBean;
// You need a SqlSessionFactory somehow; with Spring, a SqlSessionFactoryBean
// configured with your DataSource produces it:
SqlSessionFactory sqlSessionFactory = sqlSessionFactoryBean.getObject();
Then you define your mapper, e.g. UserMapper, with the SQL query returning a Cursor of your target object rather than a List. The whole idea is to not store all the elements in memory:
import org.apache.ibatis.annotations.Select;
import org.apache.ibatis.cursor.Cursor;

public interface UserMapper {
    @Select("SELECT * FROM users")
    Cursor<User> getAll();
}
Then you write the code that will use an open SQL session from the factory and query using your mapper:
import java.util.Iterator;
import org.apache.ibatis.session.SqlSession;

try (SqlSession sqlSession = sqlSessionFactory.openSession()) {
    Iterator<User> iterator = sqlSession.getMapper(UserMapper.class)
        .getAll()
        .iterator();
    while (iterator.hasNext()) {
        doSomethingWithUser(iterator.next());
    }
}
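Note that the cursor fetches rows lazily from the open connection, so all iteration has to happen before the enclosing SqlSession is closed.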
handleResult receives as many records as the query fetches, with no pause.
When there were too many records to process this way, I used sqlSessionFactory.openSession().getConnection(). Then, as with normal JDBC, get a Statement, get the ResultSet, and process the records one by one. Don't forget to close the session.
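A minimal sketch of that approach, assuming the query, the fetch size of 500, and the processRow helper, none of which are from the original answer:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.ibatis.session.SqlSession;

// Drop down to plain JDBC through an open MyBatis session; the connection
// is returned to MyBatis when the session closes.
try (SqlSession session = sqlSessionFactory.openSession()) {
    Connection connection = session.getConnection();
    try (Statement statement = connection.createStatement(
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
        statement.setFetchSize(500); // stream in chunks instead of buffering
        try (ResultSet rs = statement.executeQuery("SELECT * FROM users")) {
            while (rs.next()) {
                processRow(rs); // your per-record handling, e.g. write XML
            }
        }
    }
}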
If you are just dumping all the data without any ordering requirement across tables, why not do the pagination directly in SQL? Set a limit on the query statement, specifying a different record id as the offset each time, to split the whole table into chunks, each of which can be read directly into memory if the row limit is a reasonable number.
The SQL could be something like:
SELECT * FROM resource
WHERE "ID" >= continuation_id
ORDER BY "ID"
LIMIT 300;
I think this can be viewed as an alternative way to dump all the data in chunks, sidestepping the varying levels of feature support in MyBatis, or in any persistence layer.
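A sketch of the driving loop; the findChunk mapper method and its SQL are assumptions used for illustration:

// Keyset pagination: each query resumes after the last id already processed,
// so the database never has to scan and discard skipped rows.
long lastId = 0;
List<User> chunk;
do {
    // e.g. SELECT * FROM users WHERE id > #{lastId} ORDER BY id LIMIT #{limit}
    chunk = mapper.findChunk(lastId, 300);
    for (User user : chunk) {
        doSomethingWithUser(user);
        lastId = user.getId();
    }
} while (!chunk.isEmpty());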