
Is there any way to load a query result into memory?

I have a huge database table (2.1 billion rows) and I need to perform some calculations on it to extract statistical results. To my understanding, it's obvious that it is not wise to perform the calculation directly on this database.

What I'm trying to do now is to load a portion of the table into memory (say 1 million rows). I'm not sure how to load the whole chunk of data from the ResultSet into memory without iterating row by row.

Any idea?


it's obvious that it is not wise to perform the calculation directly on this database

On the contrary, it is very likely that doing it in the database is the wisest way to do it. Databases are built precisely to handle huge amounts of data. If you can express your calculation as an SQL query, the database will work out how to execute it efficiently. In general, it will do this far quicker than if you loaded all the data into memory and processed it yourself.

There are some kinds of calculation where this is not the case - graph analyses, for example - but it should be your default approach.
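To make this concrete, here is a minimal sketch of asking the database for the statistics instead of the rows. The table name `measurements` and the column `reading` are made up for illustration, and `InDatabaseSummary` / `Summary` are hypothetical names, not anything from the question. COUNT, AVG, MIN and MAX are standard SQL; anything fancier (percentiles, standard deviation) tends to be spelled differently from one database to the next.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class InDatabaseSummary {

    /** Plain holder for the single summary row the database sends back. */
    static final class Summary {
        final long rows;
        final double mean;
        final double min;
        final double max;

        Summary(long rows, double mean, double min, double max) {
            this.rows = rows;
            this.mean = mean;
            this.min = min;
            this.max = max;
        }
    }

    // Hypothetical table and column names; replace them with the real schema.
    static final String SQL =
        "SELECT COUNT(reading), AVG(reading), MIN(reading), MAX(reading) "
        + "FROM measurements";

    /** Runs the aggregate in the database; only one row crosses the network. */
    static Summary summarize(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(SQL)) {
            // An aggregate query without GROUP BY returns exactly one row.
            rs.next();
            // On an empty table AVG/MIN/MAX are SQL NULL, which getDouble reports as 0.
            return new Summary(rs.getLong(1), rs.getDouble(2),
                               rs.getDouble(3), rs.getDouble(4));
        }
    }
}
```

Whether this beats pulling the rows out depends on indexes and on the database, so it is worth timing on a sample first.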


Why do you have a problem with iterating over the result set? If you want the actual data to be retrieved in larger chunks (e.g. to reduce network round trips to the database), you can set the fetch size on the statement or result set as a hint to the JDBC driver to retrieve larger portions of data at a time.
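Iterating also does not mean keeping the rows around. A rough sketch, assuming the first column of the query is numeric: the loop below reads each row once and updates a running count, mean and variance (Welford's method), so memory use stays constant no matter how many rows come back. `StreamingStats`, `RunningStats` and `scan` are names invented for this example. Note that `setFetchSize` is only a hint, and some drivers honor it only under additional conditions, so check your driver's documentation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class StreamingStats {

    /** Running count, mean and variance (Welford's method); stores no rows. */
    static final class RunningStats {
        private long count;
        private double mean;
        private double m2; // sum of squared deviations from the current mean

        void add(double x) {
            count++;
            double delta = x - mean;
            mean += delta / count;
            m2 += delta * (x - mean);
        }

        long count() { return count; }

        double mean() { return mean; }

        double sampleVariance() { return count > 1 ? m2 / (count - 1) : 0.0; }
    }

    /** Streams the first column of the query through RunningStats. */
    static RunningStats scan(Connection conn, String sql, int fetchSize)
            throws SQLException {
        RunningStats stats = new RunningStats();
        try (PreparedStatement ps = conn.prepareStatement(
                sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            ps.setFetchSize(fetchSize); // a hint: rows to fetch per round trip
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    double value = rs.getDouble(1);
                    if (!rs.wasNull()) { // skip SQL NULLs instead of counting them as 0
                        stats.add(value);
                    }
                }
            }
        }
        return stats;
    }
}
```

A call might look like `StreamingStats.scan(conn, "SELECT reading FROM measurements", 10_000)`, where the table and column are again placeholders.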


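I agree with the previous answers that loading that much data into memory is probably not a good idea. But to answer your question: there is an interface for a disconnected, in-memory representation of a ResultSet, javax.sql.rowset.CachedRowSet (a kind of javax.sql.RowSet). Its populate(ResultSet) method copies the rows into memory for you.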


Moving millions of rows from the database to the middle tier, just to do a calculation that could be done on the database, makes no sense to me at all. You should look into doing this on the database. Prototype it and see if the data convinces you to drop your prejudice.


What you want to do is called OLAP (Online Analytical Processing). Many RDBMSs have a rich feature set for exactly this kind of calculation, for instance Oracle's analytic functions. If the calculation cannot be done in a single SQL query, you can write stored procedures to do the job for you.

But please - do not load terabytes of data out of some database just to do some calculations that can be done in the data store directly. Why? You'll spend 99% of the time on data loading and transfer and 1% on the calculation.
