How to handle a large amount of float data?
We have a binary file which contains a large amount of float data (about 80 MB) that we need to process in our Java application. The data comes from a medical scanner. One file contains the data from one Rotation. One Rotation contains 960 Views, one View contains 16 Rows, and one Row contains 1344 Cells. Those numbers (their relationship) are fixed.
We need to read ALL the floats into our application, with a code structure that reflects the Rotation-View-Row-Cell hierarchy above.
What we are doing now is using a float[] to hold the data for the Cells, and ArrayLists for Rotation, View and Row to hold their data.
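Roughly, our current structure looks like this (class and field names here are just illustrative):

```java
import java.util.ArrayList;

// Current layout: ArrayLists for the outer levels, a float[] for the cells of a row.
class Row {
    float[] cells = new float[1344];
}

class View {
    ArrayList<Row> rows = new ArrayList<>(16);
}

class Rotation {
    ArrayList<View> views = new ArrayList<>(960);
}
```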
I have two questions:
- How can we populate the Cell data (read the floats into our float[]) quickly?
- Do you have a better idea for holding this data?
- Use a DataInputStream (and its readFloat() method) wrapping a FileInputStream, possibly with a BufferedInputStream in between (try whether the buffer helps performance or not); see the sketch after this list.
- Your data structure looks fine.
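A minimal sketch of that approach, assuming the file is a plain sequence of big-endian IEEE 754 floats (the class and method names are just placeholders):

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class FloatFileReader {
    static final int VIEWS = 960, ROWS = 16, CELLS = 1344;

    // Reads one Rotation worth of cells into a flat array, one float at a time.
    static float[] readAllFloats(String path) throws IOException {
        float[] cells = new float[VIEWS * ROWS * CELLS];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            for (int i = 0; i < cells.length; i++) {
                cells[i] = in.readFloat(); // readFloat() reads 4 big-endian bytes
            }
        }
        return cells;
    }
}
```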
Assuming you don't make changes to the data (add more views, etc.) why not put everything in one big array? The point of ArrayLists is you can grow and shrink them, which you don't need here. You can write access methods to get the right cell for a given view, rotation, etc.
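For illustration, a sketch of such access methods over one big array (the view-major layout and the names are assumptions):

```java
public class RotationData {
    static final int VIEWS = 960, ROWS = 16, CELLS = 1344;
    private final float[] data = new float[VIEWS * ROWS * CELLS];

    // Maps (view, row, cell) onto the flat array, view-major order.
    float get(int view, int row, int cell) {
        return data[(view * ROWS + row) * CELLS + cell];
    }

    void set(int view, int row, int cell, float value) {
        data[(view * ROWS + row) * CELLS + cell] = value;
    }
}
```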
Using arrays of arrays is a better idea; that way the system figures out the indexing for you, and it is just as fast as a single array.
Michael is right: you need to buffer the input, otherwise you will be doing a file access operation for every byte and your performance will be awful.
If you want to stick with the current approach as much as possible, you can minimize the memory used by your ArrayLists by setting their capacity to the number of elements they hold. Otherwise they keep a number of slots in reserve, expecting you to add more.
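If you keep the ArrayLists, a small sketch of sizing them exactly (ArrayList's initial-capacity constructor and trimToSize() are the standard tools for this):

```java
import java.util.ArrayList;

public class CapacityExample {
    // Builds one View as an ArrayList of 16 rows without spare capacity.
    static ArrayList<float[]> buildView() {
        ArrayList<float[]> rows = new ArrayList<>(16); // exact capacity up front
        for (int r = 0; r < 16; r++) {
            rows.add(new float[1344]);
        }
        rows.trimToSize(); // or trim afterwards if the list was grown incrementally
        return rows;
    }
}
```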
For the data loading:
DataInputStream should work well. But make sure you wrap the underlying FileInputStream in a BufferedInputStream, otherwise you run the risk of doing I/O operations for every float which can kill performance.
Several options for holding the data:
- The (very marginally) most memory-efficient way will be to store the entire data set in one large float[], and calculate offsets into it as needed. A bit ugly to use, but might make sense if you are doing a lot of calculations or processing loops over the entire set.
- The most "OOP" way would be to have separate objects for Rotation, View, Row and Cell. But having each cell as a separate object is pretty wasteful, might even blow your memory limits.
- You could use nested ArrayLists with a float[1344] to represent the lowest level data for the cells in each row. I understand this is what you're currently doing - in fact I think it's a pretty good choice. The overhead of the ArrayLists won't be much compared to the overall data size.
- A final option would be to use a float[viewNum][rowNum][cellNum] to represent each rotation. A bit more efficient than ArrayLists, though arrays are usually less convenient to manipulate. However, this seems a pretty good option if, as you say, the array sizes will always be fixed. I'd probably choose this option myself.
Are you having any particular performance/usage problems with your current approach?
The only thing I can suggest based on the information that you provide is to try representing a View as float[][] of rows and cells.
I also think that you can put your whole data structure into a float[][][] (as Nathan Hughes suggests). You could have a method that reads your file and returns a float[][][], where the first dimension is that of views (960), the second is that of rows (16), and the third is that of cells (1344). If those numbers are fixed, this approach is a good fit: you save memory, and it's faster.
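A hedged sketch of such a reader, assuming the cells are stored as consecutive big-endian floats in view/row/cell order (the method name and layout are assumptions):

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class RotationLoader {
    // Reads one Rotation as a [view][row][cell] array: 960 x 16 x 1344 floats.
    static float[][][] readRotation(String path) throws IOException {
        float[][][] rotation = new float[960][16][1344];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            for (float[][] view : rotation) {
                for (float[] row : view) {
                    for (int c = 0; c < row.length; c++) {
                        row[c] = in.readFloat();
                    }
                }
            }
        }
        return rotation;
    }
}
```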
80 MB shouldn't be so much data that you need to worry so terribly much. I would really suggest:
- create Java wrapper objects representing the most logical structure/hierarchy for the data you have;
- one way or another, ensure that you're only making an actual "raw" I/O call (so an InputStream.read() or equivalent) every 16K or so of data (e.g. you could read into a 16K/32K byte array that is wrapped in a ByteBuffer for the purpose of pulling out the floats, or whatever you need for your data; see the sketch after this list);
- if you actually have a performance problem with this approach, try to identify, not second-guess, what that performance problem actually is.
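A sketch of that chunked-read idea using a FileChannel and a reusable ByteBuffer (the 32K buffer size and big-endian byte order are assumptions):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedFloatReader {
    // Fills dest with floats while keeping raw I/O calls down to one per ~32K of data.
    static void readFloats(Path file, float[] dest) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(32 * 1024); // ByteBuffer defaults to big-endian
        int i = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (i < dest.length && ch.read(buf) != -1) {
                buf.flip();
                while (buf.remaining() >= Float.BYTES && i < dest.length) {
                    dest[i++] = buf.getFloat();
                }
                buf.compact(); // carry any partial float over to the next read
            }
        }
    }
}
```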
I understand that you are looking for an effective way to store the data you described above. Although the size you mention is not very large, I would suggest you have a look at Huge Collections.