Reduce memory footprint when a Java application reads a gigantic file in chunks
I am creating an application to upload data to a server. The data will be pretty huge, up to 60-70 GB. I am using Java since I need it to run in any browser.
My approach is something like this:
InputStream s = new FileInputStream(file);
byte[] chunk = new byte[20000000];
s.read(chunk);
s.close();
client.postToServer(chunk);
At the moment it uses a large amount of memory, steadily climbing to about 1 GB, and when the garbage collector kicks in it is VERY obvious: a 5-6 second pause between chunks.
Is there any way to improve the performance of this and keep the memory footprint to a decent level?
EDIT:
This is not my real code. There are a lot of other things I do, like calculating a CRC and validating the InputStream.read return value, etcetera.
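For reference, the general shape is something like the sketch below (the UploadClient interface and its offset/length postToServer are simplified placeholders, not my actual API):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

public class ChunkedUploader {

    // Simplified stand-in for my upload client; the offset/length overload is assumed.
    interface UploadClient {
        void postToServer(byte[] buf, int off, int len) throws IOException;
    }

    static long upload(File file, UploadClient client) throws IOException {
        byte[] chunk = new byte[20000000]; // one reusable ~20 MB buffer
        CRC32 crc = new CRC32();
        InputStream s = new FileInputStream(file);
        try {
            int read;
            while ((read = s.read(chunk)) != -1) {   // read() may return fewer bytes than chunk.length
                crc.update(chunk, 0, read);          // checksum only the bytes actually read
                client.postToServer(chunk, 0, read); // post only the bytes actually read
            }
        } finally {
            s.close();
        }
        return crc.getValue(); // CRC32 of the whole file
    }
}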
You need to think about buffer reuse, something like this:
int size = 64 * 1024; // 64 KiB
byte[] chunk = new byte[size];
for (int read = s.read(chunk); read != -1; read = s.read(chunk)) {
    /*
     * I do hope you have some API call like the one below, or at least one with a wrapper object
     * that exposes partially filled buffers, because read might be smaller than the buffer if there
     * are fewer bytes left in the input stream before the end of the file...
     */
    client.postToServer(chunk, 0, read);
}
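If your upload API only accepts completely filled, fixed-size chunks, you can fill the buffer fully before each post instead; only the final chunk may be shorter. A sketch, built on the same hypothetical postToServer(byte[], int, int) call as above:

int filled = 0;
int n;
while ((n = s.read(chunk, filled, chunk.length - filled)) != -1) {
    filled += n;
    if (filled == chunk.length) {       // buffer is full: post it and start refilling
        client.postToServer(chunk, 0, filled);
        filled = 0;
    }
}
if (filled > 0) {                       // leftover tail shorter than a full chunk
    client.postToServer(chunk, 0, filled);
}

That way every post except possibly the last carries exactly chunk.length bytes, which is usually easier for the server side to reassemble.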
The first step would be to re-use your buffer, if you don't already do so. Reading a huge file should generally not require a lot of memory unless you keep all of it in memory.
Also: why are you using such a huge buffer? There's nothing really to be gained from it (unless you have an insanely fast network connection and hard disk). Reducing it to about 64 KB should have no negative effect on performance and might help Java with the GC.
You can also try to tune the garbage collector (http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html, http://www.petefreitag.com/articles/gctuning/).
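For example, something along these lines (the flag values are purely illustrative and uploader.jar is a placeholder name; the right heap size and collector depend on your JVM version and workload):

java -Xms128m -Xmx256m -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -jar uploader.jar

Capping the heap with -Xmx also tends to keep individual GC pauses shorter, since there is simply less garbage to collect at once.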