开发者

Java External Sorting Efficiency

I have to write an external sorting program in java which given a file A containing an arbitrary number of integers, sorts them using only file B (which is the same size) as temporary storage. For the first stage I am reading blocks of the file into ram, using the inbuilt java sort and writing back to file B, however this is proving to be very slow. I would like to know if there are any glaring inefficiencies in my code? Note that input1 and output are RandomAccessFile Objcets and BUFFER_SIZE is the block size decided at runtime by the amount of free memory.

public void SortBlocks() throws IOException{
    int startTime = (int) System.currentTimeMillis();
    input1.seek(0);output.seek(0);
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(input1.getFD()),2048));
    DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(output.getFD()),2048));
    int[] buffer = new int[BUFFER_SIZE];
    int j=0;
    for(int i=0; i<input1.length();i+=4){
        buffer[j]开发者_如何学C = in.readInt();
        j++;
        if(j == BUFFER_SIZE){
            writeInts(buffer,out,j);
            j=0;
        }
    }
    writeInts(buffer,out,j);
    out.flush();
    SwitchIO();
    int endTime = (int) System.currentTimeMillis();
    System.out.println("sorted blocks in " + Integer.toString(endTime-startTime));
}

    private static void writeInts(int[] Ints, DataOutputStream out, int size) throws IOException{
    Arrays.sort(Ints,0,size);
    for(int i=0;i<size;i++){
        out.writeInt(Ints[i]);
    }
}

Thanks in advance for your feedback.


The most glaring inefficiency is the use of input1.length() which is a relatively expensive operation and you are calling it on every int value.

I can't see why you decrease the buffer size when the default (8192) would be more efficient.

If you are reading files, I would use a ByteBuffer as an IntBuffer. A bottleneck is likely to be the way you read and write data. Using int values in native order would improve the translation performance. (Rather than the default which big endian)

If you access the file as a memory mapped file you may be able to gracefully handle files larger than the memory size.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜