开发者

to SORT 1 BILLION of integer numbers with small physical memory

Want to SORT 1 BILLION of integer numbers and my system has just 1 GB of RAM.What could be the fastest and efficient way to sort?

  1. Say we have an input in a开发者_开发知识库 text file an integer per line.

  2. We are using java program to sort.

  3. I have specified RAM as we cannot hold all the input integers in the RAM.

Update: Integers are 7 digit numbers.


Integers are 7 digit numbers.

So there are only 10 million possible values.

You have 1GB of RAM. Make an array of counters, one for each possible value.

Read through the file once, count up the counters.

When done, output the numbers according to the final counter values.

Every number can occur at most 1 billion times. So a 32bit counter would be enough. This means a 10M x 4 bytes = 40M byte array.


The simplest thing to do is break the input into smaller files that can fit in memory and sort each, and then merge the results.

Guido van Rossum has a good description of doing this in python while obviously not the same language the principle is the same.


You specified that are sorting a billion 7 (decimal) digit numbers.

If there were no duplicates, you could sort in memory with 107 BITS using radix sort. Since you must have duplicates (107 less than 109), you could implement radix sort using (say) an array of 107 8-bit counters, with a HashMap<Integer, Integer> to deal with the relatively few cases where the counters overflow. Or just an array of 107 32-bit counters.

Another more general approach (that works for any kind of value) is to split the file into N smaller subfiles, sort each subfile in memory, and then perform an N-way merge of the sorted subfiles.


Using a BitSet with 4 billion possible values occupies 512 MB. Just set all the int values you see and write them out in order (they are naturally sorted)

This only works if you don't care about duplicates.

If counting duplicates matters I would still consider either a memory mapped file for counting, or using a merge sort of sorted subsections of data. (I believe the later is an expected answer)

I recently bough a 24 GB PC for under £1K, so a few GB isn't that much unless you limited by a hosted solution. (Or using a mobile device)


Assuming every integer occurs exactly one time you can read the file and for every number you find you set a bit - the bit array has to hold 10000000 bits - this uses only 1,28 MB RAM which should be available... after you have read all integers you just go through the array and output the numbers where a bit ist set...

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜