Space optimization for the UNIX sort -m command?

I'm trying to run a permutation test, which involves merging a thousand very large pre-sorted files into one mega sorted file.

The current run has some files that are even larger than those I usually merge (68M to 106M each).

I don't have enough hard drive space for the inputs, the temporary intermediates, and the outputs all at the same time. Is there a way to destructively merge all of these files using sort?

Here's the command I'm currently using:

sort -T /media/WD_Book/tmp --compress-program=gzip -g -k 6 -m *.rand.tab > /media/WD_Book/output/merged.rand.tab

(The files are numbered 0001.rand.tab through 1000.rand.tab, and the sort key is in exponential notation in the 6th column [thus -k 6 and -g].)

I know it's possible to run a non-merge sort in-place, but the manpage specifically says it won't work for -m.


Maybe something like this (WARNING: may wipe data):

touch merged.rand.tab  # create an empty result file to merge into
for file in [0-9]*.rand.tab; do
   # merge the next input into the running result
   sort -k 6 -g -m merged.rand.tab "$file" > result.rand.tab
   mv result.rand.tab merged.rand.tab
   # you can rm "$file" here if space is really scarce
done

Basically trading time for space: you merge one file at a time into the result of the previous merge, and you can delete each input once it has been merged.
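If you do delete inputs as you go, it is safer to chain the delete onto a successful merge so a failed sort can't cost you an input file. A minimal sketch of that variant (the && chaining is my addition, not part of the original answer):

touch merged.rand.tab
for file in [0-9]*.rand.tab; do
   # replace the result and remove the input only if the merge succeeded
   sort -k 6 -g -m merged.rand.tab "$file" > result.rand.tab \
      && mv result.rand.tab merged.rand.tab \
      && rm "$file"
done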

Again, backup your data before trying. ;-)
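A related option (not from the answer above, just a sketch) is to merge in batches rather than one file at a time, so the growing result is rewritten far fewer times. The batch size of 100 and the batch_*.tab names are only illustrative, and each batch's inputs are deleted only after its merge succeeds:

i=0
batch=()
for file in [0-9]*.rand.tab; do
   batch+=("$file")
   if [ "${#batch[@]}" -eq 100 ]; then
      # merge this batch into one intermediate, then reclaim its inputs
      sort -k 6 -g -m "${batch[@]}" > "batch_$i.tab" && rm "${batch[@]}"
      batch=()
      i=$((i + 1))
   fi
done
if [ "${#batch[@]}" -gt 0 ]; then
   sort -k 6 -g -m "${batch[@]}" > "batch_$i.tab" && rm "${batch[@]}"
fi
# final pass: merge the intermediates into the full result
sort -k 6 -g -m batch_*.tab > merged.rand.tab && rm batch_*.tab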
