Space optimization for the UNIX sort -m command?
I'm trying to run a permutation test, which involves merging a thousand very large pre-sorted files into one mega sorted file.
The current run has some files that are even larger than those I usually merge (68M to 106M each).
I don't have enough hard drive space for the inputs, the temporary intermediates, and the outputs all at the same time. Is there a way to destructively merge all of these files using sort?
Here's the command I'm currently using:
sort -T /media/WD_Book/tmp --compress-program=gzip -g -k 6 -m *.rand.tab > /media/WD_Book/output/merged.rand.tab
(The files are numbered 0001.rand.tab through 1000.rand.tab, and the sort key is in exponential notation in the 6th column [thus -k 6 and -g].)
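For reference, here is a quick illustration of why I use -g rather than -n for keys in exponential notation (the sample values are made up):

printf '5e-3\n1e-2\n' | sort -n   # -n stops parsing at the 'e', compares 5 vs 1, so 1e-2 sorts first (numerically wrong)
printf '5e-3\n1e-2\n' | sort -g   # -g parses the full floating-point value, so 5e-3 sorts first (correct)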
I know it's possible to run a non-merge sort in-place, but the manpage specifically says it won't work for -m.
Maybe something like this (WARNING: may wipe your data):
touch merged.rand.tab            # create an empty result file to start from
for file in [0-9]*.rand.tab; do
  # merge the running result with the next input file
  sort -k 6 -g -m merged.rand.tab "$file" > result.rand.tab
  rm -f merged.rand.tab          # drop the old running result
  mv result.rand.tab merged.rand.tab
  # you can rm "$file" here if space is really scarce
done
Basically you are exchanging time for space: each pass merges one more input file into the running result, and you can remove each input file as soon as it has been merged.
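If you want the fully destructive variant hinted at in the comment above, a minimal sketch (untested; it only deletes an input after its merge succeeded) might look like:

touch merged.rand.tab
for file in [0-9]*.rand.tab; do
  # only remove the input if the merge actually succeeded
  if sort -k 6 -g -m merged.rand.tab "$file" > result.rand.tab; then
    mv result.rand.tab merged.rand.tab
    rm -f "$file"                # reclaim the space of the merged input
  else
    echo "merge failed on $file" >&2
    break
  fi
done

Note that during each pass you still temporarily need room for both the running result and the new result file being written.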
Again, back up your data before trying. ;-)