Concatenating a large number of files
I have around 3-4 million files in a directory, with filenames ending in, say, type1.txt or type2.txt
(the files are 1type1.txt, 1type2.txt, 2type1.txt, 2type2.txt, etc.).
Now I want to concatenate all files ending with type1.txt into one file, and all files ending with type2.txt into another.
Currently I am doing cat *type1.txt > allTtype1.txt
and similarly for type2.txt.
I want to preserve the file order in both output files; my guess is that cat does that.
But it is too slow.
Please suggest a faster method to do the same.
Thanks, Ravi
You can do this using this command:
ls | while IFS= read -r file; do cat -- "$file" >> "allTtype${file#*type}"; done
But as snap said in his answer, each time cat needs to open a file, it has to do an inode lookup, which takes a long time in a directory with lots of files. To try to speed things up, you could cat by inode using icat from The Sleuth Kit (replace /dev/sda1 with the device that holds the file system):
ls -i | while read -r -a file_array; do icat /dev/sda1 "${file_array[0]}" >> "allTtype${file_array[1]#*type}"; done
And even better, you can put the resulting files in another directory:
ls -i | while read -r -a file_array; do icat /dev/sda1 "${file_array[0]}" >> "/newdir/allTtype${file_array[1]#*type}"; done
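The loops above reopen the output file with >> for every input file. As a minimal sketch of one way to avoid that, the output files can be opened once on spare file descriptors and each input routed with a case pattern. The scratch directories and sample files here are hypothetical stand-ins for the real data, and whether this is noticeably faster on a given system is an assumption, not something measured.

```shell
# Hypothetical demo setup: small scratch directories standing in for the real ones.
src=$(mktemp -d)
out=$(mktemp -d)
for i in 1 2 3; do
    printf 'a%s\n' "$i" > "$src/${i}type1.txt"
    printf 'b%s\n' "$i" > "$src/${i}type2.txt"
done

cd "$src" || exit 1

# Open both output files once, on file descriptors 3 and 4.
exec 3> "$out/allTtype1.txt" 4> "$out/allTtype2.txt"

# One pass over the sorted listing; each file goes to the matching descriptor.
ls | while IFS= read -r file; do
    case $file in
        *type1.txt) cat -- "$file" >&3 ;;
        *type2.txt) cat -- "$file" >&4 ;;
    esac
done

# Close the descriptors when done.
exec 3>&- 4>&-
```

Writing the results into a separate directory, as suggested above, also keeps the output files out of the listing that is being read.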
cat itself is not slow. But every time you expand a shell wildcard (? and *), the shell reads and searches through all the file names in that directory, which is very slow.
Also, the kernel takes time to find the file when you open it by name, which you cannot avoid. This depends on the file system in use (unspecified in the question): some file systems handle huge directories more intelligently than others.
To sort this out you might benefit from taking a file listing once:
ls > /tmp/filelist
...and then using grep or similar to select the files from that list:
cat `grep foo /tmp/filelist` > /out/bar
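With millions of matching names, a backtick substitution like the one above may exceed the system's limit on command-line length. A possible variant, sketched here under the assumption that no file name contains whitespace or quote characters, pipes the selected names through xargs, which runs cat in batches while keeping the order of the list. The scratch directories and sample files are hypothetical stand-ins for the real data.

```shell
# Hypothetical demo setup: small scratch directories standing in for the real ones.
src=$(mktemp -d)
out=$(mktemp -d)
for i in 1 2 3; do
    printf 'a%s\n' "$i" > "$src/${i}type1.txt"
    printf 'b%s\n' "$i" > "$src/${i}type2.txt"
done

cd "$src" || exit 1

# Read the directory once; ls sorts the names, which fixes the output order.
ls > "$out/filelist"

# Select each group from the saved list and let xargs feed cat in batches.
# All batches write to the same redirected stdout, one after another.
for t in type1 type2; do
    grep "${t}\.txt\$" "$out/filelist" | xargs cat > "$out/allT${t}.txt"
done
```

The file list and the output files are kept outside the source directory so that they never show up in the listing themselves.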
After you have sorted this mess out, make sure to structure your storage/application in such a way that this does not ever happen again. :) Also make sure to rmdir the existing directory after you have gotten your files out of it (reusing it for any purpose will be inefficient even if it contains just a single file, because on some file systems a directory does not shrink after its entries are deleted).