Comparison script help
I'm trying to write a Bash script that will go through a set of cache directories and compare their contents. (I want to find the two that differ the least, for a project I'm working on.)
The structure is: a root directory; two subdirectories under that; under each of those, up to 52 directories (a, AA, b, BB, etc.); and under each of those, a variable number of directories where the contents actually are. Basically:
root >> a/b >> a/AA/b/BB/.../z/ZZ >> <some hex-named directory>
So I need to get to that last level, then run diff between the file in that directory (the file is always named identically) and every other cached file, and figure out which files are most similar.
The two directories at the top never change name, so that's easy. The directories under those follow a set format (they fill sequentially starting with 'a' and 'AA' up through 'z' and 'ZZ'), so I could just hard-code an array for that. The best way I can think of to handle the last level is to run 'ls > dirList', read dirList into an array, use that to go into the directories, and run diff in a loop against every other cached file using the same algorithm (yes, run time is going to be awful, but it will save a tremendous amount of time in the long run).
- Is this a reasonable approach? Is there a better, or more efficient way?
- Also, is there a way to get diff to count the number of lines that are different?
I know this is a bit long, but any help would be greatly appreciated. Thanks!
Assuming the two directories in your root directory are the ones to compare (a and b), I would try something like this:
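min_diff=-1          # -1 means "nothing compared yet"
file2remember=''
cd a || exit 1
# Feed the loop through process substitution instead of a pipe: a pipe would
# run the loop in a subshell and the variables set inside it would be lost.
while IFS= read -r f
do
    [ -f "../b/$f" ] || continue    # skip files that have no counterpart in b
    n=$(diff "$f" "../b/$f" | wc -l)
    if [ "$min_diff" -lt 0 ] || [ "$n" -lt "$min_diff" ]
    then min_diff=$n ; file2remember="$f"
    fi
done < <(find . -type f)
echo "$file2remember"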
NB: I do not have a Linux or Unix box to test this on.
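If the goal is instead to compare every cached file against every other one, as the question describes, the following is a minimal sketch of that idea rather than a tested solution. The function names count_diff_lines and find_closest_pair are made up for illustration, and the sketch assumes the cached files are text files that all share one name, passed in as the second argument. It also shows one way to get a number out of diff: in diff's default output, lines missing from the second file start with "<" and lines missing from the first start with ">", so counting those lines gives a rough difference score.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Bash (arrays, process substitution) and text files.

# Print the number of differing lines between two files.
# A changed line is counted twice: once as removed (<) and once as added (>).
count_diff_lines() {
    diff "$1" "$2" | grep -c '^[<>]'
}

# Usage: find_closest_pair ROOT FILENAME
# Prints "<count><TAB><file1><TAB><file2>" for the pair with the fewest
# differing lines, or returns 1 if fewer than two files were found.
find_closest_pair() {
    local root=$1 name=$2
    local -a files=()
    local f i j n best=-1 best_a='' best_b=''

    # Let find walk the whole tree, so no directory names are hard-coded.
    # -print0 with read -d '' keeps unusual file names intact.
    while IFS= read -r -d '' f; do
        files+=("$f")
    done < <(find "$root" -type f -name "$name" -print0)

    # Compare each unordered pair exactly once (j always starts after i).
    for ((i = 0; i < ${#files[@]}; i++)); do
        for ((j = i + 1; j < ${#files[@]}; j++)); do
            n=$(count_diff_lines "${files[i]}" "${files[j]}")
            if ((best < 0 || n < best)); then
                best=$n best_a=${files[i]} best_b=${files[j]}
            fi
        done
    done

    ((best >= 0)) || return 1
    printf '%s\t%s\t%s\n' "$best" "$best_a" "$best_b"
}
```

It would be called as something like `find_closest_pair /path/to/root cachefile`, where both arguments are placeholders for your own root directory and file name. With N cached files this runs diff N*(N-1)/2 times, so the run time still grows quadratically, but find replaces both the hard-coded letter array and the 'ls > dirList' step.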