recursive diff is extremely slow - checking contents of directories

I am running a recursive diff on two directories with a few options. The directories are fairly large, but I only want to see differences in the contents of the folders (which files are present), not differences between the files themselves, so I am using the -q option (am I using this right?).

I have also tried an rsync dry run, which seems to take just as long. The output goes through sed; I have tried without it, and it doesn't seem to affect anything. I also ignore hidden files. I think I may be misusing diff -q to just compare the contents of two directories.

I used a code block from another tip to time how long comparing just ONE of these directory pairs took (1 directory, 14 subdirectories): 88 minutes. Every file is a 30-minute TV show, so if diff is comparing the files themselves, that makes sense, but I thought -q would prevent that?

Also, one directory is mounted over AFP and the other is on a FireWire-connected external drive. That doesn't seem to matter, because I copied both directories locally and the diff took the same amount of time.

I have a solution to this - I ran ls -1 over both directories and diff'd the output - but why is diff taking so long to run?

Here is the code; any suggestions?

#!/bin/bash

before="$(date +%s)"

diff -r -x '.*' /Volumes/directory1/ /Volumes/directory2/ | sed 's/^.\{24\}//g' > /Volumes/stuff.txt
diff -r -x '.*' /Volumes/directory3/ /Volumes/directory4/ | sed 's/^.\{24\}//g' > /Volumes/stuff.txt
diff -r -x '.*' /Volumes/directory5/ /Volumes/directory6/ | sed 's/^.\{24\}//g' > /Volumes/stuff.txt
diff -r -x '.*' /Volumes/directory7/ /Volumes/directory8/ | sed 's/^.\{24\}//g' > /Volumes/stuff.txt
diff -r -x '.*' /Volumes/directory9/ /Volumes/directory10/ | sed 's/^.\{24\}//g' > /Volumes/stuff.txt
diff -r -x '.*' /Volumes/directory11/ /Volumes/directory12/ | sed 's/^.\{24\}//g' > /Volumes/stuff.txt

after="$(date +%s)"
elapsed_seconds="$(expr $after - $before)"
echo Elapsed time for code block: $elapsed_seconds


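When two files differ, diff can usually tell fairly quickly, because with -q it can stop at the first mismatch. When they're the same, though, it has to read both files in full to verify that they are indeed byte-for-byte identical. The -q option only shortens the report; it does not skip that comparison.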
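To see that -q still looks inside the files, here is a small sketch that builds two throwaway trees and runs diff -rq on them. The directory and file names (a, b, same.txt, changed.txt, only_a.txt) are made up for the demo, and it assumes mktemp -d is available:

```shell
#!/bin/bash
# Hypothetical demo tree; none of these names come from the question.
demo_root="$(mktemp -d)"
mkdir "$demo_root/a" "$demo_root/b"

printf 'same\n'  > "$demo_root/a/same.txt"      # identical in both trees
printf 'same\n'  > "$demo_root/b/same.txt"
printf 'old\n'   > "$demo_root/a/changed.txt"   # same name, different content
printf 'new\n'   > "$demo_root/b/changed.txt"
printf 'extra\n' > "$demo_root/a/only_a.txt"    # present on one side only

# diff exits with status 1 when it finds differences, hence the "|| true".
quick_report="$(cd "$demo_root" && diff -rq a b || true)"
printf '%s\n' "$quick_report"

rm -rf "$demo_root"
```

The report flags changed.txt as differing even though the name exists on both sides, which diff can only know by reading the contents. With large video files that are mostly identical, that reading is where the time goes.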

If all you care about is differences in file names and don't want to inspect the contents of the files, try something like:

diff <(find /Volumes/directory1/ -printf '%P\n') \
     <(find /Volumes/directory2/ -printf '%P\n')

This assumes you have GNU find with the -printf action. If you don't (the stock find on macOS lacks it), use some subshell magic per Gordon's comment:

diff <(cd /Volumes/directory1 && find .) \
     <(cd /Volumes/directory2 && find .)
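One caveat with comparing raw find output is that find does not guarantee any particular order, so two listings of identical trees can still produce a noisy diff. A possible variant, sketched below, sorts each listing first and skips hidden entries to mirror the -x '.*' in the question. The function names list_names and compare_names are made up for illustration, and the script needs bash for the <( ) process substitution:

```shell
#!/bin/bash
# list_names DIR: print every path under DIR relative to DIR, sorted,
# skipping hidden files and anything inside hidden directories.
list_names() {
    (cd "$1" && find . ! -path '*/.*' | LC_ALL=C sort)
}

# compare_names DIR1 DIR2: diff the two sorted listings.
# Only directory entries are read, never file contents.
compare_names() {
    diff <(list_names "$1") <(list_names "$2")
}
```

Usage would look like compare_names /Volumes/directory1 /Volumes/directory2. Lines starting with < are names found only in the first directory, and lines starting with > are names found only in the second.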