How to find files with same size?
I have a file structure like so
a/file1
a/file2
a/file3
a/...
b/file1
b/file2
b/file3
b/...
...
where within each dir, some files have the same file size, and I would like to delete those.
I guess if the problem could be solved for one dir, e.g. dir a, then I could wrap a for-loop around it?
for f in *; do
???
done
But how do I find files with same size?
ls -l|grep '^-'|awk '{if(a[$5]){ a[$5]=a[$5]"\n"$NF; b[$5]++;} else a[$5]=$NF} END{for(x in b)print a[x];}'
This checks regular files only, not directories.
$5 is the size column of the ls -l output.
test:
kent@ArchT60:/tmp/t$ ls -l
total 16
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 a
-rw-r--r-- 1 kent kent 153 Sep 24 22:24 all
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 b
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 c
kent@ArchT60:/tmp/t$ ls -l|grep '^-'|awk '{if(a[$5]){ a[$5]=a[$5]"\n"$NF; b[$5]++;} else a[$5]=$NF} END{for(x in b)print a[x];}'
a
b
c
kent@ArchT60:/tmp/t$
Update based on Michał Šrajer's comment:
filenames with spaces are now also supported.
command:
ls -l|grep '^-'|awk '{ f=""; if(NF>9)for(i=9;i<=NF;i++)f=f?f" "$i:$i; else f=$9;
if(a[$5]){ a[$5]=a[$5]"\n"f; b[$5]++;} else a[$5]=f}END{for(x in b)print a[x];}'
test:
kent@ArchT60:/tmp/t$ l
total 24
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 a
-rw-r--r-- 1 kent kent 153 Sep 24 22:24 all
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 b
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 c
-rw-r--r-- 1 kent kent 51 Sep 24 22:40 x y
kent@ArchT60:/tmp/t$ ls -l|grep '^-'|awk '{ f=""
if(NF>9)for(i=9;i<=NF;i++)f=f?f" "$i:$i; else f=$9;
if(a[$5]){ a[$5]=a[$5]"\n"f; b[$5]++;} else a[$5]=f} END{for(x in b)print a[x];}'
a
b
c
x y
kent@ArchT60:/tmp/t$
A solution that works with file names containing spaces (based on Kent's (+1) and awiebe's (+1) posts):
for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print $2; else a[$1]=1}' | tr '\n' '\0' | xargs -0 echo rm
To make it actually remove the duplicates, remove the echo from the xargs command.
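To apply this per directory, as in the question's a/, b/ layout, here is a minimal sketch of a wrapper loop (my own assumption about the layout: each directory is an immediate subdirectory of where you run it; the sketch still only prints the rm commands):
for d in */; do
  (
    cd "$d" || exit
    # list the files whose size has already been seen in this directory
    for FILE in *; do stat -c"%s/%n" "$FILE"; done |
      awk -F/ '{if ($1 in a) print $2; else a[$1]=1}' |
      tr '\n' '\0' | xargs -0 -r echo rm
  )
done
The subshell keeps each cd local, and xargs -r (GNU) skips directories that have no duplicate sizes at all.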
Here is code to get the size of a file:
FILESIZE=$(stat -c%s "$FILENAME")
echo "Size of $FILENAME = $FILESIZE bytes."
Then use a for loop to walk through the items in your structure, storing the size of the current file in a variable.
Nest a second for loop inside it to compare each other item in the structure (excluding the current one) against that size.
Route the names of the matching files into a text file so you can check that the script behaves correctly (instead of executing rm immediately).
Finally, execute rm on the contents of that file.
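A minimal sketch of that nested-loop idea, under a few assumptions of mine (a single directory, GNU stat, and a hypothetical duplicates.txt as the review file):
#!/bin/bash
# collect names of files that share a size with some other file
> duplicates.txt
for f in *; do
    [ -f "$f" ] || continue
    fsize=$(stat -c%s "$f")
    for g in *; do
        [ -f "$g" ] || continue
        [ "$g" = "$f" ] && continue
        # record g if it has the same size as f
        if [ "$(stat -c%s "$g")" -eq "$fsize" ]; then
            echo "$g" >> duplicates.txt
        fi
    done
done
Note that every member of a same-size group ends up in duplicates.txt, so edit the file to keep at least one copy of each group before feeding it to rm.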
Based on the accepted answer, the following lists every file in the current directory that shares its size with another file (so you can choose which one to keep), sorted by size:
for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 ls -lS
To determine whether the files are actually identical, and not just the same number of bytes, run shasum or md5sum on each file:
for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 -n1 shasum
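As a further step of my own (not part of the answer above), you could pipe those checksums through a second awk so that only the later occurrences of each checksum are printed, i.e. the files that are genuine duplicates of something already listed:
for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 -n1 shasum |
  awk 'seen[$1]++ { sub(/^[^ ]+ +/, ""); print }'
The second awk keeps the first file of every checksum group and prints the rest, which you can then review and pass to rm.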
Plain bash solution
find -not -empty -type f -printf "%s\n" |
sort -rn | uniq -d |
xargs -I{} -n1 find -type f -size {}c -print0 |
xargs -0 du | sort
Looks like what you really want is a duplicate file finder?
It sounds like this has been answered several times and in several different ways, so I may be beating a dead horse but here goes...
find DIR_TO_RUN_ON -size SIZE_OF_FILE_TO_MATCH -exec rm {} \;
find is an awesome command and I highly recommend reading its manpage.
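For example, a hypothetical invocation that deletes every other regular file in dir a with the same size as a/file1 (the trailing c makes find interpret the number as bytes):
SIZE=$(stat -c%s a/file1)
find a -type f ! -name file1 -size "${SIZE}c" -exec rm {} \;
As with the other answers, replacing rm with echo rm first is a cheap way to preview what would be deleted.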