How to find files with same size?
I have a file structure like so
a/file1
a/file2
a/file3
a/...
b/file1
b/file2
b/file3
b/...
...
where within each dir, some files have the same file size, and I would like to delete those.
I guess if the problem could be solved for one dir, e.g. dir a, then I could wrap a for-loop around it?
for f in *; do
???
done
But how do I find files with same size?
ls -l|grep '^-'|awk '{if(a[$5]){ a[$5]=a[$5]"\n"$NF; b[$5]++;} else a[$5]=$NF} END{for(x in b)print a[x];}'
This checks regular files only, not directories.
$5 is the size column of the ls -l output.
test:
kent@ArchT60:/tmp/t$ ls -l
total 16
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 a
-rw-r--r-- 1 kent kent 153 Sep 24 22:24 all
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 b
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 c
kent@ArchT60:/tmp/t$ ls -l|grep '^-'|awk '{if(a[$5]){ a[$5]=a[$5]"\n"$NF; b[$5]++;} else a[$5]=$NF} END{for(x in b)print a[x];}'
a
b
c
kent@ArchT60:/tmp/t$
Update based on Michał Šrajer's comment:
filenames with spaces are now also supported.
command:
ls -l|grep '^-'|awk '{ f=""; if(NF>9)for(i=9;i<=NF;i++)f=f?f" "$i:$i; else f=$9;
if(a[$5]){ a[$5]=a[$5]"\n"f; b[$5]++;} else a[$5]=f}END{for(x in b)print a[x];}'
test:
kent@ArchT60:/tmp/t$ l
total 24
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 a
-rw-r--r-- 1 kent kent 153 Sep 24 22:24 all
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 b
-rw-r--r-- 1 kent kent 51 Sep 24 22:23 c
-rw-r--r-- 1 kent kent 51 Sep 24 22:40 x y
kent@ArchT60:/tmp/t$ ls -l|grep '^-'|awk '{ f=""
if(NF>9)for(i=9;i<=NF;i++)f=f?f" "$i:$i; else f=$9;
if(a[$5]){ a[$5]=a[$5]"\n"f; b[$5]++;} else a[$5]=f} END{for(x in b)print a[x];}'
a
b
c
x y
kent@ArchT60:/tmp/t$
A solution that works with file names containing spaces (based on Kent's (+1) and awiebe's (+1) posts):
for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print $2; else a[$1]=1}' | tr '\n' '\0' | xargs -0 echo rm
To make it actually remove the duplicates, remove the echo from the xargs command.
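To apply this per directory, as in the question's a/, b/ layout, here is a minimal sketch of a wrapper loop (my own assumption about the layout: each directory is an immediate subdirectory of where you run it; the sketch still only prints the rm commands):
for d in */; do
  (
    cd "$d" || exit
    # list the files whose size has already been seen in this directory
    for FILE in *; do stat -c"%s/%n" "$FILE"; done |
      awk -F/ '{if ($1 in a) print $2; else a[$1]=1}' |
      tr '\n' '\0' | xargs -0 -r echo rm
  )
done
The subshell keeps each cd local, and xargs -r (GNU) skips directories that have no duplicate sizes at all.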
Here is code to get the size of a file:
FILESIZE=$(stat -c%s "$FILENAME")
echo "Size of $FILENAME = $FILESIZE bytes."
Then use a for loop to walk through the items in your structure, storing the size of the current file in a variable.
Nest a second for loop inside it to compare each other item in the structure (excluding the current one) against that size.
Route the names of the matching files into a text file so you can check that the script behaves correctly (instead of executing rm immediately).
Finally, execute rm on the contents of that file.
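A minimal sketch of that nested-loop idea, under a few assumptions of mine (a single directory, GNU stat, and a hypothetical duplicates.txt as the review file):
#!/bin/bash
# collect names of files that share a size with some other file
> duplicates.txt
for f in *; do
    [ -f "$f" ] || continue
    fsize=$(stat -c%s "$f")
    for g in *; do
        [ -f "$g" ] || continue
        [ "$g" = "$f" ] && continue
        # record g if it has the same size as f
        if [ "$(stat -c%s "$g")" -eq "$fsize" ]; then
            echo "$g" >> duplicates.txt
        fi
    done
done
Note that every member of a same-size group ends up in duplicates.txt, so edit the file to keep at least one copy of each group before feeding it to rm.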
Based on the accepted answer, the following lists every file in the current directory that shares its size with another file (so you can choose which one to keep), sorted by size:
for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 ls -lS
To determine whether the files are actually identical, and not just the same number of bytes, run shasum or md5sum on each file:
for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 -n1 shasum
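As a further step of my own (not part of the answer above), you could pipe those checksums through a second awk so that only the later occurrences of each checksum are printed, i.e. the files that are genuine duplicates of something already listed:
for FILE in *; do stat -c"%s/%n" "$FILE"; done | awk -F/ '{if ($1 in a)print a[$1]"\n"$2; else a[$1]=$2}' | sort -u | tr '\n' '\0' | xargs -0 -n1 shasum |
  awk 'seen[$1]++ { sub(/^[^ ]+ +/, ""); print }'
The second awk keeps the first file of every checksum group and prints the rest, which you can then review and pass to rm.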
Plain bash solution
find -not -empty -type f -printf "%s\n" |
sort -rn | uniq -d |
xargs -I{} -n1 find -type f -size {}c -print0 |
xargs -0 du | sort
Looks like what you really want is a duplicate file finder?
It sounds like this has been answered several times and in several different ways, so I may be beating a dead horse but here goes...
find DIR_TO_RUN_ON -size SIZE_OF_FILE_TO_MATCH -exec rm {} \;
find is an awesome command and I highly recommend reading its manpage.
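For example, a hypothetical invocation that deletes every other regular file in dir a with the same size as a/file1 (the trailing c makes find interpret the number as bytes):
SIZE=$(stat -c%s a/file1)
find a -type f ! -name file1 -size "${SIZE}c" -exec rm {} \;
As with the other answers, replacing rm with echo rm first is a cheap way to preview what would be deleted.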