Deleting (near) Duplicate Files

What's the best scripted way to delete (near) duplicate files based on a filespec in Windows (XP in this case)? I am thinking of regex and some VBScript, but if there is a better way...

Examples include files whose names differ slightly, with one or two (known) extra characters at the beginning or end, but which are identical in size; files that also differ slightly in size; and so on.

Is regex the best way to handle these variances if the boundaries are known?


No, I don't think regex is the right tool here. It sounds a bit dangerous, if you ask me. Instead, you could calculate the Levenshtein distance between the two file names and, if it is sufficiently small (be careful with file names that consist of just a couple of characters!), delete one of the two.

The sizes can be compared using simple arithmetic.
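A minimal sketch of that idea, in Python rather than VBScript. The function names and the thresholds (`max_distance`, `min_name_length`, `size_tolerance`) are assumptions made up for illustration, not values from the question, and would need tuning against the real files.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def is_near_duplicate(name_a, size_a, name_b, size_b,
                      max_distance=2, min_name_length=6, size_tolerance=0.01):
    """Return True if two files look like near duplicates.

    All three thresholds are hypothetical defaults:
    - max_distance: largest edit distance still counted as "near"
    - min_name_length: names shorter than this are never matched,
      because a distance of 2 means little for a 3-character name
    - size_tolerance: allowed size difference as a fraction of the larger file
    """
    if min(len(name_a), len(name_b)) < min_name_length:
        return False
    if levenshtein(name_a.lower(), name_b.lower()) > max_distance:
        return False
    return abs(size_a - size_b) <= size_tolerance * max(size_a, size_b)
```

For example, `is_near_duplicate("report.txt", 1000, "report_1.txt", 1005)` compares two names that are two edits apart and sizes that differ by about 0.5%. It is probably safer to print the candidate pairs and review them before deleting anything.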


You can use regex to match (or near match) the filenames.

I would use regex to match the names and build a list of the matching files' sizes. You can then choose an allowed variance and keep only the files whose sizes fall within it.

After you have built the list of matching files, you can access different file attributes (size, date, etc.) to flag which files to delete.
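One way this could look, sketched in Python. The two patterns are hypothetical stand-ins for the "known" extra characters (the question does not say what they are), and keeping the oldest file in each group as the presumed original is an assumption too.

```python
import os
import re
from collections import defaultdict

# Hypothetical patterns for the known extra characters; replace with the real ones.
EXTRA_PREFIX = re.compile(r"^(?:~|copy of )", re.IGNORECASE)
EXTRA_SUFFIX = re.compile(r"(?:_\d|\s\(\d\))$")


def normalize(filename):
    """Strip the known extra characters so near-duplicate names compare equal."""
    stem, ext = os.path.splitext(filename)
    stem = EXTRA_PREFIX.sub("", stem)
    stem = EXTRA_SUFFIX.sub("", stem)
    return (stem + ext).lower()


def flag_for_deletion(entries, size_tolerance=0.01):
    """Return the names flagged as near duplicates.

    entries is a list of (name, size_in_bytes, modified_time) tuples.
    Files are grouped by normalized name; in each group the oldest file is
    kept (an assumption) and the others are flagged only if their size is
    within size_tolerance (a fraction) of the kept file's size.
    """
    groups = defaultdict(list)
    for entry in entries:
        groups[normalize(entry[0])].append(entry)

    flagged = []
    for group in groups.values():
        group.sort(key=lambda e: e[2])  # oldest first
        _keeper_name, keeper_size, _keeper_time = group[0]
        for name, size, _mtime in group[1:]:
            if abs(size - keeper_size) <= size_tolerance * max(size, keeper_size):
                flagged.append(name)
    return flagged
```

The `entries` list could be built from `os.scandir`, taking `entry.name`, `entry.stat().st_size` and `entry.stat().st_mtime` for each file. The function only returns names, so the list can be reviewed before anything is actually deleted.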
