Delete sequential, identical, duplicate files
I have a server running Windows Server 2003 R2 Enterprise with directories that each contain anywhere from 50,000 to 250,000 1KB text files. The filenames are sequential (e.g., MLLP000001.rcv, MLLP000002.rcv, etc.), and identical files will be sequential. Once a file differs from its predecessor, I can expect not to receive another copy of the earlier file.
I need a script that will do the following, but I don't know where to begin.
for each file in the target directory index 'i'
{
    for each file in the target directory index 'j' = i+1
    {
        compare the hash values of files i and j
        if the hashes are identical
            delete file j
        if the hashes differ
            set i = j // to skip past the files that are now deleted
            break
    }
}
I tried DOS batch scripts, but that approach is really cumbersome: I can't break out of the inner loop, and the script trips over itself because the outer loop iterates over a list of the files in the directory while that list is constantly changing. As far as I know, VBScript doesn't have a hash function.
Since the files are only 1KB in size, why not do a byte-for-byte compare and avoid the hash?
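A minimal sketch of the direct-comparison idea, assuming Python is available on the server (the question doesn't say it is). The function name `delete_sequential_duplicates` and the `suffix` parameter are made up for illustration. It relies on the zero-padded filenames sorting correctly as plain strings, and it takes the directory listing once up front, so deleting files doesn't disturb the loop.

```python
import filecmp
import os


def delete_sequential_duplicates(directory, suffix=".rcv"):
    """Delete each file whose contents match the last file that was kept.

    Returns the names of the deleted files. Hypothetical helper, not part
    of the original question.
    """
    # Snapshot the listing once; zero-padded names sort in sequence order.
    names = sorted(n for n in os.listdir(directory) if n.endswith(suffix))
    deleted = []
    keeper = None  # path of the most recent file that was kept
    for name in names:
        path = os.path.join(directory, name)
        # shallow=False forces a comparison of the file contents
        # rather than just the os.stat() signatures.
        if keeper is not None and filecmp.cmp(keeper, path, shallow=False):
            os.remove(path)
            deleted.append(name)
        else:
            keeper = path
    return deleted
```

It would be worth trying this on a copy of one directory first, since the deletes are permanent.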
Sounds like you could do something like:
Set Files to an array of the files in a given directory, sorted by name.
Set PreviousHash to the hash of the first file in Files.
For each CurrentFile after the first in Files:
    Set CurrentHash to the hash of CurrentFile.
    If CurrentHash is equal to PreviousHash, then delete CurrentFile.
    Else, set PreviousHash to CurrentHash.
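The steps above could look something like this in Python. This is only a sketch: Python on the server, the SHA-256 choice, and the names `file_hash` and `delete_repeats_by_hash` are all assumptions, since the thread doesn't settle on a language or hash. The listing is read once into a sorted list, which sidesteps the problem of the directory changing while the loop runs.

```python
import hashlib
import os


def file_hash(path):
    """Return the SHA-256 hex digest of a file (fine for 1KB files)."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def delete_repeats_by_hash(directory, suffix=".rcv"):
    """Delete files whose hash equals that of the previous kept file.

    Returns the names of the deleted files. Hypothetical helper names.
    """
    # Set Files to a sorted snapshot of the directory listing.
    files = sorted(n for n in os.listdir(directory) if n.endswith(suffix))
    if not files:
        return []
    # Set PreviousHash to the hash of the first file.
    previous_hash = file_hash(os.path.join(directory, files[0]))
    deleted = []
    for name in files[1:]:
        path = os.path.join(directory, name)
        current_hash = file_hash(path)
        if current_hash == previous_hash:
            os.remove(path)  # same as the last kept file: delete it
            deleted.append(name)
        else:
            previous_hash = current_hash  # new content: keep and move on
    return deleted
```

Each file is read and hashed exactly once, so a directory of 250,000 files means 250,000 small reads rather than a nested loop.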