
How to compare directories to determine which files have changed?

We need a script that compares two directories of files and, for each file that has changed between directory 1 and directory 2 (added, deleted, or modified), collects it into a subset containing only those changed files.

My first impression is to create a Python script to traverse each directory, compute a hash of each file, and if the hash has changed, copy the file over to the new subset of files. Is this a proper approach? Am I neglecting any tools out there which may do this already? I've never used it, but maybe something like rsync could be used?

Thanks

Edit:

The important part is that I am able to compile a subset of only those files that were changed, so if only 3 files have changed between versions, I only need those three files copied to a new directory...


It seems to me that you need something as simple as this:

from os import listdir
from os.path import getmtime, join

rep1 = 'I:\\dada'
rep2 = 'I:\\didi'

R1 = listdir(rep1)
R2 = listdir(rep2)

# Names only in rep1, names only in rep2, and names in both whose
# modification times differ
vanished = [filename for filename in R1 if filename not in R2]
appeared = [filename for filename in R2 if filename not in R1]
modified = [filename for filename in R2 if filename in R1
            and getmtime(join(rep1, filename)) != getmtime(join(rep2, filename))]

print('vanished==', vanished)
print('appeared==', appeared)
print('modified==', modified)
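
The question also asks for the changed files to be copied into a new directory, which the listing above does not do. A minimal sketch of that step, assuming a flat directory (no subfolders) and the same modification-time test; the names `copy_changed` and `subset_dir` are illustrative and not part of the original answer:

```python
import os
import shutil
from os.path import getmtime, join


def copy_changed(old_dir, new_dir, subset_dir):
    """Copy files that are new, or whose mtime differs, from new_dir to subset_dir."""
    old_names = set(os.listdir(old_dir))
    copied = []
    os.makedirs(subset_dir, exist_ok=True)
    for name in sorted(os.listdir(new_dir)):
        src = join(new_dir, name)
        if not os.path.isfile(src):
            continue  # flat comparison only: subdirectories are skipped
        if name not in old_names or getmtime(join(old_dir, name)) != getmtime(src):
            shutil.copy2(src, join(subset_dir, name))  # copy2 keeps timestamps
            copied.append(name)
    return copied
```

Deleted files cannot be copied, since they no longer exist in the newer directory; they would have to be recorded separately, for example in a text file placed next to the subset.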


That is one completely reasonable approach, but you are essentially reinventing rsync. So yes, use rsync.

edit: There's a way to create "difference-only" folders using rsync


I like DiffMerge; it works well for this purpose.


I have modified @eyquem's answer a bit!

Arguments can be given as

python file.py dir1 dir2

NOTE: files are classified on the basis of modification time!

#!/usr/bin/env python3
import sys
from os import listdir
from os.path import getmtime, join

ORIG_DIR = sys.argv[1]      # orig:-->/root/backup.FPSS/bin/httpd
MODIFIED_DIR = sys.argv[2]  # modified-->/FPSS/httpd/bin/httpd

LIST_OF_FILES_IN_ORIG_DIR = listdir(ORIG_DIR)
LIST_OF_FILES_IN_MODIFIED_DIR = listdir(MODIFIED_DIR)

# Names present in both directories
COMMON = [f for f in LIST_OF_FILES_IN_MODIFIED_DIR if f in LIST_OF_FILES_IN_ORIG_DIR]

vanished = [f for f in LIST_OF_FILES_IN_ORIG_DIR if f not in LIST_OF_FILES_IN_MODIFIED_DIR]
appeared = [f for f in LIST_OF_FILES_IN_MODIFIED_DIR if f not in LIST_OF_FILES_IN_ORIG_DIR]
# A common file counts as modified only when the copy in MODIFIED_DIR is newer
modified = [f for f in COMMON if getmtime(join(ORIG_DIR, f)) < getmtime(join(MODIFIED_DIR, f))]
same = [f for f in COMMON if getmtime(join(ORIG_DIR, f)) >= getmtime(join(MODIFIED_DIR, f))]

def print_list(arg):
    for f in arg:
        print('----->', f)
    print('Total :: ', len(arg))

# print_list() prints its own output and returns None, so it is called
# on its own line instead of being passed to print()
print('#' * 99)
print('Files which have Vanished from MOD: ', MODIFIED_DIR, ' but still present ', ORIG_DIR, ' ==>')
print_list(vanished)
print('-' * 101)
print('Files which are Appearing in MOD: ', MODIFIED_DIR, ' but not present ', ORIG_DIR, ' ==>')
print_list(appeared)
print('-' * 101)
print('Files in MOD: ', MODIFIED_DIR, ' which are MODIFIED if compared to ORIG: ', ORIG_DIR, ' ==>')
print_list(modified)
print('-' * 101)
print('Files in MOD: ', MODIFIED_DIR, ' which are NOT modified if compared to ORIG: ', ORIG_DIR, ' ==>')
print_list(same)
print('#' * 99)
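
Modification time alone can mislead: a directory copied without preserving timestamps makes every file look newer. As a hedged alternative, Python's standard `filecmp.dircmp` reports the same categories and falls back to comparing contents when the size matches but the timestamp differs. The sketch below assumes that default ("shallow") behaviour is acceptable; the helper name `collect` is illustrative only. Note that `dircmp` skips some names by default (see `filecmp.DEFAULT_IGNORES`) and that a directory present on only one side is reported as a single entry.

```python
import filecmp
import os


def collect(dc, prefix=""):
    """Walk a filecmp.dircmp result and return (vanished, appeared, modified)."""
    vanished = [os.path.join(prefix, n) for n in dc.left_only]
    appeared = [os.path.join(prefix, n) for n in dc.right_only]
    modified = [os.path.join(prefix, n) for n in dc.diff_files]
    # dc.subdirs maps each common subdirectory name to its own dircmp object
    for name, sub in dc.subdirs.items():
        v, a, m = collect(sub, os.path.join(prefix, name))
        vanished += v
        appeared += a
        modified += m
    return vanished, appeared, modified
```

It could be called as `collect(filecmp.dircmp(ORIG_DIR, MODIFIED_DIR))`; the returned paths are relative to the two roots.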


Including subfolders and comparing hashes of the files (Python 3.11 or later required):

from os.path import isdir,normpath
from os import sep,walk
import hashlib

rep1=normpath(input('Folder 1: '))
rep2=normpath(input('Folder 2: '))

def hashcheck(fileloc1,fileloc2): # only works from python 3.11 on
    # Directories are compared by their final path component only
    if isdir(fileloc1) or isdir(fileloc2):
        return fileloc1[fileloc1.rfind(sep):]!=fileloc2[fileloc2.rfind(sep):]
    # hashlib.file_digest (added in Python 3.11) reads the file in chunks
    with open(fileloc1,'rb') as f1:
        f1hash=hashlib.file_digest(f1,"sha256").hexdigest()
    with open(fileloc2,'rb') as f2:
        f2hash=hashlib.file_digest(f2,"sha256").hexdigest()
    return f1hash!=f2hash

R1=[]
R2=[]
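# Collect file paths relative to each root. Slicing off the root prefix
# avoids str.replace() also removing later occurrences of the root string.
for wfolder in walk(rep1):
    R1+=(wfolder[0][len(rep1):]+sep+item for item in wfolder[2])
for wfolder in walk(rep2):
    R2+=(wfolder[0][len(rep2):]+sep+item for item in wfolder[2])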

vanished = [ pathname for pathname in R1 if pathname not in R2]
appeared = [ pathname for pathname in R2 if pathname not in R1]
modified = [ pathname for pathname in ( f for f in R2 if f in R1)
            if hashcheck(rep1+sep+pathname,rep2+sep+pathname)]

print ('vanished==',vanished,'\n')
print ('appeared==',appeared,'\n')
print ('modified==',modified,'\n')
input()
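
On Python versions older than 3.11, where `hashlib.file_digest` is not available, the same comparison can be done by feeding the file to the hash object in chunks. This is a minimal sketch rather than part of the answer above; the names `sha256_of` and `files_differ` and the 1 MiB chunk size are assumptions chosen for illustration:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunk_size pieces."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def files_differ(path1, path2):
    """True when the two files have different SHA-256 digests."""
    return sha256_of(path1) != sha256_of(path2)
```

Reading in chunks keeps memory use bounded for large files, which is the same reason `file_digest` exists.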