python: copy only missing files from FTP dirs and sub-dirs to local dirs and sub-dirs
the problem is:
I have a local directory '/local' and a remote FTP directory '/remote' full of subdirectories and files. I want to check if there are any new files in the sub-directories of '/remote'. If there are any, then copy them over to '/local'.
the question is:
am I using the right strategy? Is this totally overkill and is there a much faster pythonic way to do it? DISCLAIMER: I'm a python n00b trying to learn. So be gentle ... =) This is what I've tried:
Create a list of all files in /local and its sub-dirs.
import os

LocalFiles = []
for path, subdirs, files in os.walk(localdir):
    for name in files:
        LocalFiles.append(name)
Do some ftplib magic, using ftpwalk() and copying its results to a list of the form:
RemoteFiles=[['/remote/dir1/','/remote/dir1/','/remote/dir3/'],['file1.txt','file12.py','file3.zip']]
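(For reference, ftpwalk() isn't part of the standard ftplib; one way to build that structure with plain nlst()/cwd() calls might look like the sketch below. The host name and the cwd()-based directory test are assumptions, and some servers return bare names rather than full paths from NLST.)

from ftplib import FTP, error_perm

def ftp_walk(ftp, path):
    # yield (directory, filename) pairs for every file below `path`;
    # an entry is treated as a directory if cwd() into it succeeds
    for entry in ftp.nlst(path):
        full = entry if entry.startswith('/') else path.rstrip('/') + '/' + entry
        try:
            ftp.cwd(full)
            is_dir = True
        except error_perm:
            is_dir = False
        if is_dir:
            for pair in ftp_walk(ftp, full):
                yield pair
        else:
            yield full.rsplit('/', 1)[0] + '/', full.rsplit('/', 1)[-1]

ftp = FTP('ftp.example.com')   # hypothetical host
ftp.login()
RemoteFiles = [[], []]
for rdir, name in ftp_walk(ftp, '/remote'):
    RemoteFiles[0].append(rdir)
    RemoteFiles[1].append(name)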
so I have the directory corresponding to each file. Then see which files are missing by comparing the lists of filenames,
missing_files= list(set(RemoteFiles[1]) - set(LocalFiles))
and once I've found their names, I look up the directory that goes with each name,
MissingDirNFiles = []
for i in range(0, len(missing_files)):
    theindex = RemoteFiles[1].index(missing_files[i])
which lets me build the list of missing files and their directories,
    MissingDirNFiles.append([RemoteFiles[0][theindex], RemoteFiles[1][theindex]])
so I can copy them over with ftp.retrbinary, roughly as sketched below. Is this a reasonable strategy? Any tips, comments, and advice are appreciated [especially for large numbers of files].
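Here is roughly what I have in mind for the download step (a sketch only -- the '/remote'-to-local path mapping and the already-open ftp connection are assumptions):

import os

for rdir, name in MissingDirNFiles:
    # mirror the remote sub-directory under the local root (assumed layout)
    ldir = os.path.join(localdir, os.path.relpath(rdir, '/remote'))
    if not os.path.isdir(ldir):
        os.makedirs(ldir)
    with open(os.path.join(ldir, name), 'wb') as fh:
        ftp.retrbinary('RETR ' + rdir + name, fh.write)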
If you get the modification time of both the local and the remote FTP directories and store it in a database, you can prune the search for new or modified files. This should speed up the sync procedure significantly.
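A minimal sketch of one variant of that idea, using per-file MDTM timestamps rather than directory times and caching them in a shelve file so unchanged files are skipped on the next run (MDTM is widely but not universally supported, and the cache file name is an assumption):

import shelve

cache = shelve.open('ftp_mtimes.db')          # hypothetical cache file
to_copy = []
for rdir, name in zip(RemoteFiles[0], RemoteFiles[1]):
    rpath = rdir + name
    # MDTM replies '213 YYYYMMDDHHMMSS'; re-check only files whose stamp changed
    mtime = ftp.sendcmd('MDTM ' + rpath).split()[-1]
    if cache.get(rpath) != mtime:
        to_copy.append((rdir, name))
        cache[rpath] = mtime
cache.close()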