Resume FTP download after timeout
I'm downloading files from a flaky FTP server that often times out during file transfer and I was wondering if there was a way to reconnect and resume the download. I'm using Python's ftplib. Here is the code that I am using:
#! /usr/bin/python
import ftplib
import os
import socket
import sys
#--------------------------------#
# Define parameters for ftp site #
#--------------------------------#
site = 'a.really.unstable.server'
user = 'anonymous'
password = 'someperson@somewhere.edu'
root_ftp_dir = '/directory1/'
root_local_dir = '/directory2/'
#---------------------------------------------------------------
# Tuple of order numbers to download. Each web request generates
# an order numbers
#---------------------------------------------------------------
order_num = ('1','2','3','4')
#----------------------------------------------------------------#
# Loop through each order. Connect to server on each loop. There #
# might be a ti开发者_开发知识库me out for the connection therefore reconnect for #
# every new ordernumber #
#----------------------------------------------------------------#
# First change local directory
os.chdir(root_local_dir)
# Begin loop through
for order in order_num:
print 'Begin Proccessing order number %s' %order
# Connect to FTP site
try:
ftp = ftplib.FTP( host=site, timeout=1200 )
except (socket.error, socket.gaierror), e:
print 'ERROR: Unable to reach "%s"' %site
sys.exit()
# Login
try:
ftp.login(user,password)
except ftplib.error_perm:
print 'ERROR: Unable to login'
ftp.quit()
sys.exit()
# Change remote directory to location of order
try:
ftp.cwd(root_ftp_dir+order)
except ftplib.error_perm:
print 'Unable to CD to "%s"' %(root_ftp_dir+order)
sys.exit()
# Get a list of files
try:
filelist = ftp.nlst()
except ftplib.error_perm:
print 'Unable to get file list from "%s"' %order
sys.exit()
#---------------------------------#
# Loop through files and download #
#---------------------------------#
for each_file in filelist:
file_local = open(each_file,'wb')
try:
ftp.retrbinary('RETR %s' %each_file, file_local.write)
file_local.close()
except ftplib.error_perm:
print 'ERROR: cannot read file "%s"' %each_file
os.unlink(each_file)
ftp.quit()
print 'Finished Proccessing order number %s' %order
sys.exit()
The error that I get:
socket.error: [Errno 110] Connection timed out
Any help is greatly appreciated.
Resuming a download through FTP using only standard facilities (see RFC959) requires use of the block transmission mode (section 3.4.2), which can be set using the MODE B
command. Although this feature is technically required for conformance to the specification, I'm not sure all FTP server software implements it.
In the block transmission mode, as opposed to the stream transmission mode, the server sends the file in chunks, each of which has a marker. This marker may be re-submitted to the server to restart a failed transfer (section 3.5).
The specification says:
[...] a restart procedure is provided to protect users from gross system failures (including failures of a host, an FTP-process, or the underlying network).
However, AFAIK, the specification does not define a required lifetime for markers. It only says the following:
The marker information has meaning only to the sender, but must consist of printable characters in the default or negotiated language of the control connection (ASCII or EBCDIC). The marker could represent a bit-count, a record-count, or any other information by which a system may identify a data checkpoint. The receiver of data, if it implements the restart procedure, would then mark the corresponding position of this marker in the receiving system, and return this information to the user.
It should be safe to assume that servers implementing this feature will provide markers that are valid between FTP sessions, but your mileage may vary.
A simple example for implementing a resumable FTP download using Python ftplib:
def connect():
ftp = None
with open('bigfile', 'wb') as f:
while (not finished):
if ftp is None:
print("Connecting...")
FTP(host, user, passwd)
try:
rest = f.tell()
if rest == 0:
rest = None
print("Starting new transfer...")
else:
print(f"Resuming transfer from {rest}...")
ftp.retrbinary('RETR bigfile', f.write, rest=rest)
print("Done")
finished = True
except Exception as e:
ftp = None
sec = 5
print(f"Transfer failed: {e}, will retry in {sec} seconds...")
time.sleep(sec)
More fine-grained exception handling is advisable.
Similarly for uploads:
Handling disconnects in Python ftplib FTP transfers file upload
To do this, you would have to keep the interrupted download, then figure out which parts of the file you are missing, download those parts and then connect them together. I'm not sure how to do this, but there is a download manager for Firefox and Chrome called DownThemAll that does this. Although the code is not written in python (I think it's JavaScript), you could look at the code and see how it does this.
DownThemll - http://www.downthemall.net/
精彩评论