
Python Downloader

So I am trying to write a script to download a picture file with Python, and I found this def using Google, but every picture I get it to download comes out "corrupt". Any ideas...

def download(url):
 """Copy the contents of a file from a given URL
 to a local file.
 """
 import urllib
 webFile = urllib.urlopen(url)
 localFile = open(url.split('/')[-1], 'w')
 localFile.write(webFile.read())
 webFile.close()
 localFile.close()

Edit: the code tag didn't retain the indentations very nicely, but I can assure you that they are there; that is not my problem.


You can simply do

urllib.urlretrieve(url, filename)

and save yourself any troubles.
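To see `urlretrieve` in action without touching the network, here is a small sketch (Python 3, where the function moved to `urllib.request.urlretrieve`; the question's Python 2 calls it as `urllib.urlretrieve`). It fetches a locally created file through a `file://` URL, which `urlretrieve` also accepts:

```python
import os
import tempfile
import urllib.request  # Python 3; on Python 2 use urllib.urlretrieve directly

# Make a small local "remote" file so the demo needs no network access.
src = os.path.join(tempfile.gettempdir(), 'fake_image.png')
with open(src, 'wb') as f:
    f.write(b'\x89PNG fake image bytes')

# urlretrieve accepts file:// URLs, which is handy for testing.
src_url = 'file://' + urllib.request.pathname2url(src)
dest = os.path.join(tempfile.gettempdir(), 'downloaded.png')
urllib.request.urlretrieve(src_url, dest)

with open(dest, 'rb') as f:
    downloaded = f.read()
```

`urlretrieve` opens its output file in binary mode for you, so the corruption problem from the question never arises.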


You need to open the local file in binary mode:

localFile = open(url.split('/')[-1], 'wb')

Otherwise the CR/LF characters in the binary stream will be mangled, corrupting the file.
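A quick sketch of the difference (Python 3 behavior; on Python 2 on Windows the text-mode write would succeed silently and mangle the newline bytes instead of raising):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'demo.bin')
data = b'\x89PNG\r\n\x1a\n'  # the PNG signature contains a CR/LF pair

# Binary mode writes the bytes through untouched.
with open(path, 'wb') as f:
    f.write(data)
with open(path, 'rb') as f:
    round_tripped = f.read()

# Text mode is the trap: on Python 3, writing bytes to a 'w' file raises
# TypeError outright; on Python 2 on Windows it would silently translate
# '\n' to '\r\n' and corrupt the image.
try:
    with open(path, 'w') as f:
        f.write(data)
    text_mode_accepted_bytes = True
except TypeError:
    text_mode_accepted_bytes = False
```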


You must include the 'b' flag if you intend to write a binary file. Line 7 becomes:

localFile = open(url.split('/')[-1], 'wb')

It is not necessary for the code to work, but in the future you might consider:

  • Importing outside of your functions.
  • Using os.path.basename, rather than string parsing to get the name component of a path.
  • Using the with statement to manage files, rather than having to manually close them. It makes your code cleaner, and it ensures that they are properly closed if your code throws an exception.
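A quick sketch of the `os.path.basename` suggestion (the URL here is made up). It works on URLs because they use `/` separators, with one caveat worth knowing:

```python
import os.path

url = 'http://some/web/site/photo.jpg'
name = os.path.basename(url)  # same result as url.split('/')[-1]

# Caveat: a query string is kept in the result, e.g.
# basename('.../photo.jpg?size=large') -> 'photo.jpg?size=large',
# so URLs with parameters need parsing first (urlparse).
```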

I would rewrite your code as:

import os.path
import urllib
from contextlib import closing

def download(url):
 """Copy the contents of a file from a given URL
 to a local file in the current directory.
 """
 # In Python 2, urllib.urlopen's result is not a context manager,
 # so wrap it in contextlib.closing to use it with `with`.
 with closing(urllib.urlopen(url)) as webFile:
  with open(os.path.basename(url), 'wb') as localFile:
   localFile.write(webFile.read())


It's coming out corrupt because the function you're using writes the bytes to the file as if they were plain text. What you need to do instead is write the bytes in binary mode (wb). Here's an idea of what you should do:

import urllib

def Download(url, filename):
  Data = urllib.urlopen(url).read()
  File = open(filename, 'wb')
  File.write(Data)
  #Neatly close off the file...
  File.flush()
  File.close()
  #Cleanup, for you neat-freaks.
  del Data, File


import subprocess
outfile = "foo.txt"
url = "http://some/web/site/foo.txt"
# Pass the arguments as a list so check_call works without a shell
# and no quoting is needed.
cmd = ["curl", "-f", "-o", outfile, url]
subprocess.check_call(cmd)

Shelling out may seem inelegant, but when you start encountering issues with more sophisticated sites, curl has a wealth of logic for getting you through the barriers presented by web servers (cookies, authentication, sessions, etc.).

wget is another alternative.

