Using urllib2 in Python. How do I get the name of the file I am downloading?

2023-02-22 07:46 问答作者：

I am a python beginner. I am using urllib2 to download files. When I download a file, I specify a filename to with which to save the downloaded file on my hard drive. However, if I download the file using my browser, a default filename i开发者_C百科s automatically provided.

Here is a simplified version of my code:

def downloadmp3(url):
    webFile = urllib2.urlopen(url)
    filename = 'temp.zip'
    localFile = open(filename, 'w')
    localFile.write(webFile.read())

The file downloads just fine, but if I type the string stored in the variable "url" into my browser, there is a default filename given to the file when I download it. I want to use this filename for my downloaded file not 'temp.zip' or whatever I assign it.

How do I use urllib2 (or some other Python library) to save the file with the filename that the server I am downloading from intends it to have?

If anyone doesn't understand this question, please say so, so that I can try to make it clearer.

The filename is usually included by the server through the content-disposition header:

content-disposition: attachment; filename=foo.pdf

You have access to the headers through

result = urllib2.urlopen(...)
result.info() <- contains the headers


i>>> import urllib2
ur>>> result = urllib2.urlopen('http://zopyx.com')
>>> print result
<addinfourl at 4302289808 whose fp = <socket._fileobject object at 0x1006dd5d0>>
>>> result.info()
<httplib.HTTPMessage instance at 0x1006fbab8>
>>> result.info().headers
['Date: Mon, 04 Apr 2011 02:08:28 GMT\r\n', 'Server: Zope/(unreleased version, python 2.4.6, linux2) ZServer/1.1 Plone/3.3.4\r\n', 'Content-Length: 15321\r\n', 'Content-Type: text/html; charset=utf-8\r\n', 'Via: 1.1 www.zopyx.com\r\n', 'Cache-Control: max-age=3600\r\n', 'Expires: Mon, 04 Apr 2011 03:08:28 GMT\r\n', 'Connection: close\r\n']

See

http://docs.python.org/library/urllib2.html

But be aware that this header does not need to be present. Otherwise you need to generate a reasonable name yourself from the URL requested - e.g. from the last component of the URI. Use the urlparse() method of Python in this case.

My issue with the previous answers is that they were using the original URL, and that would fail in the case of a redirect. Here's how I do it: (note the use of result.url instead of url)

import os
import urllib2
result = urllib2.urlopen(url)
filename = os.path.basename(urllib2.urlparse.urlparse(result.url).path)

You can do that using urlretrieve :

http://docs.python.org/library/urllib.html

I had an issue where server did not give me any content-disposition header so if it's also your case, you can extract filename from url like this:

os.path.basename(urlparse.urlparse(file_url))

In my case, I used file_stream.headers.subtype which contained file extension and I renamed files based on my django's model slug, here's an example:

import urlparse, os

tmp_file = NamedTemporaryFile(delete=True)
file_stream = urllib2.urlopen(file_url)
tmp_file.write(file_stream.read())
tmp_file.flush()

new_file_name = "some_prefix_" + my_model.slug + "." + file_stream.headers.subtype
#You may prefer this:
# new_file_name = os.path.basename(urlparse.urlparse(file_url))

my_model.file.save(new_file_name, File(tmp_file))

Last line is saving file using django's save method, also handling duplicated file names by adding random characters at the end :)

Awesome.

继续阅读：default download filenames python urllib2

Using urllib2 in Python. How do I get the name of the file I am downloading?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？