开发者

python: check if url to jpg exists

In python, how would I check if a url ending in .jpg exists?

ex: http://www.开发者_JAVA百科fakedomain.com/fakeImage.jpg

thanks


The code below is equivalent to tikiboy's answer, but using a high-level and easy-to-use requests library.

import requests

def exists(path):
    r = requests.head(path)
    return r.status_code == requests.codes.ok

print exists('http://www.fakedomain.com/fakeImage.jpg')

The requests.codes.ok equals 200, so you can substitute the exact status code if you wish.

requests.head may throw an exception if server doesn't respond, so you might want to add a try-except construct.

Also if you want to include codes 301 and 302, consider code 303 too, especially if you dereference URIs that denote resources in Linked Data. A URI may represent a person, but you can't download a person, so the server will redirect you to a page that describes this person using 303 redirect.


>>> import httplib
>>>
>>> def exists(site, path):
...     conn = httplib.HTTPConnection(site)
...     conn.request('HEAD', path)
...     response = conn.getresponse()
...     conn.close()
...     return response.status == 200
...
>>> exists('http://www.fakedomain.com', '/fakeImage.jpg')
False

If the status is anything other than a 200, the resource doesn't exist at the URL. This doesn't mean that it's gone altogether. If the server returns a 301 or 302, this means that the resource still exists, but at a different URL. To alter the function to handle this case, the status check line just needs to be changed to return response.status in (200, 301, 302).


thanks for all the responses everyone, ended up using the following:

try:
  f = urllib2.urlopen(urllib2.Request(url))
  deadLinkFound = False
except:
  deadLinkFound = True


Looks like http://www.fakedomain.com/fakeImage.jpg automatically redirected to http://www.fakedomain.com/index.html without any error.

Redirecting for 301 and 302 responses are automatically done without giving any response back to user.

Please take a look HTTPRedirectHandler, you might need to subclass it to handle that.

Here is the one sample from Dive Into Python:

http://diveintopython3.ep.io/http-web-services.html#redirects


There are problems with the previous answers when the file is in ftp server (ftp://url.com/file), the following code works when the file is in ftp, http or https:

import urllib2

def file_exists(url):
    request = urllib2.Request(url)
    request.get_method = lambda : 'HEAD'
    try:
        response = urllib2.urlopen(request)
        return True
    except:
        return False


Try it with mechanize:

import mechanize
br = mechanize.Browser()
br.set_handle_redirect(False)
try:
 br.open_novisit('http://www.fakedomain.com/fakeImage.jpg')
 print 'OK'
except:
 print 'KO'


This might be good enough to see if a url to a file exists.

import urllib
if urllib.urlopen('http://www.fakedomain.com/fakeImage.jpg').code == 200:
  print 'File exists'


in Python 3.6.5:

import http.client

def exists(site, path):
    connection =  http.client.HTTPConnection(site)
    connection.request('HEAD', path)
    response = connection.getresponse()
    connection.close()
    return response.status == 200

exists("www.fakedomain.com", "/fakeImage.jpg")

In Python 3, the module httplib has been renamed to http.client

And you need remove the http:// and https:// from your URL, because the httplib is considering : as a port number and the port number must be numeric.


Python3

import requests

def url_exists(url):
    """Check if resource exist?"""
    if not url:
        raise ValueError("url is required")
    try:
        resp = requests.head(url)
        return True if resp.status_code == 200 else False
    except Exception as e:
        return False


The answer of @z3moon was good, but I think it is for py 2.x. For python 3.x, you may want to add request to the module call.

import urllib
def check_valid_URLs(url) -> bool:
  try:
    if urllib.request.urlopen(url).code == 200:
      return True
    else:
      return False
  except:
    return False


I think you can try send a http request to the url and read the response.If no exception was caught,it probably exists.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜