python: check if url to jpg exists
In python, how would I check if a url ending in .jpg exists?
ex: http://www.fakedomain.com/fakeImage.jpg
thanks
The code below is equivalent to tikiboy's answer, but uses the high-level, easy-to-use requests library.
import requests

def exists(path):
    # HEAD asks for the headers only, so the image body is not downloaded.
    r = requests.head(path)
    return r.status_code == requests.codes.ok

print(exists('http://www.fakedomain.com/fakeImage.jpg'))
requests.codes.ok equals 200, so you can substitute the exact status code if you wish.
requests.head may throw an exception if the server doesn't respond, so you might want to add a try-except construct.
Also, if you want to include codes 301 and 302, consider code 303 too, especially if you dereference URIs that denote resources in Linked Data. A URI may represent a person, but you can't download a person, so the server will redirect you to a page that describes this person using a 303 redirect.
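A minimal sketch of the two suggestions above (the try-except and the extra redirect codes), assuming Python 3 and an installed requests package. The timeout value and the FOUND_CODES tuple are my own illustrative choices, not part of the original answer. Note that requests.head does not follow redirects by default, so 301, 302 and 303 responses are visible to the check.

```python
import requests

# Status codes treated as "the resource is there" (an assumption; adjust as needed).
FOUND_CODES = (200, 301, 302, 303)

def exists(url, timeout=5.0):
    """Return True if a HEAD request to url answers with an accepted status."""
    try:
        # requests.head leaves allow_redirects off unless asked otherwise,
        # so redirect statuses reach the check below.
        r = requests.head(url, timeout=timeout)
    except requests.RequestException:
        # Covers connection failures, timeouts and malformed URLs.
        return False
    return r.status_code in FOUND_CODES
```

Calling exists('http://www.fakedomain.com/fakeImage.jpg') then returns a plain True or False instead of raising when the server cannot be reached.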
>>> import httplib
>>>
>>> def exists(site, path):
...     conn = httplib.HTTPConnection(site)
...     conn.request('HEAD', path)
...     response = conn.getresponse()
...     conn.close()
...     return response.status == 200
...
>>> exists('www.fakedomain.com', '/fakeImage.jpg')
False
If the status is anything other than 200, the resource doesn't exist at that URL. This doesn't mean that it's gone altogether: if the server returns a 301 or 302, the resource still exists, but at a different URL. To handle this case, change the status check line to return response.status in (200, 301, 302).
Thanks for all the responses, everyone. I ended up using the following:
import urllib2

try:
    f = urllib2.urlopen(urllib2.Request(url))
    deadLinkFound = False
except:
    deadLinkFound = True
It looks like http://www.fakedomain.com/fakeImage.jpg is automatically redirected to http://www.fakedomain.com/index.html without any error.
Redirects for 301 and 302 responses are followed automatically, without the intermediate response being handed back to the caller.
Take a look at HTTPRedirectHandler; you might need to subclass it to handle that.
Here is a sample from Dive Into Python:
http://diveintopython3.ep.io/http-web-services.html#redirects
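For illustration, here is one way the subclassing idea could look in Python 3, where urllib2 became urllib.request. This is only a sketch under that assumption. The names NoRedirectHandler and status_without_redirects and the timeout value are hypothetical, chosen for this example. Returning None from redirect_request tells the opener not to follow the redirect, so the 3xx response surfaces as an HTTPError that carries the status code.

```python
import urllib.error
import urllib.request

class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so 3xx responses surface as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not build a follow-up request".
        return None

def status_without_redirects(url, timeout=5.0):
    """Return the HTTP status for url without following redirects, or None on failure."""
    opener = urllib.request.build_opener(NoRedirectHandler)
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Unfollowed redirects and 4xx/5xx answers end up here.
        return err.code
    except (OSError, ValueError):
        # URLError is an OSError subclass; ValueError covers strings that are not URLs.
        return None
```

With this, a URL that silently redirects to index.html reports 301 or 302 instead of looking like a successful fetch.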
The previous answers have problems when the file is on an FTP server (ftp://url.com/file). The following code works when the file is served over ftp, http or https:
import urllib2

def file_exists(url):
    request = urllib2.Request(url)
    # Send HEAD instead of GET so HTTP servers return headers only.
    request.get_method = lambda: 'HEAD'
    try:
        response = urllib2.urlopen(request)
        return True
    except:
        return False
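A rough Python 3 counterpart of the function above, as a sketch only: urllib2 was split into urllib.request and urllib.error, and Request accepts a method argument, so the get_method override is not needed. The timeout parameter is my addition. The HEAD method should only matter for http and https URLs.

```python
import urllib.request

def file_exists(url, timeout=5.0):
    """Return True if url can be opened; HTTP(S) URLs are probed with HEAD."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        # HTTPError and URLError are OSError subclasses; ValueError covers
        # strings that are not valid URLs.
        return False
```

Because urlopen follows redirects, a URL that redirects to an existing page still counts as existing here.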
Try it with mechanize:
import mechanize

br = mechanize.Browser()
br.set_handle_redirect(False)
try:
    br.open_novisit('http://www.fakedomain.com/fakeImage.jpg')
    print('OK')
except:
    print('KO')
This might be good enough to see if a URL to a file exists (Python 2):
import urllib

if urllib.urlopen('http://www.fakedomain.com/fakeImage.jpg').code == 200:
    print('File exists')
In Python 3.6.5:
import http.client

def exists(site, path):
    connection = http.client.HTTPConnection(site)
    connection.request('HEAD', path)
    response = connection.getresponse()
    connection.close()
    return response.status == 200

exists("www.fakedomain.com", "/fakeImage.jpg")
In Python 3, the module httplib has been renamed to http.client. You also need to remove the http:// or https:// from your URL, because http.client treats whatever follows the : as a port number, and the port number must be numeric.
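Rather than stripping the scheme by hand, the URL can be split with urllib.parse before it is passed to http.client. The sketch below assumes Python 3. The helper name split_url and the timeout value are hypothetical, introduced only for this example.

```python
import http.client
from urllib.parse import urlsplit

def split_url(url):
    """Split a full URL into (scheme, host, port, path) for http.client."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("expected an http:// or https:// URL: %r" % url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # parts.port is None when the URL does not name a port.
    return parts.scheme, parts.hostname, parts.port, path

def exists(url, timeout=5.0):
    """Send HEAD with http.client and report whether the status is 200."""
    scheme, host, port, path = split_url(url)
    conn_class = (http.client.HTTPSConnection if scheme == "https"
                  else http.client.HTTPConnection)
    connection = conn_class(host, port, timeout=timeout)
    try:
        connection.request("HEAD", path)
        return connection.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        connection.close()
```

This way the caller can pass 'http://www.fakedomain.com/fakeImage.jpg' as a single string, and a URL without a scheme fails early with a ValueError instead of a confusing port error.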
Python 3:
import requests

def url_exists(url):
    """Check whether the resource exists."""
    if not url:
        raise ValueError("url is required")
    try:
        resp = requests.head(url)
        return resp.status_code == 200
    except Exception:
        return False
@z3moon's answer was good, but I think it is for Python 2.x. For Python 3.x, you need to add request to the module call.
import urllib.request

def check_valid_URLs(url) -> bool:
    try:
        # urlopen raises HTTPError for 4xx and 5xx statuses.
        return urllib.request.urlopen(url).code == 200
    except Exception:
        return False
I think you can try sending an HTTP request to the URL and reading the response. If no exception is raised, it probably exists.