
Need help specifying an ending while condition

I have written a Python script to download all of the xkcd comic images. The only problem is that I can't tell it to stop when it reaches the last one. Here is what I have so far:

import re, mechanize
from urllib import urlretrieve
from BeautifulSoup import BeautifulSoup as bs

baseUrl = "http://xkcd.com/1/" #Specify the first comic page
br = mechanize.Browser() #Create a browser

response = br.open(baseUrl) #Create an initial response

x = 1 #Assign an initial file name
while (SomeCondition):
    soup = bs(response.get_data()) #Create an instance of bs that contains the response data
    img = soup.findAll('img')[1] #Get the online file path of the image
    localFile = "C:\\Comics\\xkcd\\" + str(x) + ".jpg"  #Come up with a local file name
    urlretrieve(img["src"], localFile) #Download the image file
    response = br.follow_link(text = "Next >") #Store the response of the next button
    x += 1 #Increase x by 1
print "All xkcd comics downloaded" #Let the user know the images have been downloaded

Initially I had something like this:

while br.follow_link(text = "Next >") != br.follow_link(text = ">|"):

but this actually skips to the last page before the script has a chance to download anything, because each follow_link() call in the condition navigates the browser to the linked page.
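To see why, here is a minimal offline sketch. StubBrowser is a hypothetical stand-in, not mechanize: like mechanize.Browser, its follow_link() moves the browser to the linked page, but as a simplification it returns the new page number instead of a response object.

```python
class StubBrowser(object):
    """Hypothetical offline stand-in for mechanize.Browser with comics 1..last_page."""

    def __init__(self, last_page):
        self.page = 1
        self.last_page = last_page

    def follow_link(self, text):
        # Like mechanize, following a link moves the browser to the target page.
        # Simplification: return the new page number instead of a response object.
        if text == "Next >":
            self.page = min(self.page + 1, self.last_page)
        elif text == ">|":
            self.page = self.last_page
        return self.page


br = StubBrowser(last_page=5)
visited = []
# The condition itself follows two links on every check.
while br.follow_link(text="Next >") != br.follow_link(text=">|"):
    visited.append(br.page)  # the loop body would download the comic here

print("ended on page %d, body ran for %r" % (br.page, visited))
# ended on page 5, body ran for [5]
```

The very first check jumps from page 1 to the last page, so comics 1 through 4 are never processed. With the real mechanize the comparison would be between two response objects rather than page numbers, which is another reason this condition cannot work as intended.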


When you follow the "Next" link from the most recent xkcd comic, a hash sign (#) is appended to the URL, so you can stop as soon as the current URL ends with "#". Try using the following:

while not br.geturl().endswith("#"):
    ...
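Putting that condition into the question's loop, here is a minimal offline sketch. StubBrowser is hypothetical: it mimics only mechanize.Browser's geturl() and follow_link(text=...), assumes comic N lives at http://xkcd.com/N/, and assumes "Next >" on the newest comic leads to the same URL plus "#", as described above. download_all and its save callback are also hypothetical names; save stands in for the BeautifulSoup and urlretrieve step.

```python
class StubBrowser(object):
    """Hypothetical offline stand-in for mechanize.Browser (geturl/follow_link only)."""

    def __init__(self, newest):
        self.newest = newest
        self.url = "http://xkcd.com/1/"

    def geturl(self):
        return self.url

    def follow_link(self, text):
        if text != "Next >":
            raise ValueError("stub only knows the 'Next >' link")
        if self.url.endswith("#"):
            return None
        n = int(self.url.rstrip("/").split("/")[-1])
        if n == self.newest:
            # Assumed behaviour: "Next >" on the newest comic points at "#"
            self.url = self.url + "#"
        else:
            self.url = "http://xkcd.com/%d/" % (n + 1)
        return None


def download_all(br, save):
    """Save every page until following "Next >" lands on a URL ending in "#"."""
    x = 1
    while not br.geturl().endswith("#"):
        save(br.geturl(), x)  # real script: parse the page and urlretrieve() the image
        br.follow_link(text="Next >")
        x += 1
    return x - 1  # number of comics saved


saved = []
count = download_all(StubBrowser(newest=3), lambda url, x: saved.append((x, url)))
print(count)  # 3
```

Because the check runs after follow_link() at the end of each pass, the newest comic is still saved before the loop stops. In the real script you would pass a mechanize.Browser() that has already opened baseUrl.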