Check if a URL exists at fanfiction.net
I am trying to find the last chapter number of a story at www.fanfiction.net, just for fun. Since the chapter URLs follow a fixed pattern, I thought I would increment the chapter number until I reach a URL that does not exist.
To check whether the URL exists, I tried the script from this Stack Overflow question.
However, I found that the site does not return an error status code (400 or above) for a missing page. Instead it returns a message along with a 200 response. What would be the best way to tell whether the page exists or not?
Here is a link that actually exists, and here is one that does not exist.
How can I do so?
EDIT 1
Thanks to GregSchoen I worked it out. I hope it is correct though :)
I checked the value of resp.getheader("last-modified", None): it returns a date for active links and None for those that are not.
Thanks a lot
If you do a HEAD request on the URLs you supplied, Last-Modified is set on valid pages but not on invalid pages. This would be an easy way to key on valid pages, since their server is not responding with a proper HTTP code.
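A rough sketch of that HEAD-based check in Python 3, using the standard http.client module. The helper names (has_last_modified, head_headers, last_chapter) and the url_for_chapter callback are made up for illustration, and the chapter URL pattern is left to the caller since it is not spelled out here. It also assumes the server keeps omitting Last-Modified on missing pages, which may change.

```python
import http.client
from urllib.parse import urlsplit


def has_last_modified(headers):
    """Return True if the (name, value) pairs contain a non-empty Last-Modified."""
    for name, value in headers:
        if name.lower() == "last-modified" and value:
            return True
    return False


def head_headers(url, timeout=10):
    """Send a HEAD request and return the response headers as (name, value) pairs."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().getheaders()
    finally:
        conn.close()


def last_chapter(url_for_chapter, fetch_headers=head_headers, limit=1000):
    """Walk chapters 1, 2, 3, ... and return the last one whose page has
    Last-Modified set (0 if even chapter 1 lacks it).

    url_for_chapter is a hypothetical callback that builds the chapter URL;
    limit is an arbitrary safety cap so the loop always terminates.
    """
    last = 0
    for n in range(1, limit + 1):
        if not has_last_modified(fetch_headers(url_for_chapter(n))):
            break
        last = n
    return last
```

Calling last_chapter with a function that formats the story's chapter URL would then issue one HEAD request per chapter until the header disappears. Passing fetch_headers separately makes it possible to try the loop without touching the network.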
Perhaps use cURL, read 100 bytes and just look for "FanFiction.Net Message Type 1" at the start of the data?
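The same idea can be tried without cURL. This is only a sketch using Python's urllib: read_prefix and looks_like_error are hypothetical names, and the marker string is taken from the suggestion above, so it may not match what the site sends today.

```python
from urllib.request import urlopen

# Marker text quoted in the suggestion above; assumed to appear near the
# start of the "missing page" response.
MARKER = b"FanFiction.Net Message Type 1"


def looks_like_error(prefix):
    """Return True if the marker occurs in the first bytes of the response."""
    return MARKER in prefix


def read_prefix(url, size=100, timeout=10):
    """Fetch only the first `size` bytes of the response body."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read(size)
```

A chapter would then count as missing when looks_like_error(read_prefix(url)) is true. If the marker sits further into the page than 100 bytes, size would need to be raised.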
That website isn't giving a 404 error, which renders all of those scripts useless. You will need to download the whole webpage and check whether it looks like a 404 page.
I think just running:
if (page.find('<style>') == 0):
does the trick, as the error page begins with a <style> tag (a normal page shouldn't).
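Spelled out a little more, that check might look like the sketch below. fetch_page and is_error_page are hypothetical helper names, and the assumption that only error pages start with a <style> tag comes from the observation above, not from anything the site documents.

```python
from urllib.request import urlopen


def is_error_page(page):
    """Return True if the HTML text begins with a <style> tag."""
    # Equivalent to page.find('<style>') == 0, but stops at the first mismatch.
    return page.startswith("<style>")


def fetch_page(url, timeout=10):
    """Download the whole page and decode it to text."""
    with urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

This downloads the full page for every chapter, so it is slower than a HEAD request, but it does not depend on any response header.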