
Check if a URL exists at fanfiction.net

I am trying to find the last chapter number of a story at www.fanfiction.net, just for fun. Since the URLs follow a fixed pattern, I thought I would just increment the chapter number until I get a URL that does not exist.

To check whether the URL exists, I tried the script from this Stack Overflow question.

However, I found that the site does not return an error status (400 or above); instead it returns a 200 response along with a message. What would be the best way to tell whether the page exists?

Here is a link that actually exists, and here is one that does not exist.

How can I do so?

EDIT 1

Thanks to GregSchoen I worked it out. I hope it is correct though :)

I checked the value of resp.getheader("last-modified", None): it returns a date for links that exist and None for those that do not.

Thanks a lot


If you do a HEAD request on the URLs you supplied, Last-Modified is set on valid pages but not on invalid pages. This would be an easy way to key on valid pages, since their server is not responding with a proper HTTP code.
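A minimal sketch of that check in Python, assuming the standard-library http.client module. The names has_last_modified, page_exists and find_last_chapter are made up for illustration, and the behaviour of the site's Last-Modified header is taken from the observation above rather than from any documented guarantee.

```python
import http.client
from urllib.parse import urlsplit


def has_last_modified(headers):
    """Return True if the headers carry a non-empty Last-Modified value."""
    return bool(headers.get("Last-Modified"))


def page_exists(url, timeout=10):
    """Send a HEAD request and treat the page as valid only if Last-Modified is set.

    Assumption: the server sets Last-Modified on real pages only.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return has_last_modified(resp.headers)
    finally:
        conn.close()


def find_last_chapter(exists, start=1, limit=1000):
    """Count up from `start`; return the last n for which exists(n) is true.

    Returns 0 if even the first chapter is missing. `limit` is an arbitrary
    safety cap so the loop cannot run forever.
    """
    last = 0
    for n in range(start, limit + 1):
        if not exists(n):
            break
        last = n
    return last
```

To count chapters you would pass find_last_chapter a function that builds the chapter URL for n and calls page_exists on it. The URL pattern itself is not shown in the question, so that part is left to the reader.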


Perhaps use cURL, read 100 bytes and just look for "FanFiction.Net Message Type 1" at the start of the data?
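The same idea can be sketched in Python instead of cURL, assuming urllib.request from the standard library. The function names are hypothetical, and whether the marker really falls within the first 100 bytes is an assumption carried over from the suggestion above, so nbytes is left adjustable.

```python
import urllib.request

# Marker text quoted in the suggestion above.
MARKER = b"FanFiction.Net Message Type 1"


def looks_like_error_page(prefix, marker=MARKER):
    """Return True if the marker occurs in the bytes read so far."""
    return marker in prefix


def fetch_prefix(url, nbytes=100, timeout=10):
    """Read at most `nbytes` bytes from the start of the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read(nbytes)
```

A page would then be treated as missing when looks_like_error_page(fetch_prefix(url)) is true. If the marker sits further into the page, raise nbytes.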


That website isn't giving a 404 error, which renders all of those scripts useless. You will need to download the whole webpage and check whether it looks like a 404 page.

I think just running:

page_missing = page.startswith('<style>')

does the trick, as the error page begins with a <style> tag (a normal page shouldn't).
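For completeness, a rough sketch of the download-and-inspect approach, assuming urllib.request from the standard library. The helper names are invented for illustration, and stripping leading whitespace before the comparison is my own tolerance rather than something the site is known to require.

```python
import urllib.request


def is_error_page(page):
    """Return True if the page text starts with a <style> tag.

    Assumption: only the site's "not found" message page begins this way.
    """
    return page.lstrip().startswith("<style>")


def download(url, timeout=10):
    """Fetch the whole page and decode it to text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

A chapter URL would count as existing when is_error_page(download(url)) is false. This transfers the full page for every probe, so it is the slowest of the options discussed here.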
