robots.txt parser java

I want to know how to parse a robots.txt file in Java.

Is there already any code?


Heritrix is an open-source web crawler written in Java. Looking through its Javadoc, I see that it has a utility class, Robotstxt, for parsing the robots.txt file.


There's also the jrobotx library, hosted at SourceForge.

(Full disclosure: I spun off the code that forms that library.)


There is also a new release of crawler-commons:

https://github.com/crawler-commons/crawler-commons

The library aims to implement functionality common to any web crawler, and this includes a very handy robots.txt parser.
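If you just want to see what such a parser does before picking a library, here is a minimal sketch using only the JDK. It is not the API of Heritrix, jrobotx, or crawler-commons; the class name `SimpleRobotsTxt` and its methods are made up for illustration. It assumes plain prefix matching (no `*` or `$` wildcards in paths), picks the group whose User-agent token appears in your crawler's name and falls back to the `*` group, and lets the longest matching rule win, with Allow winning ties.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical minimal robots.txt reader: prefix rules for a single user agent. */
final class SimpleRobotsTxt {
    /** One Allow or Disallow line. */
    private static final class Rule {
        final boolean allow;
        final String path;
        Rule(boolean allow, String path) { this.allow = allow; this.path = path; }
    }

    private final List<Rule> rules;

    private SimpleRobotsTxt(List<Rule> rules) { this.rules = rules; }

    /** Parses robots.txt content and keeps only the rules that apply to agentName. */
    static SimpleRobotsTxt parse(String content, String agentName) {
        String agent = agentName.toLowerCase(Locale.ROOT);
        List<Rule> specific = new ArrayList<>();  // rules from groups naming our agent
        List<Rule> wildcard = new ArrayList<>();  // rules from "User-agent: *" groups
        boolean sawSpecific = false;
        boolean groupMatchesAgent = false;
        boolean groupMatchesStar = false;
        boolean inRules = false;                  // current group has already seen a rule line

        for (String raw : content.split("\\r?\\n")) {
            int hash = raw.indexOf('#');          // drop trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;              // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                if (inRules) {                    // a User-agent after rules starts a new group
                    groupMatchesAgent = false;
                    groupMatchesStar = false;
                    inRules = false;
                }
                String token = value.toLowerCase(Locale.ROOT);
                if (token.equals("*")) {
                    groupMatchesStar = true;
                } else if (!token.isEmpty() && agent.contains(token)) {
                    groupMatchesAgent = true;
                    sawSpecific = true;
                }
            } else if (field.equals("allow") || field.equals("disallow")) {
                inRules = true;
                if (value.isEmpty()) continue;    // an empty path restricts nothing
                Rule rule = new Rule(field.equals("allow"), value);
                if (groupMatchesAgent) specific.add(rule);
                if (groupMatchesStar) wildcard.add(rule);
            }
            // Other fields (Sitemap, Crawl-delay, ...) are ignored in this sketch.
        }
        return new SimpleRobotsTxt(sawSpecific ? specific : wildcard);
    }

    /** Longest matching prefix wins; Allow wins ties; no matching rule means allowed. */
    boolean isAllowed(String path) {
        Rule best = null;
        for (Rule r : rules) {
            if (!path.startsWith(r.path)) continue;
            if (best == null
                    || r.path.length() > best.path.length()
                    || (r.path.length() == best.path.length() && r.allow)) {
                best = r;
            }
        }
        return best == null || best.allow;
    }

    public static void main(String[] args) {
        String txt = "User-agent: *\nDisallow: /private/\nAllow: /private/public/\n";
        SimpleRobotsTxt robots = SimpleRobotsTxt.parse(txt, "ExampleBot/1.0");
        System.out.println(robots.isAllowed("/index.html"));        // true
        System.out.println(robots.isAllowed("/private/x"));         // false
        System.out.println(robots.isAllowed("/private/public/a"));  // true
    }
}
```

Fetching the file over HTTP, handling redirects and error status codes, wildcard paths, and Crawl-delay are all left out here, which is a good reason to use one of the libraries above for a real crawler.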
