How to get a company's contact page URL

Hi, I have a CSV file which contains a list of company URLs, like this: www.google.com,www.ibm.com.....

For each URL in the CSV file, I want to get the URL of its "Contact us" or "About us" page (for example, http://www.google.com/contact). One idea I have is to check the page's links for the following patterns: contact us, about us, about, locations.

If none of those patterns is found, flag the URL and write it to a log file. If a pattern is found, just print the address (it is used by another process).
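A minimal sketch of the approach described in the question, using only the Python standard library (`html.parser` stands in for a third-party HTML parser). Assumptions: the CSV holds bare domains such as `www.google.com`, either on one comma-separated line or across rows; `http://` is prepended when no scheme is given; pages are decoded as UTF-8; and the first link whose href or visible text contains "contact", "about" or "locations" is taken, with no preference order between them. The names `find_contact_url`, `normalize`, `process_csv` and the log file name `not_found.log` are hypothetical.

```python
import csv
import re
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Patterns from the question, matched case-insensitively against both the
# href and the visible link text ("contact" also covers "contact us",
# "about" also covers "about us").
PATTERNS = re.compile(r"contact|about|locations", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collects (href, link_text) pairs for every <a href="..."> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only collect text inside an <a href>
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_contact_url(base_url, html):
    """Return the first matching link as an absolute URL, or None."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.links:
        if PATTERNS.search(href) or PATTERNS.search(text):
            return urljoin(base_url, href)
    return None


def normalize(url):
    """Prepend http:// to bare domains such as www.google.com."""
    url = url.strip()
    return url if url.startswith(("http://", "https://")) else "http://" + url


def process_csv(csv_path, log_path="not_found.log"):
    """Print the contact/about URL for each site; log the sites with none."""
    with open(csv_path, newline="") as f:
        # Accept URLs spread over one comma-separated line or many rows.
        sites = [cell.strip() for row in csv.reader(f) for cell in row if cell.strip()]
    with open(log_path, "a") as log:
        for site in sites:
            url = normalize(site)
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    base = resp.geturl()  # final URL after any redirects
                    # Assumes UTF-8; other encodings are decoded lossily.
                    html = resp.read().decode("utf-8", errors="replace")
            except Exception as exc:  # DNS, HTTP and timeout errors
                log.write(f"{site}\tfetch failed: {exc}\n")
                continue
            found = find_contact_url(base, html)
            if found:
                print(found)
            else:
                log.write(f"{site}\tno contact/about link found\n")
```

Usage would be something like `process_csv("companies.csv")`, where `companies.csv` is a hypothetical path. Matches are printed for the downstream process; misses and fetch failures are flagged in the log. Sites that build their navigation with JavaScript will not expose those links to a plain HTML fetch, so some misses are to be expected.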


I'd suggest using Beautiful Soup to parse each page and inspect its links. Another option would be to set up a HIT (Human Intelligence Task) on Amazon Mechanical Turk.


Scrapy is well suited to this: it is an open-source Python framework for crawling websites and extracting data from them. See the Scrapy documentation.
