Fetching only website details, as a search engine does

I have to fetch website details the way a search engine does. I need each site's description, its links, and some information about them, which I will store in my DB. Are there any libraries available for doing this? Please note that I can already crawl a whole webpage, but I need only the information in the format that search engines collect.

Thanks,

Karthik


Which language are you using? APIs and bindings exist for reading webpage content. Do you realize the scale of the task if you wish to create a new 'search engine'? Your question is so generic that there's not a lot of advice that can be given, other than:

Respect robots.txt

Don't hammer the server with requests, or you'll soon get your IP blocked by sensible sysadmins.
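Since no language was named, here is a rough sketch in Python using only the standard library. It assumes that "details as a search engine does" means the page title, the meta description, and the outgoing links, which is a guess at what the question wants. The names `PageSummaryParser`, `summarize`, `allowed_by_robots`, `fetch_politely`, and the `MyBot` user agent are all made up for illustration.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class PageSummaryParser(HTMLParser):
    """Collects the <title> text, the meta description, and link targets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html, base_url):
    """Return the title, description, and links found in an HTML string."""
    parser = PageSummaryParser(base_url)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "links": parser.links,
    }


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against the text of a robots.txt file."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def fetch_politely(url, user_agent="MyBot", delay_seconds=2.0):
    """Download one page, then pause so the server is not hammered.

    The 2-second delay is an arbitrary assumption, not a recommended value.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    time.sleep(delay_seconds)
    return html
```

In use, you would download the site's robots.txt once, call `allowed_by_robots` before each request, pass the result of `fetch_politely` to `summarize`, and store the returned dictionary in your DB. Real pages are messier than this parser expects (missing descriptions, Open Graph tags, JavaScript-rendered content), so treat it as a starting point only.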
