How do search engines find websites on the internet?
I'm going to write a web parser (an application that crawls the web from one site to another).
How can I find a list of available domains/IPs on the internet (as complete as possible)? How do search engines find websites (what do they use as a reliable list of registered IPs/domains for a starting point)? Thanks
As Michael P's comment indicates, it depends on what your objective is.
My company recently wanted to answer a question about third-party tools used on leading websites. I used Alexa as a starting point to find the top (by traffic) websites, and created a parser that can answer the specific question my company asked. If you start from such a list, you can program your web crawler to follow the links it encounters to broaden your knowledge of sites on the web.
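To make the "start from a list and follow the links" idea concrete, here is a minimal sketch in Python using only the standard library. It is not the parser I wrote for my company; the names `crawl`, `fetch`, and `LinkExtractor` and the `max_pages` limit are assumptions made up for this illustration, and the seed URLs would come from whatever list you start with.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its HTML as text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seeds, fetch_page=fetch, max_pages=100):
    """Breadth-first crawl from the seed URLs.

    Returns the set of host names whose pages were fetched successfully.
    fetch_page is injectable so the crawler can be tried without a network.
    """
    queue = deque(seeds)
    visited = set()
    hosts = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = fetch_page(url)
        except Exception:
            continue  # unreachable or broken page: skip it
        hosts.add(urlparse(url).netloc)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if urlparse(absolute).scheme in ("http", "https"):
                queue.append(absolute)
    return hosts
```

You would call it as `crawl(["https://example.com/"])` with your own seed list. A crawler you actually run against other people's sites should also honor robots.txt and limit its request rate, which this sketch leaves out.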
Hopefully that helps you think about the problem.