Stopping Google's crawl of my site
Google has started crawling my site, but on a temporary domain (beta.mydomain instead of just mydomain), and I also want it to crawl only some of my pages. I therefore want to stop the current crawl and let Google crawl only the pages I specify in a sitemap. How can I do that? (I know how to add a sitemap, but how do I stop the current crawling and request that only the sitemap pages be crawled?)
Update: If I kill the subdomain beta.mydomain, will that be "fine" with Google, or will it keep going over all the killed pages and "not like" them? Can I specify that in each page's header?
Create a single text file called 'robots.txt' in the root folder for your site. Inside...
User-agent: *
Disallow: /thisfolder/
Disallow: /foo.html
Disallow: /andthisfoldertoo/
Disallow: /andthisfile.html
I use this for project files. In fact, as I write this I think I'll change the way I work on projects and always put them in a sub-directory called /projects/project1/ so one line will do...
Disallow: /projects/
AND I also add a line for my image files. I don't like my images all over the web...
Disallow: /imgs/
You could start with a robots.txt file.
See Google's info here.
From what you say, I presume you have already looked at Webmaster Tools and sitemaps? Do be aware that while a sitemap will help tell Google WHAT to crawl, it won't work very well for telling them what NOT to crawl.
For that you will want to use the robots.txt file to block certain pages / folders.
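For example (the /public/ path and the sitemap URL below are placeholders, adjust them to your own site), a robots.txt along these lines blocks everything except one folder and points Google at your sitemap; Googlebot honours Allow with longest-match precedence, so the allowed folder stays crawlable:
User-agent: *
# allow only the section you want indexed (placeholder path)
Allow: /public/
# block everything else
Disallow: /
# the sitemap listing the pages you do want crawled (placeholder URL)
Sitemap: https://www.mydomain.com/sitemap.xml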
Use a robots.txt; see this site.
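If you also want the per-page control mentioned in the update, a robots meta tag in each page's head (or the equivalent X-Robots-Tag HTTP response header, assuming you can set response headers on your server) should keep that individual page out of Google's index:
<meta name="robots" content="noindex">
or, as a response header:
X-Robots-Tag: noindex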