Listing both sitemaps and sitemap index files in robots.txt?
My site is made up of 3 main sections: Reviews, Forum, and Blog. I have plugins for the forum and blog that automatically generate sitemaps for them. The forum plugin generates a sitemap INDEX file pointing to multiple sitemaps, and the blog plugin generates a regular sitemap file containing all my blog content. Here are their entries from robots.txt:
Sitemap: http://www.datesphere.com/forum/sitemap-index.xml
Sitemap: http://www.datesphere.com/blog/sitemap.xml
I just created a Reviews sitemap.xml file that contains all the content in the Reviews section. I was planning to just add a line to robots.txt so the whole thing would look like this:
Sitemap: http://www.datesphere.com/forum/sitemap-index.xml
Sitemap: http://www.datesphere.com/blog/sitemap.xml
Sitemap: http://www.datesphere.com/reviews-sitemap.xml
HERE'S MY QUESTION: I know you can list multiple sitemaps in robots.txt, but is it OK to have a sitemap index file as well as multiple sitemaps listed? Will Googlebot ignore the other sitemap files if it finds a sitemap-index.xml file in robots.txt? If so, do I have to put my blog and reviews sitemaps in another sitemap index file and just list that in robots.txt?
I've checked around but can only find answers to the question "can I list multiple sitemaps?"
Googlebot will not ignore any of the Sitemaps you list in robots.txt, even if you also list their parent Sitemap Index. We follow pretty much every link we find and, if we're allowed to, we'll crawl them. Personally, I'd probably list only the Sitemap Indexes, purely for manageability's sake, but it's up to you: Googlebot won't mind if you list both the indexes and the Sitemaps.
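If you do want to go the "indexes only" route for manageability, the blog and reviews sitemaps could be wrapped in a second sitemap index. Here is a minimal sketch in Python, assuming the standard sitemaps.org namespace; the function name build_sitemap_index is made up for illustration, and where you save and serve the resulting file is up to you:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing the given sitemap URLs.

    Hypothetical helper for illustration; not part of any plugin mentioned above.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        # Each child sitemap gets its own <sitemap><loc>...</loc></sitemap> entry.
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


index_xml = build_sitemap_index([
    "http://www.datesphere.com/blog/sitemap.xml",
    "http://www.datesphere.com/reviews-sitemap.xml",
])
print(index_xml)
```

In this sketch the forum's existing sitemap-index.xml stays on its own Sitemap line in robots.txt, and the generated index would simply be listed next to it.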
When you have multiple sitemaps, you can specify the URL of your sitemap index file in your robots.txt file, as shown in the example below:
# robots.txt
Sitemap: http://www.example.com/sitemap_index.xml
User-agent:*
Disallow: /some/disallowed/path
Or, you can list the URL of each sitemap file individually, as shown in the example below:
# robots.txt
Sitemap: http://www.example.com/sitemap_host1.xml
Sitemap: http://www.example.com/sitemap_host2.xml
User-agent:*
Disallow: /some/disallowed/path
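To sanity-check that a robots.txt mixing an index and plain sitemaps still exposes every Sitemap line, you can run it through a parser. The sketch below uses Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later). It only shows how that parser reads the file, not how Googlebot behaves, and the helper name list_sitemaps is just for illustration:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, with the index and the plain sitemaps side by side.
ROBOTS_TXT = """\
Sitemap: http://www.datesphere.com/forum/sitemap-index.xml
Sitemap: http://www.datesphere.com/blog/sitemap.xml
Sitemap: http://www.datesphere.com/reviews-sitemap.xml
User-agent: *
Disallow: /some/disallowed/path
"""


def list_sitemaps(robots_txt):
    """Return every Sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap lines are present.
    return parser.site_maps() or []


sitemaps = list_sitemaps(ROBOTS_TXT)
for url in sitemaps:
    print(url)
```

All three URLs come back in file order, whether a line points at an index or at a regular sitemap.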
Finally, this is what you need to pay attention to when adding the Sitemap directive to the robots.txt file.