Specifying variables in robots.txt
My URL structure is set up in two parallel forms (both lead to the same place):
www.example.com/subname
www.example.com/123
The trouble is that the spiders are crawling into things like:
www.example.com/subname/default_media_function
www.example.com/subname/map_function
Note that the name "subname" represents thousands of different pages on my site that all have that same function.
Those requests throw errors because the URLs exist strictly for JSON or AJAX purposes and are not actual pages. I would like to block the spiders from accessing them, but how do I do that when the URL contains a variable?
Would this work in robots.txt?
Disallow: /map_function
You are going to have to use
Disallow: /subname/map_function
Robots look for robots.txt at the root level, and they evaluate URLs by simple left-to-right prefix matching, with no wildcards.
So you will either need to move all of the map_function endpoints under one location and exclude that, or exclude every location individually.
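As a rough sketch, the two options could look like the following (the subnames "alpha" and "beta" and the /ajax/ path are made-up placeholders, not names from your site).

Option 1, excluding every location individually (one Disallow line per subname):

User-agent: *
# Hypothetical subnames; in practice you would need one entry per subname
Disallow: /alpha/map_function
Disallow: /alpha/default_media_function
Disallow: /beta/map_function
Disallow: /beta/default_media_function

Option 2, serving all JSON/AJAX endpoints from one shared path and excluding just that:

User-agent: *
# Everything under /ajax/ is blocked by the prefix match
Disallow: /ajax/

With thousands of subnames the first option means thousands of lines, so moving the endpoints under a single shared path is usually the more manageable choice.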