
Specifying variables in robots.txt

My URL structure is set up in two parallel forms (both lead to the same place):

  • www.example.com/subname
  • www.example.com/123

The trouble is that the spiders are crawling into things like:

  • www.example.com/subname/default_media_function
  • www.example.com/subname/map_function

Note that the name "subname" represents thousands of different pages on my site that all have that same function.

These requests are throwing errors because those links exist strictly for JSON or AJAX purposes and are not actual pages. I would like to block crawlers from accessing them, but how do I do that when part of the URL is a variable?

Would this work in robots.txt?

Disallow: /map_function


You are going to have to do

Disallow: /subname/map_function

Robots look for robots.txt at the root level of the site. They evaluate URLs left to right as literal path prefixes; the original robots.txt standard has no wildcards (some major crawlers support `*` as an extension, but you cannot rely on it universally).
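The prefix-matching behavior described above can be checked with Python's standard-library robots.txt parser. This is a sketch using hypothetical rules mirroring the answer; the hostnames are the example.com placeholders from the question:

```python
from urllib import robotparser

# Hypothetical rules matching the answer: a literal path-prefix rule.
rules = """
User-agent: *
Disallow: /subname/map_function
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The listed prefix is blocked for all user agents.
print(rp.can_fetch("*", "http://www.example.com/subname/map_function"))   # False

# A different first path segment does NOT match: the rule is a literal
# left-to-right prefix, not a pattern, so this URL stays crawlable.
print(rp.can_fetch("*", "http://www.example.com/othername/map_function"))  # True
```

This is why a bare `Disallow: /map_function` would not block `/subname/map_function`: the rule only matches URLs whose path *starts with* `/map_function`.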

So you will either need to move all of the map_function endpoints under one common location and exclude that, or exclude each location individually.
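For example, if all of the JSON/AJAX endpoints were served under a single common prefix (the `/ajax/` path here is a hypothetical choice, not part of the original site), one rule would cover every page:

```
User-agent: *
Disallow: /ajax/
```

Restructuring the URLs this way is usually far more maintainable than enumerating thousands of `Disallow: /subname/map_function` lines, one per page.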
