
Disallow certain URLs in robots.txt [closed]

Closed. This question is off-topic. It is not currently accepting answers.


Closed 10 years ago.


We implemented a rating system on a site a while back that involves a link to a script. However, with the vast majority of ratings on the site sitting at 3/5 and the votes spread very evenly across 1-5, we're beginning to suspect that search engine crawlers and similar bots are following the rating links. The URLs used look like this:

http://www.thesite.com/path/to/the/page/rate?uid=abcdefghijk&value=3

When we started, we added the following to our robots.txt:

User-agent: *
Disallow: /rate

Is this incorrect, or are Googlebot and others simply ignoring our robots.txt?


You should use POST for actions which change things, as search engines usually do not submit forms. Additionally, this will prevent users who download your website recursively (e.g. with wget) from submitting tons of votes.

Depending on your site, handling voting through JavaScript might be a solution, too.

Regarding your robots.txt: It has to be in the root path - i.e. http://www.thesite.com/robots.txt - and if your rating system is at /blah/rate you need to use Disallow: /blah/rate instead of Disallow: /rate


Looks incorrect to me. You're only disallowing access to http://www.thesite.com/rate (and pages below it IIRC). Plus some crawlers ignore robots.txt!

Better to make it so that ratings are only ever altered in response to a POST, rather than a GET. Search engine crawlers generally do not issue POST requests.
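To illustrate the POST-only idea, here is a minimal sketch of a rating endpoint written as a plain WSGI app in Python. The asker's actual stack is not stated, so everything here is an assumption: the name `rate_app`, the in-memory `ratings` dict, and the form fields `uid` and `value` (borrowed from the query string in the question) are all hypothetical stand-ins.

```python
from urllib.parse import parse_qs

# Hypothetical in-memory store: uid -> list of submitted values.
# A real site would write to its database instead.
ratings = {}

VALID_VALUES = {"1", "2", "3", "4", "5"}


def rate_app(environ, start_response):
    """WSGI app that records a rating only when the request is a POST."""
    if environ["REQUEST_METHOD"] != "POST":
        # GET (what crawlers and wget issue when following links) changes nothing.
        start_response(
            "405 Method Not Allowed",
            [("Allow", "POST"), ("Content-Type", "text/plain")],
        )
        return [b"Ratings must be submitted with POST.\n"]

    # Read the form-encoded body, e.g. "uid=abcdefghijk&value=3".
    length = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
    uid = form.get("uid", [""])[0]
    value = form.get("value", [""])[0]

    if not uid or value not in VALID_VALUES:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"Expected uid and a value from 1 to 5.\n"]

    ratings.setdefault(uid, []).append(int(value))
    start_response("204 No Content", [])
    return []
```

With something like this in place, a crawler that follows a `rate?uid=...&value=3` link gets a 405 and no vote is stored; only a form submission (or a JavaScript request) using POST is counted.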


User-agent: *
Disallow: /path/to/the/page/rate

You have to use the full path.

Might want to read up here a bit: http://www.javascriptkit.com/howto/robots.shtml
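If you want to check a rule before deploying it, Python's standard `urllib.robotparser` can evaluate a robots.txt body against a URL. The sketch below compares the original rule with the full-path rule, using the example URL from the question; the helper name `is_allowed` is made up for illustration, and real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

RATE_URL = "http://www.thesite.com/path/to/the/page/rate?uid=abcdefghijk&value=3"


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Original rule: only matches paths that START with /rate.
original = "User-agent: *\nDisallow: /rate\n"

# Rule using the full path to the rating script.
full_path = "User-agent: *\nDisallow: /path/to/the/page/rate\n"

print(is_allowed(original, RATE_URL))   # True: the rating URL is not blocked
print(is_allowed(full_path, RATE_URL))  # False: the rating URL is blocked
```

Disallow rules are matched as prefixes of the URL path, which is why `/rate` leaves `/path/to/the/page/rate` crawlable.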
