Does Googlebot crawl URLs in jQuery $.get() calls, and can it be prevented?

I have a page with a form that uses the ajaxForm jQuery plugin. The form submits, and when the submission completes, a $.get() call loads some new content into the page.

My problem is that Googlebot "appears" to be indexing the URL in the $.get() method.

My first question is: is that even possible? I was under the impression that Googlebot didn't evaluate JavaScript for the most part (I read something about it being able to index content on URLs with #!).

My second question is: if Google is indexing this call to that URL, is there a way to prevent it?

Thanks in advance.


You could disallow that URL specifically in robots.txt; Googlebot should honor it.

From robotstxt.org:

User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html

You can also use Google's Webmaster Central to request removal of the URL from the index.


First of all, you need to check that it really is Googlebot, because anyone can pretend to be Googlebot, even a legitimate user.

The recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com name.

Source: Official Google Webmaster Central Blog, "How to verify Googlebot".
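
The reverse-then-forward check described above could be sketched roughly like this in Node.js. This is only an illustration under assumptions: the helper names `isGooglebotHostname` and `verifyGooglebot` are made up for this example, the `resolver` argument is expected to behave like Node's `dns.promises` (its `reverse` and `resolve4` methods), and only IPv4 is handled.

```javascript
// Sketch only: helper names are hypothetical, not part of any library.

// True if the hostname is googlebot.com itself or a subdomain of it.
// A trailing dot (fully qualified form) is tolerated.
function isGooglebotHostname(hostname) {
  const name = hostname.toLowerCase().replace(/\.$/, "");
  return name === "googlebot.com" || name.endsWith(".googlebot.com");
}

// `resolver` is assumed to offer reverse(ip) and resolve4(hostname), each
// returning a promise of an array of strings, as Node's dns.promises does.
async function verifyGooglebot(ip, resolver) {
  let hostnames;
  try {
    // Step 1: reverse DNS lookup (IP -> hostnames).
    hostnames = await resolver.reverse(ip);
  } catch (err) {
    return false; // no reverse record, so the claim cannot be confirmed
  }
  for (const hostname of hostnames) {
    // Step 2: the name must be in the googlebot.com domain.
    if (!isGooglebotHostname(hostname)) continue;
    try {
      // Step 3: forward lookup must lead back to the original IP.
      const addresses = await resolver.resolve4(hostname);
      if (addresses.includes(ip)) return true;
    } catch (err) {
      // Forward lookup failed for this name; try the next one.
    }
  }
  return false;
}

// Possible usage (performs real DNS queries):
//   const dns = require("node:dns").promises;
//   verifyGooglebot(requestIp, dns).then((ok) => console.log(ok));
```

Passing the resolver in as a parameter keeps the hostname check easy to exercise without network access; `requestIp` in the usage comment is a placeholder for whatever address your server logged.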


Googlebot interprets pretty much every string in inline JavaScript that contains a "/" or a common file extension (".html", ".php") as a URL. The first case especially is very annoying.

Obfuscate every URL in inline JS that you do not want crawled, e.g. replace "/" with "|" on the server side and write a wrapper method in JS that replaces "|" with "/" again.
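
A minimal sketch of that idea follows. The function names (`obfuscateUrl`, `deobfuscateUrl`, `getObfuscated`) are invented for illustration, the encoding step would really run in whatever language the server uses, and the wrapper assumes jQuery is already loaded on the page.

```javascript
// Sketch only: all three helper names are hypothetical.

// Server side (shown in JS for illustration): hide the slashes so the
// string embedded in inline JS no longer looks like a path.
function obfuscateUrl(url) {
  return url.split("/").join("|");
}

// Client side: turn the "|" placeholders back into slashes.
function deobfuscateUrl(obfuscated) {
  return obfuscated.split("|").join("/");
}

// Wrapper around $.get() that restores the real URL just before the request.
// Assumes the global jQuery object `$` is available.
function getObfuscated(obfuscated, onSuccess) {
  return $.get(deobfuscateUrl(obfuscated), onSuccess);
}

// Possible usage in the page, with the server having emitted "|ajax|new-content":
//   getObfuscated("|ajax|new-content", function (html) { /* insert html */ });
```

This only works for URLs that do not themselves contain a literal "|", and it does nothing about file extensions, so a path ending in ".php" or ".html" may still be picked up.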

Yes, that's majorly annoying, and there are better ways, e.g. keeping all your JS in an external file that is not crawlable.

The robots.txt solution is not really a solution, because the URLs still get found and pushed to discovery (the pipeline Google uses to determine what to crawl next), but then the crawl is blocked, which is basically a missed opportunity.
