Multi-language site and search engines
I'm developing a site for a company that has clients all over the world, and the site will be served in two languages: Italian (local) and English. When a visitor arrives, I check their IP address: if it comes from Italy I show the site in Italian, otherwise I show it in English. Of course, visitors will also have the option to manually override the language. What exactly happens when search engine bots crawl the site to index the pages?
- crawlers usually have USA-based IPs
- even if the crawlers "click" on the "change language" link to reach the Italian pages, they don't accept cookies (and therefore sessions), so I can't keep the chosen language set or keep track of what was selected
So the question is: how can I handle this situation so that search engines crawl both languages and index both of them?
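Roughly, the current detection logic looks like this (simplified; it assumes the PECL geoip extension is installed, and the ?lang= override parameter is just illustrative):

```php
<?php
// Default to English for visitors outside Italy.
$lang = 'en';

if (isset($_GET['lang']) && in_array($_GET['lang'], array('it', 'en'))) {
    // Manual override always wins over IP detection.
    $lang = $_GET['lang'];
} elseif (function_exists('geoip_country_code_by_name')) {
    // Look up the visitor's country from their IP (assumes the geoip extension).
    $country = @geoip_country_code_by_name($_SERVER['REMOTE_ADDR']);
    if ($country === 'IT') {
        $lang = 'it';   // Italian visitors get the Italian version
    }
}
```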
Google actually has an article in their Webmaster guidelines on this subject. You may want to take a look, as they specifically address the issues you have raised: http://www.google.com/support/webmasters/bin/answer.py?answer=182192
I'd use subdomains:
eng.mysite.com/whatever
it.mysite.com/whatever
Then have a sitemap which points to the home page of each of those language subdomains, and they should all be crawled just fine.
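A rough sketch of how the front controller could pick the language from the subdomain (the subdomain prefixes above are examples, and the default fallback is an assumption):

```php
<?php
// Decide the language from the subdomain, so each URL always serves one
// language and crawlers can index both versions independently.
$host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';

if (strpos($host, 'it.') === 0) {
    $lang = 'it';        // it.mysite.com/whatever  -> Italian
} elseif (strpos($host, 'eng.') === 0) {
    $lang = 'en';        // eng.mysite.com/whatever -> English
} else {
    $lang = 'en';        // assumed default for the bare domain
}

// ...load the templates/strings for $lang here. No cookies or sessions are
// needed, so the choice is visible to crawlers on every request.
```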
You can use the following approach:
- Scan the Accept-Language header ($_SERVER['HTTP_ACCEPT_LANGUAGE']) for the languages the user agent prefers. This is usually more reliable than checking the IP address for the visitor's country.
- Check the User-Agent header ($_SERVER['HTTP_USER_AGENT']) to see if the request comes from a search engine, such as "Googlebot" or "Yahoo! Slurp".
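A rough sketch of how those two checks could be combined (the bot list, the supported languages, and the choice to serve crawlers the English default are all illustrative assumptions):

```php
<?php
// Detect whether the request comes from a known crawler by User-Agent.
// The pattern below is illustrative only; real bot detection needs more entries.
function is_search_bot() {
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    return (bool) preg_match('/Googlebot|Slurp|bingbot/i', $ua);
}

// Pick the best supported language from the Accept-Language header.
function preferred_language(array $supported, $default = 'en') {
    $header = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';
    $best  = $default;
    $bestQ = 0.0;
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';q=', trim($part));
        $lang   = strtolower(substr($pieces[0], 0, 2));          // "it-IT" -> "it"
        $q      = isset($pieces[1]) ? (float) $pieces[1] : 1.0;  // missing q means 1.0
        if (in_array($lang, $supported) && $q > $bestQ) {
            $best  = $lang;
            $bestQ = $q;
        }
    }
    return $best;
}

// Crawlers get a stable default version; humans get their preferred language.
$lang = is_search_bot() ? 'en' : preferred_language(array('it', 'en'));
```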