What are the SEO-unfriendly Java things a webapp developer should be aware of?
This is a serious question (see my comment).
The question is simple: what are all the SEO-unfriendly things Java does that will make your website rank worse than it should in the major search engines?
There's a major SNAFU in the default behavior of servlets, related to JSESSIONID.
This is HUGE (in uppercase bold).
What Google has to say about session ID in URLs:
Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
They specifically mention here that you should not serve session IDs to search bots.
That is just one quote: on several pages Google warns webmasters about session IDs in URLs, the countless issues they raise, and why they will harm your ranking.
Yet by default any Java webapp that uses sessions will append a very long JSESSIONID to its URLs for clients that do not send cookies back, and it is different every time a search bot contacts your site.
This creates hundreds of millions (!) of useless URLs in Google's (and other search engines') results, with several consequences:
it clutters the screen (not too bad)
it also creates countless dupes (very bad)
it makes old content you'd want to be replaced "stick" in Google's search results (very very bad)
In addition to that, it is firmly believed that serving dupes actually lowers your ranking, because Google's PageRank penalizes you if you do so.
This should worry any webapp developer who cares at all about SEO.
There's a solution: provide a version without JSESSIONID to the Google bots. But be very careful: providing a different page to the Google bots and to your users can get you penalized too.
In the "JSESSIONID considered harmful" article, the author, who's obviously well aware of SEO issues, creates a filter that gets rid of the JSESSIONID altogether (no cookie, no sugar). It's a bit overkill, but it's probably better than destroying your pagerank by using the default spec'ed servlet behavior.
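The article's filter is not reproduced here. As a rough illustration of just the URL-cleaning step such a filter has to perform, here is a minimal sketch in plain Java. The class name SessionIdStripper and the method strip are made up for this example, and it assumes the session ID appears in the usual ";jsessionid=..." path-parameter form.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the article: removes the
// ";jsessionid=..." path parameter that URL rewriting appends.
final class SessionIdStripper {

    // Matches ";jsessionid=" and everything up to the next ';', '?', '#'
    // or the end of the URL. Case-insensitive as a precaution.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^;?#]*", Pattern.CASE_INSENSITIVE);

    private SessionIdStripper() {
    }

    // Returns the URL without any session ID path parameter.
    // The query string and fragment are left untouched.
    static String strip(String url) {
        if (url == null) {
            return null;
        }
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        // Two crawls of the same page, each with a different session ID.
        String first = "http://example.com/list.jsp;jsessionid=0A1B2C3D?page=2";
        String second = "http://example.com/list.jsp;jsessionid=9F8E7D6C?page=2";

        // Both collapse to the same URL once the session ID is removed.
        System.out.println(strip(first));
        System.out.println(strip(first).equals(strip(second))); // prints "true"
    }
}
```

In a real filter the same cleaning would be applied to the URLs the response encodes. On Servlet 3.0 and later containers, restricting session tracking to cookies through the session-config tracking-mode setting in web.xml is another option worth checking before writing a filter at all.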
This is wild.
Search engines don't care in the least about Java, only about the output HTML. Your concern with Java is misplaced; instead, become a student of quality content marked up with semantic HTML (http://en.wikipedia.org/wiki/HTML#Semantic_HTML).
If you are asking about JavaScript (instead of Java), most search engines do not pay any attention to JavaScript, so do not expect dynamically added HTML to be indexed. This also means you should not use JavaScript onclick actions to replace the basic functionality of the href attribute of anchor tags. As with Java, the recommendation comes back to clean, semantic HTML markup of quality content.