Combining age verification and Google indexing
As spiders will generally not execute JavaScript, I am thinking of taking one of the options below in order to get them to index the content of a website that requires age verification.
My preferred solution:
Check for a cookie named 'ageverification'. If it does not exist, add some JavaScript to redirect the user to ~/verifyage.aspx, which will add the required cookie and redirect the user to their previous page.
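A minimal sketch of that cookie check, assuming the application sits at the site root (so ~/verifyage.aspx becomes /verifyage.aspx) and that the verify page reads a hypothetical `returnUrl` query parameter to know where to send the user back. The function name `ageRedirectTarget` is also made up for illustration.

```javascript
// Returns the URL to redirect to, or null when the 'ageverification'
// cookie is already present. cookieHeader has the same format as
// document.cookie, e.g. "theme=dark; ageverification=1".
function ageRedirectTarget(cookieHeader, currentPath) {
  const hasCookie = cookieHeader
    .split(';')
    .map((part) => part.trim().split('=')[0])
    .includes('ageverification');
  if (hasCookie) {
    return null;
  }
  // 'returnUrl' is an assumed parameter name, not something the question defines.
  return '/verifyage.aspx?returnUrl=' + encodeURIComponent(currentPath);
}

// In a browser this might be wired up as:
//   const target = ageRedirectTarget(document.cookie,
//                                    location.pathname + location.search);
//   if (target) location.replace(target);
```

Because the redirect only happens in script, a client that does not run JavaScript (a crawler, or a user with scripting disabled) stays on the content page.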
Another solution:
As above, but do not redirect the user. Instead, if the cookie doesn't exist, draw the age verification form 'over the top' of the existing page.
Another solution:
Add a 'Yes I am over 18' anchor link that a crawler can follow. I am slightly skeptical about the legality of this.
Any insight or ideas greatly appreciated.
What I do: I store age verification in session data. If the session variable does not exist, the server appends a div to the end of the body (after the footer) with "click to verify" and "click to exit" links. I use CSS to have it cover the content.
For the CSS, I use:
display: block; width: 100%; height: 100%; position: fixed; top: 0px; left: 0px; z-index: 9999;
That causes the div to cover all other content in a graphical browser even though it is placed at the very end of the body.
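As a rough sketch of the server side of this, assuming a session object with a hypothetical `ageVerified` flag and a hypothetical /verify page that takes a `return` parameter, the gate could be appended like this. The ids `age-gate` and `age-enter` and the exit URL are placeholders, and an opaque background for the div is assumed to be set elsewhere in the stylesheet.

```javascript
// The CSS from above, applied inline to the gate div.
const OVERLAY_STYLE =
  'display: block; width: 100%; height: 100%; position: fixed; ' +
  'top: 0px; left: 0px; z-index: 9999;';

// Builds the <body>. When the session is not verified, the gate div is
// appended after the real content, so the content still comes first in
// the HTML source.
function renderBody(contentHtml, session, requestedPath) {
  if (session.ageVerified) {
    return '<body>' + contentHtml + '</body>';
  }
  // '/verify' and 'return' are assumed names for the non-JS fallback page.
  const enterHref = '/verify?return=' + encodeURIComponent(requestedPath);
  const gate =
    '<div id="age-gate" style="' + OVERLAY_STYLE + '">' +
    '<a id="age-enter" href="' + enterHref + '">Enter</a> ' +
    '<a href="https://www.example.com/">Exit</a>' +
    '</div>';
  return '<body>' + contentHtml + gate + '</body>';
}
```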
For users without JS enabled, the "Enter" link points to a web page that sets the session variable and returns the user to the page they requested. That costs them two page loads to get to the content they want, which is not ideal, but it is the only way to do it for browsers without JS.
For JS-enabled browsers, a small script attached to the page changes the "Enter" link's href to # and attaches a very basic function to the click event, so that clicking Enter sends an XMLHttpRequest telling the server the person clicked "Enter". The server then updates the session and responds with a 200 OK, which triggers the JavaScript to hide the age verification div covering the content. The session is thus updated, so the server knows the user verified their age, and the user gets to see the content they wanted with no page reload, a much better user experience.
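The enhancement script might look roughly like the following. This is only a sketch: the function name `enhanceEnterLink`, the choice of POST, and the verify URL passed in are assumptions, and the optional last argument exists only so the XMLHttpRequest constructor can be swapped out when trying the function outside a browser.

```javascript
// Rewires the "Enter" link so a click verifies via XMLHttpRequest
// instead of navigating to the verify page.
function enhanceEnterLink(link, overlay, verifyUrl, XhrCtor) {
  const Xhr = XhrCtor || XMLHttpRequest;
  link.setAttribute('href', '#');
  link.addEventListener('click', function (event) {
    event.preventDefault(); // do not follow the "#" link
    const xhr = new Xhr();
    xhr.open('POST', verifyUrl);
    xhr.onload = function () {
      // Hide the gate only once the server confirms with 200 OK,
      // i.e. after it has updated the session.
      if (xhr.status === 200) {
        overlay.style.display = 'none';
      }
    };
    xhr.send();
  });
}

// In a browser, with hypothetical element ids:
//   enhanceEnterLink(document.getElementById('age-enter'),
//                    document.getElementById('age-gate'),
//                    '/verify');
```

If the script never runs, the link keeps its original href and the two-page-load fallback still works.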
The age verification thus works without JavaScript by sending the user to the verify page the stateless way or in a much friendlier way with JavaScript.
When a search spider crawls the site, it gets the age verification div on every page because a spider will not have the necessary session variable set, but since the div is at the very end of the html body the spider still indexes the real content first.
You've got a real problem either way.
If you let the crawler onto the age-verified portion of your site, then it has that content in its index. Which means it will present snippets of that to users who search for things. Who haven't been through your age verification. In the case of Google, this means users actually have access to the entire body of content you were putting behind the verifywall without going through your screener - they can pull it from the Google cache!
No-win situation, sorry. Either have age-verified content or SEO, not both. Even if you somehow tell the search engine not to spit out your content, the mere fact that your URL shows up in search results tells people about your site's (restricted) contents.
Additionally, about your JavaScript idea: this means that users who have JavaScript disabled would get the content without even knowing that there should have been a click-through. And if you display a banner on top, that means that you sent the objectionable content to their computer before they accepted. Which means it's in their browser cache. Or they could just hack out your banner and have at whatever it is you were covering up without clicking 'OK'.
I don't know what it is your website does, but I really suggest forcing users to POST a form to you before they're allowed to view anything mature. Store their acceptance status in a session variable. That's not fakeable. Don't let the search engine in unless it's old enough, too, or you have some strong way to limit what it does with what it sees AND strong information about your own liability.
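To make that concrete, here is a minimal sketch of the gate as a plain function over a request and a session object. Everything named here (`handleRequest`, the /verify path, the `ageVerified` and `returnTo` session fields) is hypothetical; a real site would do the same thing in whatever its server framework provides.

```javascript
// Decides the response for one request. Mature content is only ever
// returned once the session records an accepted POST.
function handleRequest(req, session) {
  if (req.method === 'POST' && req.path === '/verify') {
    session.ageVerified = true;
    // Send the user back to the page they originally asked for.
    return { status: 303, location: session.returnTo || '/' };
  }
  if (req.path === '/verify') {
    return {
      status: 200,
      body: '<form method="post" action="/verify">' +
            '<button type="submit">I am over 18</button></form>'
    };
  }
  if (!session.ageVerified) {
    session.returnTo = req.path;
    return { status: 302, location: '/verify' };
  }
  return { status: 200, body: 'mature content for ' + req.path };
}
```

With this shape, nothing restricted reaches the browser (or its cache) before the form is submitted, and a crawler, which does not submit the form, only ever sees the redirect.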