Mod_rewrite - How to tell Google to dynamically delete pages from their index after 7 days
Search engines like to crawl and index webpages or URLs, but what if your webpages/URLs have expired content and you do not want them to be indexed after so many days?
Can you put an expiration in the URL and have mod_rewrite 301 redirect pages after a given expiration date?
Or maybe a cron job to add a 301 redirect header to all expired pages?
Just have the 'expired' pages return a 404? I am pretty sure that when Google encounters a 404, it will remove the page.
Not 404 or 301, but 410 Gone. This is the appropriate HTTP response for content that has been removed on purpose. From RFC 2616, section 10.4.11:
The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.
The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.
How you provide this response is open to discussion, however. There are many ways.
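One of those ways, as a rough sketch: let the application decide per request whether a page is past its 7-day window, and answer 410 if it is. The example below is only an illustration and not tied to any particular setup. It assumes a Python WSGI application, and the `PUBLISHED` lookup table, the `/offers/spring-sale` path, and the `status_for` helper are all hypothetical names made up for the example. A real site would read the publication date from its own database.

```python
from datetime import date, timedelta
from http import HTTPStatus

# Hypothetical in-memory store: URL path -> publication date.
# A real site would look this up in its own database.
PUBLISHED = {
    "/offers/spring-sale": date(2024, 3, 1),
}

# Assumption taken from the question: pages expire 7 days after publication.
TTL = timedelta(days=7)


def status_for(path: str, today: date) -> HTTPStatus:
    """Return 200 while a page is live, 410 once expired, 404 if unknown."""
    published = PUBLISHED.get(path)
    if published is None:
        return HTTPStatus.NOT_FOUND
    if today > published + TTL:
        return HTTPStatus.GONE
    return HTTPStatus.OK


def app(environ, start_response):
    """Minimal WSGI application that applies status_for to each request."""
    status = status_for(environ.get("PATH_INFO", ""), date.today())
    body = status.phrase.encode("utf-8")
    start_response(
        f"{status.value} {status.phrase}",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app)` can serve `app`. The point of the design is that no cron job or rewritten config is needed: the expiry check runs on every request, so a crawler starts seeing 410 as soon as the window has passed.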