Is it safe to depend on a trailing slash in a URL for routing purposes?
I'm building a site that has products, each of which belongs to one or more categories, which can be nested within parent categories. I'd like to have SEO-friendly URLs, which look like this:
- mysite.com/category/
- mysite.com/category/product
- mysite.com/category/sub-category/
- mysite.com/category/sub-category/product
My question is: Is it safe to depend on the presence of a trailing slash to differentiate between cases 2 and 3? Can I always assume the user wants a category index when a trailing slash is detected, vs a specific product's page with no trailing slash?
I'm not worried about implementing this URI scheme; I've already done as much with PHP and mod_rewrite. I'm simply wondering if anybody knows of any objections to this kind of URL routing. Are there any known issues with browsers stripping/adding trailing slashes in the address bar, or with search engines crawling such a site? Any SEO issues or other stumbling blocks that I'm likely to run into?
In addition to the other pitfalls you mentioned, users might edit the URL themselves (by typing a product or category name) and add or remove the trailing "/".
To solve your problem, why not have a special sub-category "all" and instead of "mysite.com/category/product" have "mysite.com/category/all/product"?
To me, it seems very unnatural that http://product/
and http://product
would represent two entirely different resources. It is confusing, and it makes your URLs less hackable, since it is difficult to tell when a trailing slash should be present or not.
Also, in RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, there is a note on Protocol-Based Normalization in chapter 6.2.4, which talks about this particular situation with regard to non-human visitors of your site, such as search engines and web spiders:
Substantial effort to reduce the incidence of false negatives is often cost-effective for web spiders. Therefore, they implement even more aggressive techniques in URI comparison. For example, if they observe that a URI such as
http://example.com/data
redirects to a URI differing only in the trailing slash
http://example.com/data/
they will likely regard the two as equivalent in the future. (...)
One way to differentiate would be to make sure product pages have an extension, but category or sub-category pages do not. That is:
- mysite.com/category/
- mysite.com/category/product.html
- mysite.com/category/sub-category/
- mysite.com/category/sub-category/product.html
That makes it unambiguous.
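As a rough illustration of why the extension removes the ambiguity, here is a minimal Python sketch (the asker uses PHP, so treat this as language-neutral pseudocode that happens to run). The function name `classify` and the rule that every product page ends in ".html" are assumptions made for the example, not part of the original scheme.

```python
def classify(path: str) -> str:
    """Return 'category' or 'product' for a URL path."""
    # Strip any trailing slash first, so its presence or absence
    # cannot change the outcome.
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    # Assumption: product pages always end in ".html";
    # everything else is treated as a category index.
    return "product" if last_segment.endswith(".html") else "category"


print(classify("/category/"))
print(classify("/category"))
print(classify("/category/sub-category/product.html"))
```

With this rule, "/category" and "/category/" classify the same way, so a user who adds or drops the slash still lands on the right kind of page.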
Never assume the user will do anything BUT the worst case scenario in anything URL related.
Unless you're prepared to do redirects in your code, assume a URI is equally likely to arrive with or without a trailing slash. That is the only way to make sure your code is robust, so you won't have to worry about this kind of issue.
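A sketch of the redirect approach, in Python for brevity: look the path up without regard to the slash, then redirect to whichever form is canonical. The `CATEGORIES` and `PRODUCTS` sets and the `resolve` function are hypothetical stand-ins for a real database lookup, and the sketch assumes a product and a sub-category never share the same name under one parent.

```python
# Hypothetical lookup tables standing in for a real database.
CATEGORIES = {"category", "category/sub-category"}
PRODUCTS = {"category/product", "category/sub-category/product"}


def resolve(path: str):
    """Return (status, canonical_path) for a requested path.

    200: the path is already canonical; 301: redirect to canonical_path;
    404: nothing matches, with or without a trailing slash.
    """
    key = path.strip("/")  # compare without leading/trailing slashes
    if key in CATEGORIES:
        canonical = "/" + key + "/"  # categories keep the trailing slash
    elif key in PRODUCTS:
        canonical = "/" + key        # products do not
    else:
        return 404, None
    return (200 if path == canonical else 301), canonical
```

Here the trailing slash is never used to decide what the resource is; it only determines whether a redirect to the canonical form is needed.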
This question assumes that the addition of a trailing slash to a URL creates a URL that refers to a different resource. This is wrong; the semantics of URLs is that they both refer to the same resource. The presence of a trailing slash in a base URL merely changes how relative URLs are interpreted using that base URL.
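The effect on relative URLs can be seen with the `urljoin` function from Python's standard library, used here purely as an illustration; the example.com URLs and the relative reference "item" are made up for the demonstration.

```python
from urllib.parse import urljoin

# Same relative reference, resolved against a base with and without a trailing slash.
without_slash = urljoin("http://example.com/data", "item")
with_slash = urljoin("http://example.com/data/", "item")

print(without_slash)  # http://example.com/item
print(with_slash)     # http://example.com/data/item
```

Without the slash, the last path segment ("data") is replaced by the relative reference; with the slash, the reference is appended beneath it.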