What prevents fixing the insert-whatever-you-want-into-the-URL issue that some content management systems have?

2023-04-01 20:54 Q&A author:

What real-world issues prevent a system from disallowing URLs like this one?

http://www.washingtonpost.com/hey-this-url-doesn't-mean-a-damn-thing/gIQAocHrpJ_story.html

I understand what's going on. The routing system looks for the key after the final slash, then parses out what comes after the underscore to decide which version to serve.

So:

washingtonpost.com/whatever/gIQAocHrpJ_story.html brings us the normal story version
washingtonpost.com/whatever/gIQAocHrpJ_print.html brings us the normal print version
washingtonpost.com/whatever/gIQAocHrpJ_mobile.html brings us the mobile XML version

Strangely, even changing that .html to another common extension, like .js or .xml, or dropping the extension entirely, brings you back the same page. However, changing it to something non-standard, like .fffuuu, brings you either a human-friendly 404 page or a completely blank page. It's as if the CMS programmer just whitelisted the first few file types that came to mind and had the system treat them all the same.
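A router that behaves the way described above could look roughly like the following Python sketch. It is hypothetical: the Washington Post's CMS is not public, and the function name `parse_story_path`, the regular expression, the version names, and the extension whitelist are assumptions inferred only from the observed behaviour.

```python
import re
from typing import Optional, Tuple

# Assumed whitelist: .html, .js, .xml and "no extension" were all observed to work.
ALLOWED_EXTENSIONS = {"html", "js", "xml", ""}

# Assumed version names, taken from the example URLs.
KNOWN_VERSIONS = {"story", "print", "mobile"}

# Last path segment: <ID>_<version>, optionally followed by .<extension>.
LAST_SEGMENT = re.compile(
    r"^(?P<id>[A-Za-z0-9]+)_(?P<version>[a-z]+)(?:\.(?P<ext>[A-Za-z0-9]+))?$"
)


def parse_story_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (story_id, version), or None (i.e. a 404) if the URL is not recognised.

    Everything before the final "/" is discarded, so any section, date,
    or slug in the middle of the URL still resolves to the same story.
    """
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    match = LAST_SEGMENT.match(last_segment)
    if match is None:
        return None
    ext = (match.group("ext") or "").lower()
    if ext not in ALLOWED_EXTENSIONS:
        return None  # e.g. ".fffuuu" falls through to a 404
    version = match.group("version")
    if version not in KNOWN_VERSIONS:
        return None
    return match.group("id"), version
```

Under these assumptions, `/whatever/gIQAocHrpJ_story.html` and `/anything-at-all/gIQAocHrpJ_story.xml` both parse to `("gIQAocHrpJ", "story")`, because nothing before the final slash ever reaches the lookup.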

I've only built simple sites in Rails and WordPress, so I understand simple concepts about URL patterns, such as how prefix constants can affect lookup speed... but am I wrong in thinking that there is no rhyme or reason to the above design pattern?

Mind you, the Washington Post just recently completed a major redesign. This isn't a case of making do with a legacy system; their CMS designers apparently had the freedom to adopt modern best practices. I just don't see any advantage to the URL design pattern they've adopted, unless the CMS designer simply didn't know any better.

How is their current system any faster than a database model that has a unique key and then a human-readable field?

http://www.washingtonpost.com/HUMAN-READABLE-KEY/UNIQUE_KEY.html

The part between the slash after the domain and the final slash is the human-readable key. The system finds the record with the UNIQUE_KEY and then checks whether the human-readable key matches what the DB has for that record.
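The proposed model can be sketched in Python as follows, assuming a slug mismatch is treated as "not found". `Story`, `STORIES` (an in-memory stand-in for the database table), and `find_story` are hypothetical names for illustration. The sketch mainly shows the cost: the lookup is by unique key either way, and checking the slug adds a single string comparison.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Story:
    key: str   # the UNIQUE_KEY, e.g. "gIQAocHrpJ"
    slug: str  # the HUMAN-READABLE-KEY stored with the record


# Hypothetical in-memory stand-in for the database table, indexed by unique key.
STORIES: Dict[str, Story] = {
    "gIQAocHrpJ": Story("gIQAocHrpJ", "mitt-romney-debates-us-economy"),
}


def find_story(slug: str, key: str) -> Optional[Story]:
    """Look the record up by its unique key, then require the slug to match."""
    story = STORIES.get(key)  # same indexed lookup as a system that ignores the slug
    if story is None or story.slug != slug:
        return None  # treated as a 404
    return story
```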

I noticed that the official versions of the links, as generated from the homepage, include year/month/day information. Again, it's meaningless, as you can alter those parts and get the same page (thankfully, no JS seems to depend on parsing them).

I'm guessing the CMS designer didn't want to be bound by dates, since a news story could break on 8/20/2011 while the print version goes live on 8/21/2011. Fine, then just don't put dates in the URL at all. If the URL can be changed to anything, then don't train users to expect document-specific information in it.

Not even the first path segment after the domain means anything. For example:

http://www.washingtonpost.com/politics/mitt-romney-debates-us-economy/gIQAocHrpJ_story.html

goes to the same story as

http://www.washingtonpost.com/sex/mitt-romney-debates-us-economy/gIQAocHrpJ_story.html

And finally, doesn't this play havoc with Google and other search engines?

The key reason this is done is to make sure that readers can still get to the story if the headline changes. The "slug" (what you call the human-readable key: mitt-romney-debates-us-economy) is usually auto-generated from the page's headline or title text. In some older CMSes, where this wasn't well thought out, changing the headline often left the URL unchanged (with the old slug still in it). As you can imagine, when the original headline was ill-chosen, this could be quite embarrassing.
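Generating a slug from a headline can be sketched like this. The rules (lowercase, drop punctuation, join words with hyphens) are an assumed simplification that only handles ASCII; real frameworks ship their own versions, such as Django's `django.utils.text.slugify`, whose exact rules differ in detail.

```python
import re


def slugify(headline: str) -> str:
    """Derive a URL slug from a headline (simplified, ASCII-only sketch)."""
    slug = headline.lower()
    # Drop anything that isn't a letter, digit, whitespace, or hyphen (e.g. "." and "'").
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)
    # Collapse runs of whitespace/hyphens into one hyphen and trim the ends.
    return re.sub(r"[\s-]+", "-", slug).strip("-")
```

For a hypothetical headline "Mitt Romney debates U.S. economy", `slugify` returns `mitt-romney-debates-us-economy`. Edit the headline and the derived slug changes with it, which is why a URL that depends only on the slug either goes stale or breaks.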

As a result, most CMS developers switched to looking up stories by an ID, which is much easier to keep from changing. But then what do you do with the slug? Some CMSes just ignore it; that's the Washington Post's approach.

Another (pretty easy and probably better) solution: when you find the story in the database, check that the URL's slug matches the story's current slug (based on the current headline). If it doesn't, redirect the user to the correct URL. From the end user's perspective, it's seamless: you type in http://www.washingtonpost.com/hey-this-url-doesn't-mean-a-damn-thing/gIQAocHrpJ_story.html and when the page finishes loading you're at http://www.washingtonpost.com/politics/mitt-romney-debates-us-economy/gIQAocHrpJ_story.html

Why the Washington Post isn't doing that, I'm not sure; they have lots of smart people there, so there's probably some excellent technical reason linked to their particular CMS (which I would guess is based on something they bought from a vendor). In other systems, the solution I've described can be done very easily (in Django, I've done it in three lines).
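The redirect approach described above might look roughly like this framework-free Python sketch. It is not the answerer's Django code; `Story`, `STORIES`, `canonical_path`, and `resolve` are hypothetical names, and storing a `section` field alongside the slug is an assumption based on the `/politics/...` URL shape.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Story:
    story_id: str
    section: str
    slug: str  # assumed to be regenerated whenever the headline changes


# Hypothetical in-memory stand-in for the story table, keyed by ID.
STORIES: Dict[str, Story] = {
    "gIQAocHrpJ": Story("gIQAocHrpJ", "politics", "mitt-romney-debates-us-economy"),
}


def canonical_path(story: Story, version: str) -> str:
    """Build the one URL path this story should currently live at."""
    return f"/{story.section}/{story.slug}/{story.story_id}_{version}.html"


def resolve(prefix: str, story_id: str, version: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target or None) for a requested URL.

    `prefix` is everything between the domain and the final path segment.
    """
    story = STORIES.get(story_id)  # the ID alone decides which story this is
    if story is None:
        return 404, None
    if prefix.strip("/") != f"{story.section}/{story.slug}":
        # Stale or made-up prefix: send the reader to the current URL.
        return 301, canonical_path(story, version)
    return 200, None
```

With this data, `resolve("hey-this-url-doesn't-mean-a-damn-thing", "gIQAocHrpJ", "story")` returns `(301, "/politics/mitt-romney-debates-us-economy/gIQAocHrpJ_story.html")`. A permanent (301) redirect is a reasonable default here, since it also tells search engines which URL is the canonical one.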
