Transform title into dashed URL-friendly string [closed]

Closed 3 years ago.

I would like to write a C# method that would transform any title into a URL friendly string, similar to what Stack Overflow does:

  • replace spaces with dashes
  • remove parentheses
  • etc.

I'm thinking of removing the reserved characters defined in RFC 3986 (per Wikipedia), but I don't know whether that would be enough. It would make the links work, but does anyone know which other characters are replaced here at Stack Overflow? I don't want to end up with percent-encoded sequences (%xx) in my URLs...

Current implementation

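// Remove reserved and other unwanted punctuation.
string result = Regex.Replace(value.Trim(), @"[!*'""`();:@&+=$,/\\?%#\[\]<>«»{}_]", "");
// Replace each hyphen, dash, or whitespace character (plus surrounding whitespace) with a hyphen.
return Regex.Replace(result.Trim(), @"\s*[\-–—\s]\s*", "-");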

My questions

  1. Which characters should I remove?
  2. Should I limit the maximum length of resulting string?
  3. Anyone know which rules are applied on titles here on SO?


Rather than looking for things to replace, the list of unreserved chars is so short, it'll make for a nice clear regex.

return Regex.Replace(value, @"[^A-Za-z0-9_\.~]+", "-");

(Note that I didn't include the dash in the list of allowed chars; that's so it gets gobbled up by the "1 or more" operator [+] so that multiple dashes (in the original or generated or a combination) are collapsed, as per Dominic Rodger's excellent point.)

You may also want to remove common words ("the", "an", "a", etc.), although doing so can slightly change the meaning of a sentence. You'll probably want to trim any trailing dashes and periods as well.
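
As a rough sketch of the two steps above (the replace plus the trimming), something like the following should work. The class and method names (SlugSketch, ToSlug) are made up for illustration, and trimming leading as well as trailing dashes and periods is my own assumption:

```csharp
using System.Text.RegularExpressions;

public static class SlugSketch
{
    // Hypothetical helper; the name is not from the answer above.
    public static string ToSlug(string title)
    {
        // Collapse every run of characters outside the allowed set
        // (existing dashes included) into a single dash.
        string slug = Regex.Replace(title, @"[^A-Za-z0-9_.~]+", "-");

        // Assumption: drop dashes and periods left at either end.
        return slug.Trim('-', '.');
    }
}
```

With this, a title such as "Is the Pope (really) Catholic?" comes out as "Is-the-Pope-really-Catholic". Note that it only keeps ASCII letters, so accented or non-Latin characters are replaced by dashes too.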

I also strongly recommend doing what SO and others do: include a unique identifier alongside the title, and use only that unique ID when processing the URL. That way http://example.com/articles/1234567/is-the-pop-catholic (note the missing 'e') and http://example.com/articles/1234567/is-the-pope-catholic resolve to the same resource.
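
To illustrate the ID-only lookup, here is a minimal sketch that pulls the ID out of a path and ignores whatever slug follows it. The "/articles/{id}/{slug}" layout comes from the example URLs; the class and method names (ArticleUrlSketch, TryGetArticleId) are hypothetical, and a real application would more likely lean on its web framework's routing:

```csharp
using System;

public static class ArticleUrlSketch
{
    // Returns the numeric ID from a path such as "/articles/1234567/any-slug",
    // or null when the path does not have that shape. The slug segment is ignored.
    public static long? TryGetArticleId(string path)
    {
        string[] parts = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        if (parts.Length >= 2 && parts[0] == "articles" && long.TryParse(parts[1], out long id))
        {
            return id;
        }

        return null;
    }
}
```

Both "/articles/1234567/is-the-pop-catholic" and "/articles/1234567/is-the-pope-catholic" yield 1234567, so a typo in the slug (or a later title edit) does not break the link.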


I would be doing:

string url = title;
url = Regex.Replace(url, @"^\W+|\W+$", "");
url = Regex.Replace(url, @"['""]", "");
url = Regex.Replace(url, @"_", "-");
url = Regex.Replace(url, @"\W+", "-");

Basically, this:

  • strips non-word characters from the beginning and end of the title;
  • removes single and double quotes (mainly to get rid of apostrophes in the middle of words);
  • replaces underscores with hyphens (underscores are technically a word character along with digits and letters); and
  • replaces all groups of non-word characters with a single hyphen.


Most "sluggifiers" (methods for converting to friendly-url type names) tend to do the following:

  1. Strip everything except whitespace, dashes, underscores, and alphanumerics.
  2. (Optional) Remove "common words" (the, a, an, of, et cetera).
  3. Replace spaces and underscores with dashes.
  4. (Optional) Convert to lowercase.

As far as I know, Stack Overflow's sluggifier does #1, #3, and #4, but not #2.
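
Steps #1, #3, and #4 can be sketched roughly like this. It is only an illustration of the list above, not Stack Overflow's actual code; the names (SluggifierSketch, Sluggify) are made up, and treating "alphanumerics" as ASCII-only and trimming dashes at the ends are my own assumptions:

```csharp
using System.Text.RegularExpressions;

public static class SluggifierSketch
{
    // Hypothetical helper illustrating steps 1, 3, and 4.
    public static string Sluggify(string title)
    {
        // Step 1: strip everything except ASCII letters, digits,
        // whitespace, dashes, and underscores.
        string slug = Regex.Replace(title, @"[^A-Za-z0-9\s_-]", "");

        // Step 3: turn each run of whitespace, underscores, and dashes into one dash.
        slug = Regex.Replace(slug, @"[\s_-]+", "-");

        // Step 4: lowercase; also trim dashes left at either end (assumption).
        return slug.Trim('-').ToLowerInvariant();
    }
}
```

For example, this question's own title would become "transform-title-into-dashed-url-friendly-string-closed".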


How about this:

string FriendlyURLTitle(string pTitle)
{
    pTitle = pTitle.Replace(" ", "-");
    pTitle = HttpUtility.UrlEncode(pTitle);
    // Strip any percent-encoded sequences left by UrlEncode.
    return Regex.Replace(pTitle, @"%[0-9A-Fa-f]{2}", "");
}


This is how I currently slug words.

    // Extension method; it must be declared inside a static class.
    public static string Slug(this string value)
    {
        // Replaces the custom HasValue() helper with a built-in check.
        if (string.IsNullOrWhiteSpace(value))
        {
            return string.Empty;
        }

        var builder = new StringBuilder();

        foreach (var c in value.Trim().ToLowerInvariant())
        {
            switch (c)
            {
                case ' ':
                    builder.Append('-');
                    break;
                case '&':
                    builder.Append("and");
                    break;
                default:
                    // Keep only ASCII digits and lowercase letters; everything else is dropped.
                    if ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z'))
                    {
                        builder.Append(c);
                    }
                    break;
            }
        }

        return builder.ToString();
    }


I use this one...

    public static string ToUrlFriendlyString(this string value)
    {
        value = (value ?? "").Trim().ToLower();

        var url = new StringBuilder();

        foreach (char ch in value)
        {
            switch (ch)
            {
                case ' ':
                    url.Append('-');
                    break;
                default:
                    url.Append(Regex.Replace(ch.ToString(), @"[^A-Za-z0-9'()\*\\+_~\:\/\?\-\.,;=#\[\]@!$&]", ""));
                    break;
            }
        }

        return url.ToString();
    }


This works for me

string output = Uri.UnescapeDataString(input);