Regular Expression - Is this possible?

2023-04-01 07:59

Rather than describing what I want (it's difficult to explain), let me provide an example of what I need to accomplish in C# using a regular expression:

"HelloWorld" should be transformed to "Hello World"
"HelloWORld" should be transformed to "Hello WO Rld" // Two consecutive capital letters should be treated as one word
"helloworld" should be transformed to "helloworld"

EDIT:

"HellOWORLd" should be transformed to "Hell OW OR Ld"

Every two consecutive capital letters should be treated as one word.

Is this possible?

This is fully working C# code, not just the regex:

Console.WriteLine(
    Regex.Replace(
        "HelloWORld", 
        "(?<!^)(?<wordstart>[A-Z]{1,2})", 
        " ${wordstart}", RegexOptions.Compiled));

And it prints:

Hello WO Rld
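
To check the pattern against every input from the question, the same call can be wrapped in a small helper. This is only a sketch: the names SplitDemo and SplitWords are made up for illustration and are not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: inserts a space before every run of one or two
    // capital letters, except a run that starts at the beginning of the string.
    public static string SplitWords(string input)
    {
        return Regex.Replace(
            input,
            "(?<!^)(?<wordstart>[A-Z]{1,2})",
            " ${wordstart}");
    }
}
```

With this helper, SplitDemo.SplitWords("HellOWORLd") returns "Hell OW OR Ld", and "helloworld" comes back unchanged because it contains no capital letters.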

Update

To make this more Unicode/international aware, consider replacing [A-Z] with \p{Lu} (meaning a Unicode code point in the "Letter, uppercase" category). The result for the current input would be the same. So here is a slightly more compelling example:

Console.WriteLine(Regex.Replace(
            @"ÉclaireürfØÑJßå",
            @"(?<!^)(?<wordstart>\p{Lu}{1,2})", 
            @" ${wordstart}",
            RegexOptions.Compiled));
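
For reference, \p{Lu} is the uppercase-letter category, while \p{Lt} is the separate titlecase category (for example U+01C5, "ǅ"). The sketch below illustrates the difference and runs the Unicode pattern on the sample string; the class and method names are hypothetical, and the sample string is assumed to use precomposed characters.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeCategoryDemo
{
    // True when the string is a single uppercase letter (category Lu).
    public static bool IsUpper(string s) => Regex.IsMatch(s, @"^\p{Lu}$");

    // True when the string is a single titlecase letter (category Lt).
    public static bool IsTitle(string s) => Regex.IsMatch(s, @"^\p{Lt}$");

    // Same replacement as in the answer, using \p{Lu} instead of [A-Z].
    public static string SplitWords(string input)
    {
        return Regex.Replace(
            input,
            @"(?<!^)(?<wordstart>\p{Lu}{1,2})",
            " ${wordstart}");
    }
}
```

Under those assumptions, SplitWords("ÉclaireürfØÑJßå") gives "Éclaireürf ØÑ Jßå": the leading É is skipped because it sits at the start, ØÑ is taken as a pair, and J starts the last word.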

The regular expression engine is not a transformative thing by nature, but rather a pattern matching (and replacing) engine. People often mistake the replace part of Regex, thinking that it can do more than it's designed to.

Back to your question, though: regex is a poor fit for what you want. Instead, you should write your own parser to do this. If you're familiar with C#, this task is fairly trivial.

It's a case of "You're using the wrong tool for the job".

Here are regular expressions that detect what you are looking for:

([A-Z]\w*?)[A-Z]

This matches one uppercase letter from A to Z, followed by as few word characters as possible, up to and including the next uppercase letter.

([A-Z]{2}\w*?)[A-Z]

This matches exactly two uppercase letters from A to Z, followed by as few word characters as possible, up to and including the next uppercase letter.
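
To see what these two patterns actually capture, a small helper can list the contents of group 1 for each match. This is a sketch for experimenting only; DetectDemo and Captures are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DetectDemo
{
    // Returns the text captured by group 1 for every match of the pattern.
    public static List<string> Captures(string input, string pattern)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

For "HelloWORld", the first pattern captures "Hello" and "O", and the second captures "WO". Each match also consumes the capital letter that ends it, so the following word start (the W in the first case) is never captured on its own. That is worth keeping in mind if these patterns are used for anything more than detection.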

Regex is a matching engine. You can parse the input string and use Regex.IsMatch to find candidate matches, then insert spaces into the output string.

string f(string input)
{ 
  //'lowerUPPER' -> 'lower UPPER'
  var x = Regex.Replace(input, "([a-z])([A-Z])","$1 $2"); 

  //'UPPER' -> 'UP PE R'; (?!$) avoids a trailing space when the input ends in a capital pair
  return Regex.Replace(x, "([A-Z]{2})(?!$)","$1 "); 
}
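
The idea of finding matches first and then inserting spaces at their positions can also be sketched with Regex.Matches and a StringBuilder. This is one possible reading of that approach, not code from any of the answers; InsertDemo and InsertSpaces are hypothetical names.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class InsertDemo
{
    // Finds every run of one or two capital letters and inserts a space
    // in front of it, unless the run starts at index 0.
    public static string InsertSpaces(string input)
    {
        var sb = new StringBuilder(input);
        MatchCollection matches = Regex.Matches(input, "[A-Z]{1,2}");

        // Walk right to left so earlier match indexes stay valid after each insert.
        for (int k = matches.Count - 1; k >= 0; k--)
        {
            int index = matches[k].Index;
            if (index > 0)
            {
                sb.Insert(index, ' ');
            }
        }
        return sb.ToString();
    }
}
```

Working from the last match backwards is the main design choice here: inserting a character shifts everything to its right, so the positions of the matches still to be processed are not affected.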

using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        Print(Parse("HelloWorld"));
        Print(Parse("HelloWORld"));
        Print(Parse("helloworld"));
        Print(Parse("HellOWORLd"));
        Console.ReadLine();
    }

    static void Print(IEnumerable<string> input)
    {
        foreach (var s in input)
        {
            Console.Write(s);
            Console.Write(' ');
        }
        Console.WriteLine();
    }

    static IEnumerable<string> Parse(string input)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < input.Length; i++)
        {
            if (!char.IsUpper(input[i]))
            {
                sb.Append(input[i]);
                continue;
            }
            if (sb.Length > 0)
            {
                yield return sb.ToString();
                sb.Clear();
            }
            sb.Append(input[i]);
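            // Check the bounds first so an input ending in a capital letter does not throw.
            if (i + 1 < input.Length && char.IsUpper(input[i + 1]))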
            {
                sb.Append(input[++i]);
                yield return sb.ToString();
                sb.Clear();
            }
        }
        if (sb.Length > 0)
        {
            yield return sb.ToString();
        }
    }
}

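I don't think you need a regular expression in this case. Try this (note that it hard-codes the length of the first word, 4, so it only handles inputs shaped like the example):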

static void Main(string[] args)
{
    var input = "HellOWORLd";
    var i = 0;
    var x = 4; // end of the first word; hard-coded for this input
    var len = input.Length;
    var output = new List<string>();
    while (x <= len)
    {
        output.Add(SubStr(input, i, x));
        i = x;
        x += 2; // every following word is two characters long
    }
    var ret = output.ToArray(); // ["Hell", "OW", "OR", "Ld"]

    Console.ReadLine();
}

// Returns the characters of str from index start up to (not including) end,
// or null when the range is out of bounds.
static string SubStr(string str, int start, int end)
{
    if (start >= 0 && start <= end && end <= str.Length)
    {
        return str.Substring(start, end - start);
    }
    return null;
}
