Regex to replace invalid characters

I don't have much experience with regular expressions, so I am using many chained String.Replace() calls to remove unwanted characters. Is there a regex I can write to streamline this?

string messyText = GetText();
string cleanText = messyText.Trim()
         .ToUpper()
         .Replace(",", "")
         .Replace(":", "")
         .Replace(".", "")
         .Replace(";", "")
         .Replace("/", "")
         .Replace("\\", "")
         .Replace("\n", "")
         .Replace("\t", "")
         .Replace("\r", "")
         .Replace(Environment.NewLine, "")
         .Replace(" ", "");

Thanks


Try this regex:

Regex regex = new Regex(@"[\s,:.;/\\]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

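\s is a character class that matches any whitespace character, so it covers the space, \t, \r and \n from the chained calls (in .NET it also matches other whitespace such as form feeds and Unicode separators).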
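As a rough sketch of how this could be packaged for reuse, the pattern can be held in a static field so it is constructed only once. The class name TextCleaner and the method name Clean are made up for this illustration and are not part of the question or the answer:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class TextCleaner
{
    // Built once and reused: matches runs of whitespace and the
    // punctuation characters listed in the question.
    private static readonly Regex Unwanted = new Regex(@"[\s,:.;/\\]+");

    public static string Clean(string messyText)
    {
        // Remove every match, then upper-case as the original chain did.
        return Unwanted.Replace(messyText, "").ToUpper();
    }
}
```

Because \s already removes leading and trailing whitespace, the separate Trim() call from the question should not be needed here.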


If you just want to preserve alphanumeric characters, instead of adding every non-alphanumeric character in existence to the character class, you could do this:

Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

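Where \W is any non-word character, i.e. anything not in [a-zA-Z0-9_]. The underscore counts as a word character, which is why _ is added to the class explicitly. Note that in .NET \w also includes Unicode letters and digits by default, so accented letters are preserved unless RegexOptions.ECMAScript is used.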
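The difference between the default and the ASCII-only behaviour of \W can be sketched as follows. This assumes the RegexOptions.ECMAScript option is acceptable for your input; the class and method names are invented for the example:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class AlphanumericFilter
{
    // Default .NET behaviour: \w covers Unicode letters and digits,
    // so a character such as 'é' is kept.
    public static string KeepUnicodeWordChars(string text)
    {
        return Regex.Replace(text, @"[\W_]+", "");
    }

    // With RegexOptions.ECMAScript, \w is limited to [a-zA-Z0-9_],
    // so only ASCII letters and digits are kept.
    public static string KeepAsciiAlphanumerics(string text)
    {
        return Regex.Replace(text, @"[\W_]+", "", RegexOptions.ECMAScript);
    }
}
```

Which variant is appropriate depends on whether non-ASCII letters count as valid characters in your data.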


Character classes to the rescue!

string messyText = GetText();
string cleanText = Regex.Replace(messyText.Trim().ToUpper(), @"[,:.;/\\\n\t\r ]+", "");


You would probably want to use a whitelist approach: there is an ocean of odd characters, and the effect of their combinations may not be easy to predict.

A simple regex that removes everything but the allowed characters could look like this:

messyText = Regex.Replace(messyText, @"[^a-zA-Z0-9\x7C\x2C\x2E_]", "");

The ^ is there to invert the selection: apart from the alphanumeric characters, this regex allows |, comma, period and _ (\x7C, \x2C and \x2E are the hex escapes for |, comma and period). You can add and remove characters and character sets as needed.
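For readability, the same whitelist can be written with the literal characters, since |, comma and period have no special meaning inside a character class. The following is a minimal sketch; the WhitelistFilter and KeepAllowed names are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class WhitelistFilter
{
    // Same set as the pattern above, with | , . written literally
    // instead of as \x7C \x2C \x2E.
    private static readonly Regex NotAllowed = new Regex(@"[^a-zA-Z0-9|,._]");

    public static string KeepAllowed(string text)
    {
        // Delete every character that is not on the whitelist.
        return NotAllowed.Replace(text, "");
    }
}
```

Unlike the \W-based pattern, this explicit range is ASCII-only, so accented letters are removed as well.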
