Replacing all non-ASCII characters, except right angle character in C#
I'm writing a file utility to strip out all non-ASCII characters from files. I have this Regex:
Regex rgx = new Regex(@"[^\u0000-\u007F]");
This works fine. Unfortunately, I've discovered that some silly people use right angles (¬) as delimiters in their files, so these get stripped out as well, and I need to keep those!
I'm pretty new to Regex, and I do understand the basics, but any help would be awesome!
Thanks in advance!
You just need to include the code point for the angle bracket in the set:
Try this:
Regex rgx = new Regex(@"[^\uxxxx\u0000-\u007F]");
Or this:
Regex rgx = new Regex(@"[^\uxxxx-\uxxxx\u0000-\u007F]");
(Where xxxx is the Unicode code point for the character you want to preserve.)
The reason for giving two options here is that I know you can specify multiple ranges within one negative character group, but I don't know if you can match individual characters with ranges.
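As far as I know, .NET does let you mix single characters and ranges in one negated character group, so the first form should be enough. Here is a minimal sketch, assuming the delimiter is U+00AC (the ¬ shown in the question); the class and method names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiStripper
{
    // Negated group: matches any character that is neither U+00AC (assumed
    // to be the delimiter) nor inside the ASCII range U+0000-U+007F.
    private static readonly Regex NonAsciiExceptDelimiter =
        new Regex(@"[^\u00AC\u0000-\u007F]");

    // Removes every match, leaving ASCII text and the delimiter in place.
    public static string Strip(string input)
    {
        return NonAsciiExceptDelimiter.Replace(input, string.Empty);
    }

    public static void Main()
    {
        // The input is "id¬name¬café", written with escapes so the source
        // file's own encoding does not matter. The trailing é (U+00E9) is
        // removed and both delimiters are kept.
        string cleaned = Strip("id\u00ACname\u00ACcaf\u00E9");
        Console.WriteLine(cleaned == "id\u00ACname\u00ACcaf");
    }
}
```

If the delimiter turns out to be a different code point, only the \u00AC escape in the pattern needs to change.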
Jon's answer is absolutely correct. You may be using the wrong code point for the character. Try the following for the similar-looking characters:
Regex regex = new Regex(@"([^\u00ac\u0000-\u007F])");
Regex regex = new Regex(@"([^\u02fa\u0000-\u007F])");
Regex regex = new Regex(@"([^\u031a\u0000-\u007F])");
The first one (U+00AC, which is the ¬ character shown in the question) should work, I think.
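If you are not sure which of these look-alikes is actually in your files, one way to check is to dump the code points of a sample line. This is only a rough sketch; the class and method names are hypothetical, and it prints UTF-16 code units, which is good enough for characters in this range:

```csharp
using System;
using System.Text;

public static class CodePointDump
{
    // Formats each UTF-16 code unit of the input as U+XXXX, space separated.
    public static string Describe(string input)
    {
        var sb = new StringBuilder();
        foreach (char c in input)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Replace this sample with a line read from one of the real files.
        Console.WriteLine(Describe("a\u00ACb")); // prints "U+0061 U+00AC U+0062"
    }
}
```

Whatever value shows up for the delimiter is the one to put in the character group. Note that the file has to be read with the right encoding first, otherwise the delimiter may already be mangled before the Regex sees it.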