Find all but the first occurrence of a character with REGEX
I'm building a .NET application and I need to strip any non-decimal character from a string (excluding the first '.'). Essentially I'm cleaning user input to force a real number result.
So far I've been using online RegEx tools to try and achieve this in a single pass, but I'm not getting very far.
I wish to accomplish this:
asd123.asd123.123.123 = 123.123123123
Unfortunately I've only managed to get to the stage where
asd123.asd123.123.123 = 123.123.123.123
by using this code.
System.Text.RegularExpressions.Regex.Replace(str, "[^\.|\d]*", "")
But I am stuck trying to remove all but the first decimal-point.
Can this be done in a single pass?
Is there a better-way™?

This can be done in a single regex, at least in .NET, which supports infinite repetition inside lookbehind assertions:
resultString = Regex.Replace(subjectString, @"(?<!^[^.]*)\.|[^\d.]", "");
Explanation:
(?<!^[^.]*) # Either match (as long as there is at least one dot before it)
\. # a dot
| # or
[^\d.] # any characters except digits or dots.
(?<!^[^.]*)
means: assert that the text between the beginning of the input string and the current position does not consist solely of characters other than dots. This condition is true for all dots following the first one, and false for the first dot itself, which is therefore kept.
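
As a quick check of the pattern above, here is a minimal sketch that runs it on the sample input from the question. The class name CleanNumberDemo and the method name Clean are made up for illustration; only Regex.Replace and the pattern itself come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CleanNumberDemo
{
    // Hypothetical helper name; wraps the single-pass pattern from the answer.
    public static string Clean(string input)
    {
        // Removes every dot that has another dot somewhere before it,
        // plus every character that is neither a digit nor a dot.
        return Regex.Replace(input, @"(?<!^[^.]*)\.|[^\d.]", "");
    }

    public static void Main()
    {
        Console.WriteLine(Clean("asd123.asd123.123.123")); // 123.123123123
    }
}
```

Because Regex.Replace evaluates the lookbehind against the original input, the already-removed letters do not affect which dot counts as the first one.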
I think it'll be done better without regular expressions.
// Needs: using System; using System.Text;
string str = "asd123.asd123.123.123";
StringBuilder sb = new StringBuilder();
bool dotFound = false;
foreach (var character in str)
{
    if (Char.IsDigit(character))
    {
        sb.Append(character);
    }
    else if (character == '.' && !dotFound)
    {
        // Keep only the first dot; later dots are skipped.
        dotFound = true;
        sb.Append(character);
    }
}
Console.WriteLine(sb.ToString());
Firstly, the regex you are currently using will leave any | characters untouched, because | is a literal character inside [], not an alternation. You only need [^.\d]* since . has no special meaning in [] either.
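
To illustrate the point about |, here is a small sketch comparing the two character classes on a made-up input. The class and method names (CharClassDemo, StripWithPipe, StripWithoutPipe) and the test string are assumptions for the example, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // The asker's class: '|' is a literal member of the class, so pipes survive.
    public static string StripWithPipe(string input) =>
        Regex.Replace(input, @"[^\.|\d]*", "");

    // The simplified class: only digits and dots survive.
    public static string StripWithoutPipe(string input) =>
        Regex.Replace(input, @"[^.\d]*", "");

    public static void Main()
    {
        Console.WriteLine(StripWithPipe("1|2.3a"));    // 1|2.3
        Console.WriteLine(StripWithoutPipe("1|2.3a")); // 12.3
    }
}
```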
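After this replace, you could try something like this (in .NET the group reference in the replacement string is written $1, not \1):
Replace(str, @"([\d]+\.[\d]+)[^\d].*", "$1");
Note that this keeps only the first decimal number and discards everything from the second . onwards, so the example yields 123.123 rather than 123.123123123. You'd only need this step if there is a . at all in the number.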
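
If the digits after the later dots should be kept, as in the question's target 123.123123123, the second step can also be done without a regex. The following is only a sketch of that alternative, not the replace shown above: it strips the unwanted characters first and then removes every dot after the first with plain string methods. The names TwoStepDemo and Clean are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoStepDemo
{
    public static string Clean(string input)
    {
        // Step 1: keep only digits and dots.
        string stripped = Regex.Replace(input, @"[^.\d]", "");

        // Step 2: if there is a dot, keep the first one and drop every later dot.
        int firstDot = stripped.IndexOf('.');
        if (firstDot < 0)
        {
            return stripped; // no dot at all, nothing more to do
        }
        string head = stripped.Substring(0, firstDot + 1);
        string tail = stripped.Substring(firstDot + 1).Replace(".", "");
        return head + tail;
    }

    public static void Main()
    {
        Console.WriteLine(Clean("asd123.asd123.123.123")); // 123.123123123
        Console.WriteLine(Clean("abc42"));                 // 42
    }
}
```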
Hope this helps.