
Ignoring white space for a Regex match

I need to match 8 or more digits, the sequence of which can include spaces.

For example, all of the following would be valid matches:

12345678
1 2345678
12 3 45678
1234 5678
12 34567 8
1 2 3 4 5 6 7 8

At the moment I have \d{8,} but this will only capture a solid block of 8 or more digits.

[\d\s]{8,} will not work, as I don't want whitespace to contribute to the count of characters captured.


(\d *){8,}

It matches eight or more occurrences of a digit followed by zero or more spaces. Change it to

( *\d *){8,}  # there is a space before the first asterisk

to match strings with spaces in the beginning. Or

(\s*\d\s*){8,}

to match tabs and other white space characters (that includes newlines too).

Finally, make it a non-capturing group with ?:. Thus it becomes (?:\s*\d\s*){8,}
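
A quick way to check the final pattern is to run it over the samples from the question. This is only a minimal C# sketch; the seven-digit input "1234 567" is my own addition (not from the question) to show a non-match.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above: eight or more digits, each optionally
// surrounded by whitespace, in a non-capturing group.
var pattern = new Regex(@"(?:\s*\d\s*){8,}");

// The six inputs from the question, plus one hypothetical input with
// only seven digits that should not match.
string[] samples =
{
    "12345678", "1 2345678", "12 3 45678",
    "1234 5678", "12 34567 8", "1 2 3 4 5 6 7 8",
    "1234 567",
};

foreach (string s in samples)
{
    // Prints True for the first six inputs and False for the last one.
    Console.WriteLine($"{s,-16} -> {pattern.IsMatch(s)}");
}
```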


Waayy later, but this really needs the correct answer on it, and a reason why. Who knew this question could have such a complex answer, right? Lol. But there are plenty of considerations surrounding spacing in regex.

First: never put a bare space in a regex. It makes the pattern unreadable and unmaintainable; memories of using a mouse to highlight a space to check that it was only one space come to mind. A run of several typed spaces in a pattern demands exactly that many spaces in the input, but [    ] does not, because repetition inside a character class is ignored. If you require an exact number of spaces, a character class makes the count visible: [ ]{3}. Compare the accident without the character class: three typed spaces followed by {3} actually look for 5 spaces, because the quantifier applies only to the last space. Whoops!
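
The quantifier point is easy to verify. In this small C# sketch the bare-space pattern is built with new string(' ', 3) purely so the number of spaces is visible in the source; the variable names are my own.

```csharp
using System;
using System.Text.RegularExpressions;

// Three literal spaces followed by {3}: the quantifier binds only to the
// last space, so the pattern requires 2 + 3 = 5 spaces.
var bare = new Regex("^" + new string(' ', 3) + "{3}$");

// With a character class the intended count is explicit: exactly 3 spaces.
var bracketed = new Regex("^[ ]{3}$");

string three = new string(' ', 3);
string five = new string(' ', 5);

Console.WriteLine(bare.IsMatch(three));      // False
Console.WriteLine(bare.IsMatch(five));       // True
Console.WriteLine(bracketed.IsMatch(three)); // True
Console.WriteLine(bracketed.IsMatch(five));  // False
```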

Second: keep the free-spacing (?x) option in mind, which lets you add comments and layout whitespace to your regex. You shouldn't have to fear that somebody turning that option on will break your regex because you put bare keyboard spaces in it. Also, (?x) does not ignore a keyboard space inside a character class, such as [ ]. It is therefore safer to use character classes for your keyboard spaces.
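
To see the difference under (?x), here is a minimal C# sketch. The short two-digit patterns are hypothetical ones I made up for the demonstration; the last pattern is the digits-and-spaces pattern written in free-spacing style with a comment.

```csharp
using System;
using System.Text.RegularExpressions;

// Under (?x) the bare space is ignored, so this behaves like ^\d\d$.
var bareSpace = new Regex(@"(?x)^\d \d$");

// The space inside the character class survives (?x).
var classSpace = new Regex(@"(?x)^\d[ ]\d$");

Console.WriteLine(bareSpace.IsMatch("1 2"));  // False
Console.WriteLine(bareSpace.IsMatch("12"));   // True
Console.WriteLine(classSpace.IsMatch("1 2")); // True

// Free-spacing layout and a trailing comment, with [ ] for the real spaces.
var freeSpaced = new Regex(@"(?x) ^ (?: \d [ ]* ){8,} $  # eight or more digits, spaces allowed");

Console.WriteLine(freeSpaced.IsMatch("12 34567 8")); // True
```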

Third: try not to use \s in this scenario. As Omaghosh points out, it also matches newlines (\r and \n), which the scenario you describe doesn't seem to want. However, also as Omaghosh points out, you may want more than just keyboard spaces. So you can use [ ], [\s-[\r\n]], or [\f\t\v\u00A0\u2028\u2029\u0020], depending on what you fancy. The last two are close but not identical: in .NET, \s also matches other Unicode separators (for example \u3000) that the explicit list leaves out. Character class subtraction only works in .NET and a couple of other flavors.
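
A small C# sketch comparing the options on a few inputs of my own (a tab, a newline, and an ideographic space U+3000). It assumes the default .NET behaviour, where \s covers Unicode separators; with RegexOptions.ECMAScript the results would differ.

```csharp
using System;
using System.Text.RegularExpressions;

var anyWhitespace = new Regex(@"^(?:\d\s*){8,}$");
var noNewlines = new Regex(@"^(?:\d[\s-[\r\n]]*){8,}$");
var explicitList = new Regex(@"^(?:\d[\f\t\v\u00A0\u2028\u2029\u0020]*){8,}$");

string tabbed = "1234\t5678";
string wrapped = "1234\n5678";
string ideographic = "1234\u30005678";

// \s happily crosses the line break; the other two classes do not.
Console.WriteLine(anyWhitespace.IsMatch(wrapped)); // True
Console.WriteLine(noNewlines.IsMatch(wrapped));    // False
Console.WriteLine(explicitList.IsMatch(wrapped));  // False

// Tabs are accepted by both newline-free classes.
Console.WriteLine(noNewlines.IsMatch(tabbed));     // True
Console.WriteLine(explicitList.IsMatch(tabbed));   // True

// U+3000 is whitespace to \s but is not in the explicit list.
Console.WriteLine(noNewlines.IsMatch(ideographic));   // True
Console.WriteLine(explicitList.IsMatch(ideographic)); // False
```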

Fourth: this is a commonly over-built pattern: (\s*...\s*)*. It doesn't make sense, because the group repeats: the trailing \s* of one iteration already covers the leading \s* of the next, so it is effectively the same as (\s*\s*...)* or (\s*\s*\s*\s*...)*. The only argument for it is that it guarantees capturing the spaces before the first ..., but that is almost never actually wanted. At worst, you might need this: \s*(...\s*)*
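
As a rough illustration on one input of my own, the C# sketch below compares the over-built form with the two leaner forms, using \d in place of the "..." placeholder. It shows agreement on this sample only, not a proof for every input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample: eight digits with leading, inner and trailing spaces.
string input = "  12 34 56 78  ";

string overBuilt = Regex.Match(input, @"(?:\s*\d\s*){8,}").Value;
string hoisted = Regex.Match(input, @"\s*(?:\d\s*){8,}").Value;
string lean = Regex.Match(input, @"(?:\d\s*){8,}").Value;

// Hoisting the leading \s* out of the group yields the same match here.
Console.WriteLine(overBuilt == hoisted);          // True

// Dropping it entirely only loses the whitespace before the first digit.
Console.WriteLine(lean == overBuilt.TrimStart()); // True
Console.WriteLine($"[{lean}]");                   // [12 34 56 78  ]
```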

Omaghosh had the closest answer, but this is the shortest correct answer:

Regex.Match(input, @"(?:\d[ ]*){8,}").Groups[0].Value;

Or the following, if we take the question literally that the six options are in the same text on multiple lines:

Regex.Match(input, @"(?m)^(?:\d[ ]*){8,}$").Groups[0].Value;

Or the following, if it is part of a bigger regex and needs a group:

Regex.Match(input, @"...((?:\d[ ]*){8,})...").Groups[1].Value;

And feel free to replace the [ ] with a .NET Class Subtraction, or a Non-.NET explicit whitespace class:

@"(?:\d[\s-[\r\n]]*){8,}"
// Or . . .
@"(?:\d[\f\t\v\u00A0\u2028\u2029\u0020]*){8,}"
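
For the multi-line reading of the question, something like the following C# sketch can be used to collect every matching line. The seventh line ("1234 567") is a hypothetical non-match I added, and the input is assumed to use \n line endings; with \r\n, $ would not match before the \r.

```csharp
using System;
using System.Text.RegularExpressions;

// The six lines from the question plus one hypothetical seven-digit line.
string input = string.Join("\n",
    "12345678", "1 2345678", "12 3 45678",
    "1234 5678", "12 34567 8", "1 2 3 4 5 6 7 8",
    "1234 567");

// (?m) makes ^ and $ match at the start and end of every line.
MatchCollection matches = Regex.Matches(input, @"(?m)^(?:\d[ ]*){8,}$");

Console.WriteLine(matches.Count); // 6
foreach (Match m in matches)
{
    Console.WriteLine($"[{m.Value}]");
}
```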


(\d{8,}\s+)*\d{8,}

should work
