Ignoring white space for a Regex match

2023-01-07 16:28

I need to match 8 or more digits, the sequence of which can include spaces.

For example, all of the following would be valid matches:

12345678
1 2345678
12 3 45678
1234 5678
12 34567 8
1 2 3 4 5 6 7 8

At the moment I have \d{8,} but this will only capture a solid block of 8 or more digits.

[\d\s]{8,} will not work as I don't want white space to contribute to the count of chars captured.

(\d *){8,}

It matches eight or more occurrences of a digit followed by zero or more spaces. Change it to

( *\d *){8,}  # there is a space before the first asterisk

to match strings with spaces in the beginning. Or

(\s*\d\s*){8,}

to match tabs and other white space characters (that includes newlines too).

Finally, make it a non-capturing group with ?:. Thus it becomes (?:\s*\d\s*){8,}
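As a quick sanity check of the final pattern, here is a minimal sketch. It uses Python's re module as a stand-in (the thread itself is about .NET, but this pattern only uses syntax the two share). The names SAMPLES, PATTERN and matches_whole are made up for the illustration.

```python
import re

# The six sample inputs from the question.
SAMPLES = [
    "12345678",
    "1 2345678",
    "12 3 45678",
    "1234 5678",
    "12 34567 8",
    "1 2 3 4 5 6 7 8",
]

# Eight or more digits, each optionally surrounded by whitespace.
PATTERN = re.compile(r"(?:\s*\d\s*){8,}")


def matches_whole(text):
    """Return True if the entire string is matched by PATTERN."""
    return PATTERN.fullmatch(text) is not None


for sample in SAMPLES:
    print(repr(sample), matches_whole(sample))

# Seven digits padded with spaces: the spaces do not count towards the 8.
print(repr("1 2 3 4 5 6 7"), matches_whole("1 2 3 4 5 6 7"))
```

All six samples should report True and the seven-digit string False, since only the digits are counted by the {8,} quantifier.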

Waayy later, but this really needs the correct answer on it, and a reason why. Who knew this question could have such a complex answer, right? Lol. But there are plenty of considerations surrounding spacing in regex.

Firstly; Never put a bare space in a regex. Doing so will make your regex unreadable, and unmaintainable. Memories of using a mouse to highlight a space to ensure it was only one space come to mind. An accidental second space will break your regex, but inside a character class it won't: [  ] (two spaces) behaves exactly like [ ], because repetition in a character class is ignored. And if you require an exact number of spaces, you can actually see that in a character class like so: [ ]{3}. Versus accidents without the character class, like three bare spaces followed by {3} <-- This is actually looking for 5 spaces, whoops!
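The space-counting accident is easy to reproduce. A small sketch, again in Python's re as a stand-in for .NET (the quantifier rules involved are the same); spaces_needed is a hypothetical helper written just for this check.

```python
import re

# Character class: the quantifier visibly applies to one space, so exactly 3.
explicit = re.compile(r"[ ]{3}")

# Bare spaces: two literal spaces, then a third space repeated {3} times.
accidental = re.compile("  " + " {3}")


def spaces_needed(pattern):
    """Return the shortest run of spaces (0 to 10) the pattern fully matches."""
    for n in range(11):
        if pattern.fullmatch(" " * n):
            return n
    return -1


print(spaces_needed(explicit))    # 3
print(spaces_needed(accidental))  # 5

# A doubled space inside a class is still just "one space".
print(re.fullmatch(r"[  ]", " ") is not None)  # True
```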

Second; Keep the free-spacing (?x) option in mind, which makes your regex commentable and free-spaceable. You shouldn't have to fear that somebody using that option might break your regex because you decided to put random keyboard spaces in it. Also, in .NET (and most other flavors, though not Java), (?x) will not ignore the keyboard space when it's inside a character class like so: [ ]. It is therefore safer to use character classes for your keyboard spaces.
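To see the free-spacing behaviour in action, here is a sketch using Python's re.VERBOSE flag, which plays the role of (?x) and likewise keeps whitespace inside a character class. This is Python standing in for .NET, and the variable names are illustrative only.

```python
import re

# Verbose mode ignores whitespace outside character classes and treats
# '#' as the start of a comment, so the pattern can be laid out freely.
verbose = re.compile(
    r"""
    (?:        # group without capturing
        \d     # one digit
        [ ]*   # any number of spaces; the class survives verbose mode
    ){8,}      # eight or more digits in total
    """,
    re.VERBOSE,
)

# A space in a class is kept; a bare space is silently dropped.
spaced_class = re.compile(r"\d[ ]\d", re.VERBOSE)  # digit, space, digit
spaced_bare = re.compile(r"\d \d", re.VERBOSE)     # same as \d\d

print(verbose.fullmatch("12 3 45678") is not None)
print(spaced_class.fullmatch("1 2") is not None)
print(spaced_bare.fullmatch("1 2") is not None)
print(spaced_bare.fullmatch("12") is not None)
```

The bare-space version stops requiring a space at all once free-spacing is switched on, which is exactly the breakage the character class avoids.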

Third; Try not to use \s in this scenario. As Omaghosh points out, it also includes newlines (\r and \n). The scenario you mentioned wouldn't seem to favor that. However, also as Omaghosh points out, you may want more than just keyboard spaces. So you can use either [ ], [\s-[\r\n]], or [\f\t\v\u00A0\u2028\u2029\u0020] depending on what you fancy. The last two of those options are roughly the same thing (.NET's \s also covers the remaining Unicode separator characters), but character class subtraction only works in .NET and a couple of other weird flavors.
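The newline concern can be shown with two four-digit lines. A sketch in Python's re (a stand-in for .NET; Python has no character class subtraction, so only the plain-space and explicit-class variants are compared against \s):

```python
import re

with_s = re.compile(r"(?:\d\s*){8,}")
spaces_only = re.compile(r"(?:\d[ ]*){8,}")
explicit = re.compile(r"(?:\d[\f\t\v\u00A0\u2028\u2029\u0020]*){8,}")

two_lines = "1234\n5678"   # two separate 4-digit lines
tabbed = "1234\t5678"      # one line, digits separated by a tab

# \s happily glues the two lines together into one 8-digit match.
print(with_s.search(two_lines) is not None)       # True

# Neither newline-free class does.
print(spaces_only.search(two_lines) is not None)  # False
print(explicit.search(two_lines) is not None)     # False

# The explicit class still accepts a tab; the plain-space class does not.
print(explicit.search(tabbed) is not None)        # True
print(spaces_only.search(tabbed) is not None)     # False
```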

Fourth; This is a commonly over-built pattern: (\s*...\s*)*. It doesn't make any sense. Because the group repeats, it is the same as this: (\s*\s*...)* or this: (\s*\s*\s*\s*...)*. The only argument against what I'm saying is that you'd be guaranteed to capture the spaces prior to the first .... But not once is that ever actually wanted. Worst-case scenario, you might see this: \s*(...\s*)*
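The claim about the doubled \s* can be checked directly: the only observable difference is whether whitespace before the first digit ends up in the match. A sketch in Python's re (standing in for .NET), with found as a hypothetical helper:

```python
import re

both_sides = re.compile(r"(?:\s*\d\s*){8,}")   # the over-built form
trailing_only = re.compile(r"(?:\d\s*){8,}")   # the trimmed form
prefixed = re.compile(r"\s*(?:\d\s*){8,}")     # leading whitespace hoisted out


def found(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


text = "  12 34 56 78  "
print(repr(found(both_sides, text)))
print(repr(found(trailing_only, text)))
print(repr(found(prefixed, text)))
```

On this input the over-built and the prefixed forms should return the same string, and the trimmed form the same string minus its two leading spaces.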

Omaghosh had the closest answer, but this is the shortest correct answer:

Regex.Match(input, @"(?:\d[ ]*){8,}").Groups[0].Value;

Or the following, if we take the question literally that the six options are in the same text on multiple lines:

Regex.Match(input, @"(?m)^(?:\d[ ]*){8,}$").Groups[0].Value;

Or the following, if it is part of a bigger regex and needs a group:

Regex.Match(input, @"...((?:\d[ ]*){8,})...").Groups[1].Value;

And feel free to replace the [ ] with a .NET Class Subtraction, or a Non-.NET explicit whitespace class:

@"(?:\d[\s-[\r\n]]*){8,}"
// Or . . .
@"(?:\d[\f\t\v\u00A0\u2028\u2029\u0020]*){8,}"
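For the multi-line reading of the question (the (?m)^...$ variant above), here is a sketch of how the anchored pattern behaves on a block of text. It is written with Python's re rather than .NET's Regex class, and the two rejected lines are invented test data, not from the question.

```python
import re

TEXT = "\n".join([
    "12345678",
    "1 2345678",
    "12 3 45678",
    "1234 5678",
    "12 34567 8",
    "1 2 3 4 5 6 7 8",
    "1234 567",    # only seven digits: should be skipped
    "12a45678",    # contains a letter: should be skipped
])

# (?m) makes ^ and $ match at the start and end of every line.
LINE_PATTERN = re.compile(r"(?m)^(?:\d[ ]*){8,}$")

for line in LINE_PATTERN.findall(TEXT):
    print(repr(line))
```

Because the group is non-capturing, findall returns the whole matched lines, so only the six lines from the question should be printed.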

(\d{8,}\s+)*\d{8,}

should work
