C# reliable way to pattern match?

2022-12-17 23:51

At the moment I am trying to match patterns such as

text text date1 date2

So I have regular expressions that do just that. However, if users enter more than one whitespace character, or put some of the text on a new line, the input is not picked up because it no longer matches the pattern exactly.

Is there a more reliable way for pattern matching? The goal is to make it very simple for the user to write but make it easily matchable on my end. I was considering stripping out all the whitespace/newlines etc and then trying to match the pattern with no spaces i.e. texttextdate1date2.

Anyone got any better solutions?

Update

Here is a small example of the pattern I would need to match:

FIND me@test.com 01/01/2010 to 10/01/2010


Here is my current regex:

FIND [A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4} [0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4} to [0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}

This works fine 90% of the time. However, if users submit this information via email, it can contain all kinds of formatting and HTML that I am not interested in. I am using a combination of the HtmlAgilityPack and an HTML tag-removing regex to strip all the HTML from the email, but even then I can't get a match on some occasions.

I believe this could be more of a parsing question than a pattern-matching one, but I think maybe there is a better way of doing this...

To match one or more whitespace characters (space, tab, newline), use:

\s+

Substitute the above wherever you have the physical space in your pattern and you should be fine.

Example of matching multiple groups in text that contains runs of whitespace and/or newlines:

var txt = "text text   date1\ndate2";
// \s already matches newlines, so no RegexOptions are needed here.
var match = Regex.Match(txt, @"([a-z]+)\s+([a-z]+)\s+([a-z0-9]+)\s+([a-z0-9]+)");

match.Groups[n].Value, with n from 1 to 4, will contain the captured values.
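Applied to the pattern from the question, the substitution might look like the sketch below. The class name FindCommand, the group names (email, start, end) and the use of RegexOptions.IgnoreCase are assumptions added for illustration; IgnoreCase is there because the original [A-Z] classes would otherwise reject lowercase addresses such as me@test.com.

```csharp
using System;
using System.Text.RegularExpressions;

// Messy input: extra spaces, a tab and a line break between the fields.
string input = "FIND   me@test.com\t01/01/2010\n to  10/01/2010";

Match m = FindCommand.Pattern.Match(input);
if (m.Success)
{
    Console.WriteLine(m.Groups["email"].Value); // me@test.com
    Console.WriteLine(m.Groups["start"].Value); // 01/01/2010
    Console.WriteLine(m.Groups["end"].Value);   // 10/01/2010
}

static class FindCommand
{
    // The question's pattern with every literal space replaced by \s+.
    // IgnoreCase is an assumption so that [A-Z] also accepts lowercase input.
    public static readonly Regex Pattern = new Regex(
        @"FIND\s+(?<email>[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\s+" +
        @"(?<start>[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4})\s+to\s+" +
        @"(?<end>[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4})",
        RegexOptions.IgnoreCase);
}
```

Because the pattern is not anchored, it can also find the command inside a longer email body once the HTML has been stripped.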

I would split the string into a string array and match each resulting string to the necessary Regular Expression.
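A minimal sketch of that idea, assuming the FIND command from the question and hypothetical names (TokenMatcher, IsFindCommand); the per-token expressions are taken from the question's regex, with IgnoreCase added as an assumption:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "FIND  me@test.com\n01/01/2010   to\t10/01/2010";

// Split on runs of whitespace so spacing and line breaks no longer matter.
string[] tokens = Regex.Split(input.Trim(), @"\s+");

Console.WriteLine(TokenMatcher.IsFindCommand(tokens)); // True

static class TokenMatcher
{
    // Each token is validated on its own, so the expressions stay small.
    static readonly Regex Email = new Regex(
        @"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", RegexOptions.IgnoreCase);
    static readonly Regex Date = new Regex(@"^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}$");

    public static bool IsFindCommand(string[] t) =>
        t.Length == 5
        && t[0] == "FIND"
        && Email.IsMatch(t[1])
        && Date.IsMatch(t[2])
        && t[3] == "to"
        && Date.IsMatch(t[4]);
}
```

This version expects the whole input to be the command; to find a command inside a longer message you would have to slide over the token array instead.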

\b(text)[\s]+(text)[\s]+(date1)[\s]+(date2)\b

It's a nasty expression, but here is something that will work for the input you provided:

^(\w+)\s+([\w@.]+)\s+(\d{2}\/\d{2}\/\d{4})[^\d]+(\d{2}\/\d{2}\/\d{4})$

This will work with variable amounts of whitespace between the capture groups as well.
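As a rough illustration of using that expression from C#, the sketch below captures the two dates and converts them with DateTime.TryParseExact. The names DateRangeParser and Parse are hypothetical, and the dd/MM/yyyy format is an assumption, since the question does not say whether 10/01/2010 means 10 January or 1 October.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

var range = DateRangeParser.Parse("FIND   me@test.com\n01/01/2010 to   10/01/2010");
if (range != null)
{
    Console.WriteLine(range.Value.Start.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)); // 2010-01-01
    Console.WriteLine(range.Value.End.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));   // 2010-01-10
}

static class DateRangeParser
{
    // The expression from the answer; groups 3 and 4 hold the two dates.
    static readonly Regex Pattern = new Regex(
        @"^(\w+)\s+([\w@.]+)\s+(\d{2}/\d{2}/\d{4})[^\d]+(\d{2}/\d{2}/\d{4})$");

    // Returns null when the input does not match or a date is invalid.
    public static (DateTime Start, DateTime End)? Parse(string input)
    {
        Match m = Pattern.Match(input.Trim());
        if (!m.Success) return null;

        // Assumption: day-first dates (dd/MM/yyyy).
        bool ok = DateTime.TryParseExact(m.Groups[3].Value, "dd/MM/yyyy",
                      CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime start)
               && DateTime.TryParseExact(m.Groups[4].Value, "dd/MM/yyyy",
                      CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime end2);
        if (!ok) return null;

        DateTime.TryParseExact(m.Groups[4].Value, "dd/MM/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime end);
        return (start, end);
    }
}
```

Parsing the captured text into DateTime values also rejects strings that look like dates but are not, such as 31/02/2010, which the regex alone would accept.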

Through ORegex you can tokenize your string and just pattern match on token sequences:

var tokens = input.Split(new[]{' ','\t','\n','\r'}, StringSplitOptions.RemoveEmptyEntries);
var oregex = new ORegex<string>("{0}{0}{1}{1}", IsText, IsDate);

var matches = oregex.Matches(tokens); // the matching token subsequences

...

public bool IsText(string str)
{
    ...
}

public bool IsDate(string str)
{
    ...
}
