
A probably simple regex expression

I am a complete newbie when it comes to regex, and would like help writing an expression that matches the following:

 {ValidFunctionName}({parameter}:"{value}")

 {ValidFunctionName}({parameter}:"{value}",
                     {parameter}:"{value}")

 {ValidFunctionName}()

Here {x} is what I want to capture. {parameter} can be anything ($%"$, for example), and {value} must be enclosed in quotation marks.

ThisIsValid_01(a:"40")

would be "ThisIsValid_01", "a", "40"

ThisIsValid_01(a:"40", b:"ZOO")

would be "ThisIsValid_01", "a", "40", "b", "ZOO"

01_ThisIsntValid(a:"40")

wouldn't return anything

ThisIsntValid_02(a:40)

wouldn't return anything, as 40 is not enclosed in quotation marks.

ThisIsValid_02()

would return "ThisIsValid_02"

For a valid function name I came across: "[A-Za-z_][A-Za-z_0-9]*" But I can't for the life of me figure out how to match the rest. I've been playing around on http://regexpal.com/ to try to get valid matches to all conditions, but to no avail :(

It would be nice if you kindly explained the regex too, so I can learn :)


EDIT: This works and uses two regexes. The first gets the function name and everything inside its brackets; the second extracts each parameter/value pair from that inner text. This is hard to do with a single regex, because a repeated group normally gives you only its last capture. Add some [ \t\n\r]* wherever you want to allow whitespace.

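using System.Collections.Generic;
using System.Text.RegularExpressions;

// Outer regex: the function name plus everything between its brackets.
Regex r = new Regex(@"(?<function>\w[\w\d]*?)\((?<inner>.*?)\)");
// Inner regex: one param:"value" pair, with an optional leading comma.
Regex inner = new Regex(@",?(?<param>.+?):""(?<value>[^""]*?)""");
string input = "_test0(a:\"lolololol\",b:\"2\") _test1(ghgasghe:\"asjkdgh\")";

List<List<string>> matches = new List<List<string>>();

MatchCollection mc = r.Matches(input);
foreach (Match match in mc)
{
    // Each list starts with the function name, followed by its param/value pairs.
    var l = new List<string>();
    l.Add(match.Groups["function"].Value);
    foreach (Match m in inner.Matches(match.Groups["inner"].Value))
    {
        l.Add(m.Groups["param"].Value);
        l.Add(m.Groups["value"].Value);
    }
    matches.Add(l);
}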

(Old) Solution

(?<function>\w[\w\d]*?)\((?<param>.+?):"(?<value>[^"]*?)"\)

(Old) Explanation

Let's remove the group captures so it is easier to understand: \w[\w\d]*?\(.+?:"[^"]*?"\)

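\w is the word class; it is short for [a-zA-Z0-9_] (in .NET it also matches Unicode letters and digits by default)
\d is the digit class; it is short for [0-9]

  1. \w[\w\d]*? Matches one word character for the start of the function name, then zero or more further word or digit characters. Because \w already includes digits, this also accepts names that start with a digit; use [A-Za-z_] for the first character to rule those out.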

  2. \(.+? Matches a left bracket then one or more of any characters (for the parameter)

  3. :"[^"]*?"\) Matches a colon, then the opening quote, then zero or more of any character except quotes (for the value) then the close quote and right bracket.

Brackets (or parens, as some people call them) are escaped with backslashes because unescaped brackets create capturing groups.

The (?<name> ) construct is a named capture group: it captures the text it matches and makes it available under that name.

The ? after each of the * and + operators makes them non-greedy, meaning that they will match the least, rather than the most, amount of text.
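To make the greedy versus non-greedy difference concrete, here is a minimal sketch. The class name GreedyVsLazy, the helper FirstMatch, and the sample input are made up for illustration; only Regex.Match comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Returns the text of the first match; Value is the empty string when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string input = "f(a:\"1\") g(b:\"2\")";

        // Greedy: .* runs as far as it can, so the match ends at the LAST ')'.
        Console.WriteLine(FirstMatch(input, @"\(.*\)"));   // (a:"1") g(b:"2")

        // Non-greedy: .*? stops at the FIRST ')' that lets the match succeed.
        Console.WriteLine(FirstMatch(input, @"\(.*?\)"));  // (a:"1")
    }
}
```

With two calls on one line, the greedy version swallows both of them, which is why the patterns above use *? and +?.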

(Old) Use

Regex r = new Regex(@"(?<function>\w[\w\d]*?)\((?<param>.+?):""(?<value>[^""]*?)""");
string input = "_test0(aa%£$!:\"lolololol\") _test1(ghgasghe:\"asjkdgh\")";

List<string[]> matches = new List<string[]>();

if (r.IsMatch(input))
{
    MatchCollection mc = r.Matches(input);
    foreach (Match match in mc)
    {
        matches.Add(new[] { match.Groups["function"].Value, match.Groups["param"].Value, match.Groups["value"].Value });
    }
}

EDIT: Now that you've added an arbitrary number of parameters, I would recommend writing your own parser rather than using regexes. The above example only works with one parameter and strictly no whitespace. This will match multiple parameters, still with no whitespace allowed, but will not return the parameters and values:

\w[\w\d]*?\(.+?:"[^"]*?"(,.+?:"[^"]*?")*\)

Just for fun, like above but with whitespace:

\w[\w\d]*?[ \t\r\n]*\([ \t\r\n]*.+?[ \t\r\n]*:[ \t\r\n]*"[^"]*?"([ \t\r\n]*,[ \t\r\n]*.+?[ \t\r\n]*:[ \t\r\n]*"[^"]*?")*[ \t\r\n]*\)

Capturing the text you want will be hard, because you don't know in advance how many captures there will be, so a single regex is a poor fit.
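That said, .NET does keep every capture of a repeated group in Group.Captures, so a single-regex version is possible there. The following is only a sketch under assumptions: the pattern, the class name CaptureDemo, and the helper Parse are hypothetical and not taken from any answer here, parameter names are assumed not to contain ':' or ',', and a trailing comma before the closing bracket is tolerated.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical pattern: a name, then zero or more param:"value" pairs,
    // each followed by a comma (plus optional whitespace) or by the closing bracket.
    private static readonly Regex Call = new Regex(
        @"^(?<function>[A-Za-z_][A-Za-z_0-9]*)\(" +
        @"(?:(?<param>[^:,]+):""(?<value>[^""]*)""(?:,\s*|(?=\))))*" +
        @"\)$");

    // Returns the function name followed by its param/value pairs,
    // or an empty list when the input is not a valid call.
    public static List<string> Parse(string input)
    {
        var result = new List<string>();
        Match m = Call.Match(input);
        if (!m.Success) return result;

        result.Add(m.Groups["function"].Value);
        // Captures holds one entry per successful repetition of the group.
        CaptureCollection names = m.Groups["param"].Captures;
        CaptureCollection values = m.Groups["value"].Captures;
        for (int i = 0; i < names.Count; i++)
        {
            result.Add(names[i].Value);
            result.Add(values[i].Value);
        }
        return result;
    }

    public static void Main()
    {
        // prints: ThisIsValid_01, a, 40, b, ZOO
        Console.WriteLine(string.Join(", ", Parse("ThisIsValid_01(a:\"40\", b:\"ZOO\")")));
        // prints: 0 (the value is not quoted, so nothing is returned)
        Console.WriteLine(Parse("ThisIsntValid_02(a:40)").Count);
    }
}
```

This relies on a .NET-specific feature; most other regex engines expose only the last capture of a repeated group, in which case the two-regex approach or a small parser is still the safer choice.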


Someone else has already given an answer that gives you a flat list of strings, but in the interest of strong typing and proper class structure, I’m going to provide a solution that encapsulates the data properly.

First, declare two classes:

public class ParamValue         // For a parameter and its value
{
    public string Parameter;
    public string Value;
}
public class FunctionInfo       // For a whole function with all its parameters
{
    public string FunctionName;
    public List<ParamValue> Values;
}

Then do the matching and populate a list of FunctionInfos:

(By the way, I’ve made some slight fixes to the regexes... it will now match identifiers correctly, and it will not include the double-quotes as part of the “value” of each parameter.)

Regex r = new Regex(@"(?<function>[\p{L}_]\w*?)\((?<inner>.*?)\)");
Regex inner = new Regex(@",?(?<param>.+?):""(?<value>[^""]*?)""");
string input = "_test0(a:\"lolololol\",b:\"2\") _test1(ghgasghe:\"asjkdgh\")";

var matches = new List<FunctionInfo>();

if (r.IsMatch(input))
{
    MatchCollection mc = r.Matches(input);
    foreach (Match match in mc)
    {
        var l = new List<ParamValue>();

        foreach (Match m in inner.Matches(match.Groups["inner"].Value))
            l.Add(new ParamValue
            {
                Parameter = m.Groups["param"].Value,
                Value = m.Groups["value"].Value
            });

        matches.Add(new FunctionInfo
        {
            FunctionName = match.Groups["function"].Value,
            Values = l
        });
    }
}

Then you can access the collection nicely with identifiers like FunctionName:

foreach (var match in matches)
{
    Console.WriteLine("{0}({1})", match.FunctionName,
        string.Join(", ", match.Values.Select(val =>
            string.Format("{0}: \"{1}\"", val.Parameter, val.Value))));
}


Try this:

^\s*(?<FunctionName>[A-Za-z][A-Za-z_0-9]*)\(((?<parameter>[^:]*):"(?<value>[^"]+)",?\s*)*\)
  • ^\s*(?<FunctionName>[A-Za-z][A-Za-z_0-9]*) matches the function name. ^ means start of the line, so the name must begin at the first non-whitespace character of the string. You can remove the leading \s* if you don't need it; I just added it to make the match a little more flexible.
  • The next part, \(((?<parameter>[^:]*):"(?<value>[^"]+)",?\s*)*\), captures each parameter-value pair inside the parentheses. You have to escape the function's parentheses, since parentheses are symbols within the regex syntax.

The ?<name> constructs inside the parentheses are named capture groups, which, when supported by a library (as they are in .NET), make grabbing the groups from the matches a little easier.


Here:

\w[\w\d]*\s*\(\s*(?:(\w[\w\d]*):("[^"]*"|\d+))*\s*\)

Visualization of that regex here.


For problems like this I always suggest that people not try to "find" a single regex, but instead write several regexes that share the work.

But here is my quick shot:

(?<funcName>[A-Za-z_][A-Za-z_0-9]*)
\(
    (?<ParamGroup>
        (?<paramName>[^(]+?)
        :
        "(?<paramValue>[^"]*)"
        ((,\s*)|(?=\)))
    )*
\)

The whitespace is there for readability. Remove it, or set the option that ignores pattern whitespace (RegexOptions.IgnorePatternWhitespace in .NET).
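As a rough sketch of the second route, the spaced-out pattern can be passed to .NET unchanged together with RegexOptions.IgnorePatternWhitespace. The class name VerbosePatternDemo, the helper Find, and the sample input are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbosePatternDemo
{
    // The pattern from above, kept in its indented layout.
    // Inside a verbatim string, "" stands for one literal quote.
    private const string Pattern = @"
        (?<funcName>[A-Za-z_][A-Za-z_0-9]*)
        \(
            (?<ParamGroup>
                (?<paramName>[^(]+?)
                :
                ""(?<paramValue>[^""]*)""
                ((,\s*)|(?=\)))
            )*
        \)";

    // IgnorePatternWhitespace makes the engine skip the unescaped
    // spaces and line breaks in the pattern itself.
    public static Match Find(string input)
    {
        return Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        Match m = Find("ThisIsValid_01(a:\"40\", b:\"ZOO\")");
        Console.WriteLine(m.Groups["funcName"].Value);   // ThisIsValid_01
        foreach (Capture c in m.Groups["paramName"].Captures)
            Console.WriteLine(c.Value);                  // a, then b
    }
}
```

Note that this pattern is not anchored, so it finds a call anywhere in the input; add ^ and $ if the whole string has to be a single valid call.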


This regex passes all your test cases:

^(?<function>[A-Za-z][\w]*?)\(((?<param>[^:]*?):"(?<value>[^"]*?)",{0,1}\s*)*\)$

This works on multiple parameters and no parameters. It also handles special characters in the param name and whitespace after the comma. There may need to be some adjustments as your test cases do not cover everything you indicate in your text.

Please note that \w usually includes digits and is not appropriate as the leading character of the function name. Reference: http://www.regular-expressions.info/charclass.html#shorthand
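A quick way to check this yourself is sketched below. The class name WordClassDemo and the helper Matches are made up for the example; the second pattern is the identifier pattern quoted in the question, with anchors added.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Thin wrapper around Regex.IsMatch so the checks below read uniformly.
    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        // \w accepts a digit, so a name may start with one.
        Console.WriteLine(Matches("7", @"^\w$"));                                   // True
        Console.WriteLine(Matches("01_ThisIsntValid", @"^\w[\w\d]*$"));             // True

        // An explicit first-character class rejects the leading digit.
        Console.WriteLine(Matches("01_ThisIsntValid", @"^[A-Za-z_][A-Za-z_0-9]*$")); // False
        Console.WriteLine(Matches("ThisIsValid_01", @"^[A-Za-z_][A-Za-z_0-9]*$"));   // True
    }
}
```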
