How to extract the strings between two special characters using Regular Expressions in C#

I am new to regular expressions. I have a string variable containing, for example, the following string:

"My Name is #P_NAME# and I am #P_AGE# years old"

I need to extract the two strings P_NAME and P_AGE using regular expressions (into a string array, two string variables, etc.). That is, each token starts with a # and ends with a #, and I need to extract the part in between.

How can I do this in C# using regular expressions?

And how can I extract the same values when there is a newline character in between? For example:

"My Name is #P_NAME# and \r\n I am #P_AGE# years old".

Thanks

Thanks, everyone.

The following worked for me. (I cannot post my own answer on Stack Overflow until 8 hours have passed.)

string str = "My Name is #P_NAME# and \r\n I am #P_AGE# years old";

// \w* matches zero or more word characters, so an empty "##" pair would also match.
var regexObj = new Regex(@"#\w*#");
MatchCollection allMatchResults = regexObj.Matches(str);

allMatchResults contains #P_NAME# and #P_AGE# (i.e. including the # characters). Keeping the # characters actually suits the rest of my logic.


You can do it like this

using System.Text.RegularExpressions;
using System;

public class Test
{
        public static void Main(){
                string s = "My name is #Dave# and I am #18# years old";
                Regex r = new Regex(@"#(.+?)#");
                MatchCollection mc = r.Matches(s);
                Console.WriteLine("Name is " + mc[0].Groups[1].Value);
                Console.WriteLine("Age is " + mc[1].Groups[1].Value);
        }
}

I don't know what your application is, but I must say this does not look like a very robust way to carry data. Add a few extra # characters and it all goes wrong; think of people with a # in their names, for example.

However, if you can guarantee that you will always be working with strings in this format, then this does work.
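
To make the robustness concern concrete, here is a minimal sketch. The input string and the names StrayHashDemo and Extract are made up for illustration; the pattern is the #(.+?)# from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StrayHashDemo
{
    // Collects what group 1 captured for each match of #(.+?)#.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, @"#(.+?)#"))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        // Hypothetical input: the name itself contains a # ("Dave #1").
        string s = "My name is #Dave #1# and I am #18# years old";
        foreach (string value in Extract(s))
        {
            Console.WriteLine("[" + value + "]");
        }
        // prints "[Dave ]" and "[ and I am ]"; the age 18 is never extracted
    }
}
```

One stray # shifts the pairing of every delimiter after it, so the second "value" is the text between two placeholders rather than a placeholder itself.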

Explanation of the regex #(.+?)#

The first # matches a literal #.

( begins a capturing group, which the code reads via .Groups[1]. Groups[0] is the whole match, e.g. #Dave# rather than just Dave.

.+? matches at least one character. . matches any character except a newline, + repeats it one or more times, and ? makes the repetition lazy, so it matches as few characters as possible and stops at the first # that can serve as the closing one.

) closes the group.

The final # matches another literal #, the closing one in this case.
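
The effect of the lazy ? and the difference between Groups[0] and Groups[1] can be seen in a small sketch like the following. The class and method names are illustrative only, and the greedy pattern #(.+)# is included purely for comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LazyVersusGreedyDemo
{
    // Returns the text captured by group 1 for every match of the pattern.
    public static List<string> Captures(string pattern, string input)
    {
        var captured = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            captured.Add(m.Groups[1].Value);
        }
        return captured;
    }

    public static void Main()
    {
        string s = "My name is #Dave# and I am #18# years old";

        // Lazy: each match ends at the nearest closing #.
        Console.WriteLine(string.Join(" | ", Captures(@"#(.+?)#", s)));
        // prints "Dave | 18"

        // Greedy: a single match running from the first # to the last #.
        Console.WriteLine(string.Join(" | ", Captures(@"#(.+)#", s)));
        // prints "Dave# and I am #18"

        // Groups[0] is the whole match; Groups[1] is the parenthesised part.
        Match first = Regex.Match(s, @"#(.+?)#");
        Console.WriteLine(first.Groups[0].Value + " vs " + first.Groups[1].Value);
        // prints "#Dave# vs Dave"
    }
}
```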


A regular expression such as "#[^#]+#" matches a hash, followed by one or more non-hash characters, followed by another hash.

There are various alternatives that would also work, such as "#.*?#". Note that .*? also matches an empty ## pair, and . does not match a newline unless RegexOptions.Singleline is set.

The following code outputs #P_NAME# and #P_AGE#.

string p = "My Name is #P_NAME# and \r\n I am #P_AGE# years old";
Regex reg = new Regex("#[^#]+#");

MatchCollection matches = reg.Matches(p);
foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}
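
If the # characters are not wanted, the same negated character class can be wrapped in a capturing group. The sketch below assumes that variation; NegatedClassDemo, Extract and the second test string are illustrative names and data, not part of the answer. Because [^#] matches any character that is not a #, including \r and \n, it also covers a placeholder that itself spans a line break.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Returns the text between each pair of # characters, without the hashes.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, "#([^#]+)#"))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        // The string from the question: the line break sits between the placeholders.
        string p = "My Name is #P_NAME# and \r\n I am #P_AGE# years old";
        Console.WriteLine(string.Join(", ", Extract(p)));
        // prints "P_NAME, P_AGE"

        // Hypothetical input where the line break is inside the placeholder.
        string q = "#FIRST\r\nSECOND#";
        Console.WriteLine(Extract(q).Count);
        // prints "1"
    }
}
```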


Here's an extension method based on this. Enjoy. :)

Note that this does not keep the delimiter characters (I didn't want them). To keep them, use one of the regexes above.

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    ///----------------------------------------------------------------------
    /// <summary>
    /// Gets the matches between delimiters.
    /// </summary>
    /// <param name="source">The source string.</param>
    /// <param name="beginDelim">The beginning string delimiter.</param>
    /// <param name="endDelim">The end string delimiter.</param>
    /// <returns>The text found between each delimiter pair, without the delimiters.</returns>
    /// <example>
    /// string beginDelim = "<span>";
    /// string endDelim = "</span>";
    /// string input = string.Format("My Name is {0}Lance{1} and I am {0}39{1} years old", beginDelim, endDelim);
    ///
    /// var values = input.GetMatches(beginDelim, endDelim);
    /// foreach (string value in values)
    /// {
    ///     Console.WriteLine(value);
    /// }
    /// </example>
    ///----------------------------------------------------------------------
    public static IEnumerable<string> GetMatches(this string source, string beginDelim, string endDelim)
    {
        // The delimiters are consumed by the match rather than checked with lookarounds.
        // With lookarounds and identical delimiters (e.g. "#"), the text between one
        // closing delimiter and the next opening one would be returned as well.
        Regex reg = new Regex(string.Format("{0}(.+?){1}", Regex.Escape(beginDelim), Regex.Escape(endDelim)));
        MatchCollection matches = reg.Matches(source);
        return (from Match m in matches select m.Groups[1].Value).ToList();
    }
}
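
When both delimiters are the same character, as with the # in the question, a lookaround-based pattern and a delimiter-consuming pattern behave differently. The following sketch compares the two; DelimiterStyleDemo and Run are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DelimiterStyleDemo
{
    // Runs the pattern and returns the requested group of every match.
    public static List<string> Run(string pattern, string input, int group)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Groups[group].Value);
        }
        return values;
    }

    public static void Main()
    {
        string s = "My Name is #P_NAME# and I am #P_AGE# years old";

        // Lookarounds leave the # characters unconsumed, so the closing # of one
        // placeholder can act as the opening # of the next "match".
        var viaLookaround = Run(@"(?<=#)(.+?)(?=#)", s, 0);
        Console.WriteLine(string.Join("|", viaLookaround));
        // prints "P_NAME| and I am |P_AGE"

        // Consuming the delimiters pairs them up correctly.
        var viaCapture = Run(@"#(.+?)#", s, 1);
        Console.WriteLine(string.Join("|", viaCapture));
        // prints "P_NAME|P_AGE"
    }
}
```

With distinct delimiters such as <span> and </span> the two styles return the same values, so the difference only shows up when the opening and closing delimiters are identical.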


Try this:

var results = new List<string>();
var subjectString = "My Name is #P_NAME# and \r\n I am #P_AGE# years old";
Regex regexObj = new Regex("#.+?#");
// Walk the matches one at a time, stripping the surrounding # characters.
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    results.Add(matchResults.ToString().Replace("#",""));
    matchResults = matchResults.NextMatch();
}

This adds each extracted value to the results list.


No one has mentioned the multi-line case. If the text between the delimiters spans several lines, like this:

var testcase = @"Here is my info
#
John Doe
18 years old
#";
var regex = new Regex(@"#(.+?)#", RegexOptions.Singleline);
var match = regex.Match(testcase);
match.Groups[1].Value.Dump();

// OR

var matches = regex.Matches(testcase);
foreach (Match m in matches) m.Groups[1].Value.Dump();

/*
Output:
John Doe
18 years old
*/

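You need to specify RegexOptions.Singleline so that . also matches newline characters; without it, the match cannot span lines. Note that the captured value includes the line breaks right after the opening # and right before the closing #. Dump() is a LINQPad extension method; outside LINQPad, use Console.WriteLine instead.

Answer posted for future readers.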
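
As a rough sketch of what the option changes, the code below runs the same pattern with and without RegexOptions.Singleline. SinglelineDemo and ExtractBlock are illustrative names, and the Trim() call is an added assumption for dropping the line breaks next to the delimiters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Returns the trimmed text between the first pair of # characters,
    // or null when the pattern does not match under the given options.
    public static string ExtractBlock(string input, RegexOptions options)
    {
        Match m = Regex.Match(input, @"#(.+?)#", options);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    public static void Main()
    {
        string input = "Here is my info\n#\nJohn Doe\n18 years old\n#";

        // Without Singleline, . stops at the first \n, so nothing matches.
        Console.WriteLine(ExtractBlock(input, RegexOptions.None) == null);
        // prints "True"

        // With Singleline, . matches \n too and the block is captured.
        Console.WriteLine(ExtractBlock(input, RegexOptions.Singleline));
        // prints "John Doe" and "18 years old" on two lines
    }
}
```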


Try using a named group:

var format = "My Name is #P_NAME# and \r\n I am #P_AGE# years old";
// The named group "name" captures the non-whitespace characters between two # delimiters.
Regex rgxp = new Regex(@"#(?<name>\S+?)#", RegexOptions.Compiled);
Match m = rgxp.Match(format);
if (m.Success)
{
   return m.Groups["name"].Value;     // <-- value of the first placeholder, here P_NAME
}