How to extract the strings between two special characters using Regular Expressions in C#
I am totally new to regular expressions. I have a string variable containing, for example, the following string:
"My Name is #P_NAME# and I am #P_AGE# years old"
I need to extract the two strings P_NAME and P_AGE using regular expressions (into a string array, two string variables, etc.). That is, each token starts with a # and ends with a #, and I need to extract the part in the middle.
How can I do this in C# using regular expressions?
And how can I extract the same values when there is a newline character in between as well? For example:
"My Name is #P_NAME# and \r\n I am #P_AGE# years old".
Thanks
Thanks Everyone...
Following worked for me... I cannot publish my own answer as the answer until 8 hours expires in stackoverflow... :)
string str = "My Name is #P_NAME# and \r\n I am #P_AGE# years old";
MatchCollection allMatchResults = null;
var regexObj = new Regex(@"#\w*#");
allMatchResults = regexObj.Matches(str);
'allMatchResults' contains #P_NAME# and #P_AGE# (i.e. including the # characters). But having them helps my other logic more than not having them.
You can do it like this
using System.Text.RegularExpressions;
using System;
public class Test
{
public static void Main(){
string s = "My name is #Dave# and I am #18# years old";
Regex r = new Regex(@"#(.+?)#");
MatchCollection mc = r.Matches(s);
Console.WriteLine("Name is " + mc[0].Groups[1].Value);
Console.WriteLine("Age is " + mc[1].Groups[1].Value);
}
}
I don't know what your application is but I must say this is not a very robust looking data transfer method. Start getting a few extra #s in there and it all goes wrong. For example people with # in their names!
However if you can guarantee that you will always be working with a string of this format then this does work.
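The fragility mentioned above can be seen with a single unpaired #. This is only a minimal sketch: the class name StrayHashDemo, the helper Extract, and the second sample sentence are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StrayHashDemo
{
    // Returns the text captured between each pair of '#' characters.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, @"#(.+?)#"))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        // Well-formed input: every '#' belongs to a pair.
        foreach (string v in Extract("My name is #Dave# and I am #18# years old"))
            Console.WriteLine("[" + v + "]"); // [Dave] then [18]

        // Hypothetical input with one stray '#': the pairing shifts for everything after it.
        foreach (string v in Extract("My name is #Dave#, ticket #1, and I am #18# years old"))
            Console.WriteLine("[" + v + "]"); // [Dave] then [1, and I am ]
    }
}
```

In the second call the age is never extracted, because the stray # pairs up with the # that was meant to open the age token.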
Explanation of the regex #(.+?)#
First # matches a literal #
( begins a group. It is indexed as .Groups[1] in the code. [0] is the whole match, e.g. #Dave#, not just Dave
.+? matches at least one character. . is any character (except a newline by default). + is repetition (at least once). And ? tells the regex engine to be lazy, so it stops at the first # it can, which is then matched by our final #
) closes the group
# matches another #, the 'closing' one in this case
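To make the difference between Groups[0] and Groups[1], and between lazy and greedy repetition, concrete, here is a small sketch. The class name LazyVsGreedyDemo and the helper FirstCapture are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsGreedyDemo
{
    // Returns the first capture group of the first match, or null if nothing matched.
    public static string FirstCapture(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string s = "My name is #Dave# and I am #18# years old";

        Match lazy = Regex.Match(s, @"#(.+?)#");
        Console.WriteLine(lazy.Groups[0].Value); // #Dave#  (the whole match)
        Console.WriteLine(lazy.Groups[1].Value); // Dave    (the captured group)

        // Without the '?', '+' is greedy and runs on to the last '#'.
        Console.WriteLine(FirstCapture(s, @"#(.+)#")); // Dave# and I am #18
    }
}
```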
A regular expression such as "#[^#]+#" would match a hash, followed by one or more non-hash characters, followed by another hash.
There are various alternatives that would work for this, such as "#.*?#".
The following code would output the #P_NAME# and #P_AGE#.
string p = "My Name is #P_NAME# and \r\n I am #P_AGE# years old";
Regex reg = new Regex("#[^#]+#");
MatchCollection matches = reg.Matches(p);
foreach (Match m in matches)
{
Console.WriteLine(m.Value);
}
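The two patterns named in this answer behave the same on the string above, where the line break sits outside the # pairs. They differ when a line break falls inside a pair, as this sketch suggests. The class name, the helper CountMatches, and the input string are assumptions made for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineInsideDelimitersDemo
{
    // Counts how many times the pattern matches the input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        // Hypothetical input: the second placeholder spans a line break.
        string s = "#P_NAME# and #P_\r\nAGE#";

        Console.WriteLine(CountMatches(s, "#[^#]+#")); // 2: [^#] also matches \r and \n
        Console.WriteLine(CountMatches(s, "#.*?#"));   // 1: '.' does not match \n by default
    }
}
```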
Here's an extension method based on this... enjoy. :)
BTW - this does not keep the # characters (I didn't want them). You can change the regex to one of those above if you want to keep them.
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StringExtensions
{
///----------------------------------------------------------------------
/// <summary>
/// Gets the matches between delimiters.
/// </summary>
/// <param name="source">The source string.</param>
/// <param name="beginDelim">The beginning string delimiter.</param>
/// <param name="endDelim">The end string delimiter.</param>
/// <returns></returns>
/// <example>
/// string beginDelim = "<span>";
/// string endDelim = "</span>";
/// string input = string.Format("My Name is {0}Lance{1} and I am {0}39{1} years old", beginDelim, endDelim);
///
/// var values = input.GetMatches(beginDelim, endDelim);
/// foreach (string value in values)
/// {
/// Console.WriteLine(value);
/// }
/// </example>
///----------------------------------------------------------------------
public static IEnumerable<string> GetMatches(this string source, string beginDelim, string endDelim)
{
Regex reg = new Regex(string.Format("(?<={0})(.+?)(?={1})", Regex.Escape(beginDelim), Regex.Escape(endDelim)));
MatchCollection matches = reg.Matches(source);
return (from Match m in matches select m.Value).ToList();
}
}
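One caveat worth checking before using the lookaround pattern with identical begin and end delimiters (such as # and #): lookarounds do not consume the delimiters, so the text between two tokens can also come back as a match. The sketch below compares the pattern above with a capture-group variant. The class DelimiterDemo and both method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterDemo
{
    // Same lookaround pattern as the extension method above: delimiters are checked but not consumed.
    public static List<string> WithLookarounds(string source, string begin, string end)
    {
        string pattern = string.Format("(?<={0})(.+?)(?={1})", Regex.Escape(begin), Regex.Escape(end));
        return Regex.Matches(source, pattern).Cast<Match>().Select(m => m.Value).ToList();
    }

    // Hypothetical variant: the delimiters are consumed and the inner text is read from group 1.
    public static List<string> WithCaptureGroup(string source, string begin, string end)
    {
        string pattern = string.Format("{0}(.+?){1}", Regex.Escape(begin), Regex.Escape(end));
        return Regex.Matches(source, pattern).Cast<Match>().Select(m => m.Groups[1].Value).ToList();
    }

    public static void Main()
    {
        string s = "My Name is #P_NAME# and I am #P_AGE# years old";
        Console.WriteLine(string.Join("|", WithLookarounds(s, "#", "#")));  // P_NAME| and I am |P_AGE
        Console.WriteLine(string.Join("|", WithCaptureGroup(s, "#", "#"))); // P_NAME|P_AGE
    }
}
```

With distinct delimiters such as <span> and </span>, both versions return the same values.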
Try -
var results = new List<string>();
var subjectString = "My Name is #P_NAME# and \r\n I am #P_AGE# years old";
Regex regexObj = new Regex("#.+?#");
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
results.Add(matchResults.ToString().Replace("#",""));
matchResults = matchResults.NextMatch();
}
This should add the matches, with the # characters stripped, to the results list.
No one mentioned the multi-line case. If the delimited text itself spans several lines, you can handle it like this:
var testcase = @"Here is my info
#
John Doe
18 years old
#";
var regex = new Regex(@"#(.+?)#", RegexOptions.Singleline);
var match = regex.Match(testcase);
match.Groups[1].Value.Dump(); // Dump() is a LINQPad helper; use Console.WriteLine(...) elsewhere
// OR
var matches = regex.Matches(testcase);
foreach (Match m in matches) m.Groups[1].Value.Dump();
/*
Output:
John Doe
18 years old
*/
You need to specify the RegexOptions.Singleline flag so that . also matches newline characters. Without it, . stops at the first line break and the pattern finds nothing here.
answer posted for future readers
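As a quick check of what the flag changes, the following sketch runs the same pattern with and without RegexOptions.Singleline. The class name SinglelineDemo and the helper HasMatch are made up for this example, and the input is written with explicit \n line breaks so it does not depend on the source file's line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Reports whether #(.+?)# finds anything in the input under the given options.
    public static bool HasMatch(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, @"#(.+?)#", options);
    }

    public static void Main()
    {
        string s = "Here is my info\n#\nJohn Doe\n18 years old\n#";

        Console.WriteLine(HasMatch(s, RegexOptions.None));       // False: '.' will not cross \n
        Console.WriteLine(HasMatch(s, RegexOptions.Singleline)); // True: '.' matches \n as well

        // Trim() drops the line breaks that sit right next to the delimiters.
        Match m = Regex.Match(s, @"#(.+?)#", RegexOptions.Singleline);
        Console.WriteLine(m.Groups[1].Value.Trim());
    }
}
```

Note that RegexOptions.Multiline is a different flag: it changes how ^ and $ behave and has no effect on what . matches.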
Try using
var format = "My Name is #P_NAME# and \r\n I am #P_AGE# years old";
// (?<name>...) is a named group; \S+? lazily matches non-whitespace characters up to the closing #
Regex rgxp = new Regex(@"#(?<name>\S+?)#", RegexOptions.Compiled);
Match m = rgxp.Match(format);
if (m.Success)
{
return m.Groups["name"].Value; // <-- this statement returns the value you're looking for
}
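The snippet above returns only the first placeholder. To collect all of them with a named group, something along these lines might work. It is a sketch, not the answerer's code: the class NamedGroupDemo, the method PlaceholderNames, and the exact pattern are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Returns the name of every #...# placeholder in the format string.
    public static List<string> PlaceholderNames(string format)
    {
        var names = new List<string>();
        // (?<name>...) is a named capturing group; \S+? lazily matches
        // one or more non-whitespace characters up to the closing '#'.
        foreach (Match m in Regex.Matches(format, @"#(?<name>\S+?)#"))
        {
            names.Add(m.Groups["name"].Value);
        }
        return names;
    }

    public static void Main()
    {
        string format = "My Name is #P_NAME# and \r\n I am #P_AGE# years old";
        Console.WriteLine(string.Join(", ", PlaceholderNames(format))); // P_NAME, P_AGE
    }
}
```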