Parsing a string between [STX] and [ETX] using C# - split/append output using Regex or string functions
Language = C#.NET
Anything between [STX] and [ETX] must be accepted; everything else must be rejected.
string startparam = "[STX]";
string endparam = "[ETX]";

String str1 = "[STX]some string 1[ETX]"; // Option 1
String str2 = "sajksajsk [STX]some string 2 [ETX] saksla"; // Option 2
String str3 = "[ETX] dksldkls [STX]some string 3 [ETX]ds ds"; // Option 3
String str4 = "dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd"; // Option 4

/* The various strings can be appended and converted to a single string using a StringBuilder, or treated as different strings. */

ProcessString(string str, string startparam, string endparam)
{
    // What to write here using Regex or string functions in C#?
}

/* The output after passing these to ProcessString() */
/* Append the output to a TextBox, or append it to a string using a for loop. */

/* Output required */
some string 1
some string 2
some string 3
some string 4.1
some string 4.2
=============================================================================
EDIT 2
Language = C#
string str = @"
[STX]some string 1[ETX]
sajksajsk [STX]some string 2 [ETX] saksla
[ETX] dksldkls [STX]some string 3 [ETX]ds ds
dksldk[STX]ls [STX]some st[ETX]ring 4.1[ETX]ds ds [STX]some string 4.2[ETX] jdskjd";
How can I get the same output if the string array is one single string?
/* output */
some string 1
some string 2
some string 3
some string 4.1
some string 4.2
/*case 1*/
the above string can be "[STX] djkdsj [STX]dskd1[ETX] dsnds[ETX]"
the output should be just "dskd1"
/*case 2*/
the above string can be "[STX] djkdsj [STX]dskd1[ETX] ddd"
the output should be just "dskd1"
/*case 3*/
the above string can be " kdsj [STX]dskd1[ETX] dsnds[ETX]"
the output should be just "dskd1"
/*case 4*/
the above string can be "[STX] djk[STX]dsj [STX]dskd2[ETX] ddd"
the output should be just "dskd2"
The real problem comes when [STX] is followed by another [STX]: I want to consider the newer [STX] and start string processing from that newer [STX] occurrence. E.g. case 2 above.
=============================================================================
EDIT 3 : New Request
Language = C#
If I want the data between [STX] and [STX] as well, can that also be done?
I need a new regex that extracts the data between 1. [STX] some data [STX] and 2. [STX] some data [ETX]
E.g.
/* the above string can be */
"[STX] djk[STX]dsj [STX]dskd2[ETX] ddd"
/* the output should be just */
djk
dsj
dskd2
[STX] means that a transmission has started, so I want to extract the data between consecutive [STX] markers as well.
This works for me:
string[] sepValues = input.Split(new char[] {'\u0002', '\u0003'},
StringSplitOptions.RemoveEmptyEntries);
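A minimal sketch of how this Split call behaves, assuming the input carries the real STX (U+0002) and ETX (U+0003) control characters rather than the literal text "[STX]" and "[ETX]". The class name, method name and sample input are hypothetical. Note that Split only cuts the string at the delimiters, so text outside a frame is returned as well and still has to be filtered out.

```csharp
using System;

static class SplitDemo
{
    // Splits on the STX (U+0002) and ETX (U+0003) control characters,
    // dropping the empty pieces produced by adjacent delimiters.
    public static string[] SplitFrames(string input)
    {
        return input.Split(new char[] { '\u0002', '\u0003' },
                           StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Hypothetical sample: noise around one framed payload.
        string input = "noise\u0002some string 1\u0003tail";

        // Prints every piece, including the text outside the frame.
        foreach (string part in SplitFrames(input))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

For an input that contains nothing but one frame, the result is a single element holding the payload.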
(?<=\[STX\])(?:(?!\[STX\]).)*?(?=\[ETX\])
matches any text (except newlines) between [STX] and [ETX]:
(?<=\[STX\]) # Are we right after [STX]? If so,...
(?: # match 0 or more of the following:
(?!\[STX\]) # (as long as it's not possible to match [STX] here)
. # exactly one character
)*? # repeat as needed until...
(?=\[ETX\]) # there is a [ETX] ahead.
This will always match somestring in each of the following:
blah blah [STX]somestring[ETX] blah blah
[STX]somestring[ETX] blah [STX]somestring[ETX] (hey, two matches here!)
[STX] not this! [STX]somestring[ETX] not this either! [ETX]
blah [ETX] [STX]somestring[ETX] [STX] bla bla
A full reference on positive/negative lookbehind and lookahead assertions (three of which are used in this regex) can be found in Jan Goyvaerts' excellent regular expression tutorial at http://www.regular-expressions.info/lookaround.html.
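As a rough illustration, the regex above can be applied from C# with Regex.Matches. The class and method names here are hypothetical, and the sample input is one of the lines listed above. Because the markers are only asserted by look-arounds, the match value itself is the payload, so no capture group is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // Same pattern as above: text right after [STX], containing no
    // further [STX], up to the nearest [ETX].
    const string Pattern = @"(?<=\[STX\])(?:(?!\[STX\]).)*?(?=\[ETX\])";

    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            results.Add(m.Value); // the whole match is the payload
        }
        return results;
    }

    static void Main()
    {
        string input = "[STX] not this! [STX]somestring[ETX] not this either! [ETX]";
        foreach (string s in Extract(input))
        {
            Console.WriteLine(s);
        }
    }
}
```

The match keeps any whitespace that sits between the markers, so call Trim() on each result if that matters.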
Try this:
Regex regex = new Regex(@"\[STX\](.*?)\[ETX\]", RegexOptions.IgnoreCase);
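And then just pick out the group to get the string between the tags.
EDIT: To fit your updated requirements, use the pattern below. Its greedy .* skips every [STX] on a line except the last one that is followed by an [ETX] (the optional look-behind at the start has no effect on what is matched). Note that this yields at most one match per line, so a line with two complete frames, such as the 4.1/4.2 line, returns only the last one: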
string pattern = @"(?<=\[STX])?.*\[STX]\s*(.+?)\s*\[ETX].*?";
Here's a complete example:
string input = @"[STX]some string 1[ETX]
sajksajsk [STX]some string 2 [ETX] saksla
[ETX] dksldkls [STX]some string 3 [ETX]ds ds
dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd
[STX] djkdsj [STX]dskd1[ETX] dsnds[ETX]
[STX] djkdsj [STX]dskd1[ETX] ddd
kdsj [STX]dskd1[ETX] dsnds[ETX]
[STX] djk[STX]dsj [STX]dskd2[ETX] ddd";
string pattern = @"(?<=\[STX])?.*\[STX]\s*(.+?)\s*\[ETX].*?";
foreach (Match m in Regex.Matches(input, pattern))
{
    // result will be in first group
    Console.WriteLine(m.Groups[1].Value);
}
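If a single line can hold more than one complete frame, one possible variation is to drop the leading greedy .* and instead forbid [STX] inside the captured text. This is only a sketch, not the pattern from this answer; the class name, method name and the pattern itself are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class FrameDemo
{
    // Assumed pattern: [STX], optional whitespace, then one or more
    // characters that never start another [STX], optional whitespace, [ETX].
    const string Pattern = @"\[STX]\s*((?:(?!\[STX]).)+?)\s*\[ETX]";

    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            results.Add(m.Groups[1].Value); // payload without the markers
        }
        return results;
    }

    static void Main()
    {
        string input = "dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd";
        foreach (string s in Extract(input))
        {
            Console.WriteLine(s);
        }
    }
}
```

An [STX] that is followed by another [STX] before any [ETX] is skipped, so "[STX] djk[STX]dsj [STX]dskd2[ETX] ddd" yields only dskd2.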
I also added the \s* around the capturing group to eliminate extra whitespace. By doing so you no longer need to use Trim() as I suggested in my earlier response below.
PREVIOUS RESPONSE
This pattern should fit: "\[STX](.+?)\[ETX]"
Notice that the opening bracket, [, must be escaped to prevent it from being interpreted as the start of a character class in regex. The closing bracket, ], need not be escaped. The (.+?) is a capturing group (due to the parentheses) and matches at least one character in a non-greedy fashion (via the ?). Being non-greedy prevents the regex engine from matching across multiple occurrences all the way to the last "[ETX]". Remove the ? and you'll see what I mean in your str4 example. Since your last example has multiple occurrences, you can use the Matches method.
string[] inputs =
{
    "[STX]some string 1[ETX]",
    "sajksajsk [STX]some string 2 [ETX] saksla",
    "[ETX] dksldkls [STX]some string 3 [ETX]ds ds",
    "dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd"
};
string pattern = @"\[STX](.+?)\[ETX]";
foreach (string input in inputs)
{
    Console.WriteLine("Input: " + input);
    foreach (Match m in Regex.Matches(input, pattern))
    {
        // result will be in first group
        Console.WriteLine(m.Groups[1].Value);
    }
    Console.WriteLine();
}
You might consider using a Trim() to trim any excess spaces (m.Groups[1].Value.Trim()). It's possible to achieve this in the pattern, but it complicates it unnecessarily. Use the overload that accepts RegexOptions.IgnoreCase if you need to ignore the case of the "STX" and "ETX" text (if they aren't always in uppercase form).
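The EDIT 3 request in the question (data between [STX] and the next [STX], as well as between [STX] and [ETX]) could be approached along the same lines. The following is a sketch under assumptions, not a pattern taken from the answers: the class name, method name and regex are hypothetical. The terminating marker is only checked with a look-ahead, so a closing [STX] is left in place and can open the next match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SegmentDemo
{
    // Assumed pattern: [STX], then any run of characters that does not
    // start a marker, provided the run ends right before [STX] or [ETX].
    const string Pattern =
        @"\[STX\]((?:(?!\[STX\]|\[ETX\]).)*)(?=\[STX\]|\[ETX\])";

    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            string value = m.Groups[1].Value.Trim();
            if (value.Length > 0) // skip empty segments such as [STX][STX]
            {
                results.Add(value);
            }
        }
        return results;
    }

    static void Main()
    {
        string input = "[STX] djk[STX]dsj [STX]dskd2[ETX] ddd";
        foreach (string s in Extract(input))
        {
            Console.WriteLine(s);
        }
    }
}
```

A trailing [STX] with no later marker produces no match, because the look-ahead requires a terminator.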