How to parse a string between [STX] and [ETX] using C# - Split/Append output using Regex or String Functions

2023-01-18 06:12

Language = C#.NET

Anything between [STX] and [ETX] must be accepted; everything else must be rejected.

string startparam = "[STX]";
string endparam = "[ETX]";

String str1 = "[STX]some string 1[ETX]"; //Option 1
String str2 = "sajksajsk [STX]some string 2 [ETX] saksla"; //Option 2
String str3 = "[ETX] dksldkls [STX]some string 3 [ETX]ds ds"; //Option 3
String str4 = "dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd"; //Option 4

/* the various strings can be appended and converted to a single 
   string using string builder or treat them as different strings*/

ProcessString (string str , string startparam , string endparam)
{
   // What should go here, using Regex or string functions in C#?

}

/* The output after passing these to a ProcessString () */     
/* Append Output To a TextBox or Append it to a String using For Loop.*/

/* Output Required */

some string 1 
some string 2
some string 3
some string 4.1 
some string 4.2

=============================================================================

EDIT 2

Language = C#

string str = "
[STX]some string 1[ETX]
sajksajsk [STX]some string 2 [ETX] saksla
[ETX] dksldkls [STX]some string 3 [ETX]ds ds
dksldk[STX]ls [STX]some st[ETX]ring 4.1[ETX]ds ds [STX]some string 4.2[ETX] jdskjd";

How can I get the same output if the string array is one single string?

/* output */
some string 1 
some string 2
some string 3
some string 4.1 
some string 4.2


/*case 1*/ 
the above string can be "[STX] djkdsj [STX]dskd1[ETX] dsnds[ETX]" 
the output should be just "dskd1"

/*case 2*/ 
the above string can be "[STX] djkdsj [STX]dskd1[ETX] ddd" 
the output should be just "dskd1"

/*case 3*/ 
the above string can be " kdsj [STX]dskd1[ETX] dsnds[ETX]" 
the output should be just "dskd1"

/*case 4*/ 
the above string can be "[STX] djk[STX]dsj [STX]dskd2[ETX] ddd" 
the output should be just "dskd2"

The real problem arises when one [STX] is followed by another [STX]: I want to use the newer [STX] and start processing the string from that occurrence, e.g. case 2 above.

=============================================================================

EDIT 3 : New Request

Language = C#

If I also want the data between [STX] and [STX], can that be done as well?

I need a new regex that extracts the data in both of these forms: 1. [STX] some data [STX] 2. [STX] some data [ETX]

Eg.

/* the above string can be */
"[STX] djk[STX]dsj [STX]dskd2[ETX] ddd" 
/* the output should be just */
djk
dsj
dskd2

Since [STX] means a transmission has started, I want to extract the data between consecutive [STX] markers as well.

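This works for me, assuming the input contains the actual STX (U+0002) and ETX (U+0003) control characters rather than the literal text [STX] and [ETX]: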

string[] sepValues = input.Split(new char[] {'\u0002', '\u0003'},
                                 StringSplitOptions.RemoveEmptyEntries);

(?<=\[STX\])(?:(?!\[STX\]).)*?(?=\[ETX\])

matches any text (except newlines) between [STX] and [ETX]:

(?<=\[STX\])  # Are we right after [STX]? If so,...
(?:           # match 0 or more of the following:
 (?!\[STX\])  # (as long as it's not possible to match [STX] here)
 .            # exactly one character
 )*?          # repeat as needed until...
(?=\[ETX\])   # there is a [ETX] ahead.

This will always match somestring in each of the following:

blah blah [STX]somestring[ETX] blah blah
[STX]somestring[ETX] blah [STX]somestring[ETX] (hey, two matches here!)
[STX] not this! [STX]somestring[ETX] not this either! [ETX]
blah [ETX] [STX]somestring[ETX] [STX] bla bla
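
As a minimal sketch of how the pattern above might be used from C#, the class below runs it through Regex.Matches and collects the results. The class name StxEtxLookaround and the helper Extract are hypothetical names chosen for illustration, and the Trim() call is an added assumption to drop surrounding spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StxEtxLookaround
{
    // Same pattern as above: text right after an [STX], containing no
    // further [STX], up to (but not including) the next [ETX].
    private static readonly Regex Pattern =
        new Regex(@"(?<=\[STX\])(?:(?!\[STX\]).)*?(?=\[ETX\])");

    // Hypothetical helper: returns every matched payload, trimmed.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            // The lookarounds are zero-width, so m.Value holds only the payload.
            results.Add(m.Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        string input = "[STX] not this! [STX]somestring[ETX] not this either! [ETX]";
        foreach (string s in Extract(input))
        {
            Console.WriteLine(s); // prints "somestring"
        }
    }
}
```

Because the tags are matched only by lookarounds, no capture group is needed and the match value can be used directly.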

A full reference on positive/negative lookbehind and lookahead assertions (three of which are used in this regex) can be found in Jan Goyvaerts' excellent regular expression tutorial at http://www.regular-expressions.info/lookaround.html.
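
For the EDIT 3 request in the question (data that ends at either the next [STX] or the next [ETX]), the same lookaround idea can probably be extended. The following is only a sketch under that assumption, not a tested production pattern: the class name StxSegments and the helper Extract are hypothetical, and skipping empty segments is a design choice made here, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StxSegments
{
    // After an [STX], consume characters that do not start another [STX]
    // or an [ETX], and require one of those two tags to follow.
    private static readonly Regex Pattern = new Regex(
        @"(?<=\[STX\])(?:(?!\[(?:STX|ETX)\]).)*(?=\[(?:STX|ETX)\])");

    // Hypothetical helper: returns trimmed, non-empty segments.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            string value = m.Value.Trim();
            if (value.Length > 0) // assumption: empty segments are not wanted
            {
                results.Add(value);
            }
        }
        return results;
    }

    public static void Main()
    {
        foreach (string s in Extract("[STX] djk[STX]dsj [STX]dskd2[ETX] ddd"))
        {
            Console.WriteLine(s); // prints djk, dsj, dskd2 on separate lines
        }
    }
}
```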

Try this:

Regex regex = new Regex(@"\[STX\](.*?)\[ETX\]", RegexOptions.IgnoreCase);

Then pick out group 1 of each match to get the string between the tags.
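
A short sketch of what picking out the group could look like. The class name GroupPicker and the helper Payloads are hypothetical, introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupPicker
{
    // Returns the contents of capture group 1 for every [STX]...[ETX] match.
    public static List<string> Payloads(string input)
    {
        Regex regex = new Regex(@"\[STX\](.*?)\[ETX\]", RegexOptions.IgnoreCase);
        var payloads = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            // Groups[0] is the whole match including the tags;
            // Groups[1] is only the text between them.
            payloads.Add(m.Groups[1].Value);
        }
        return payloads;
    }

    public static void Main()
    {
        foreach (string p in Payloads("sajksajsk [STX]some string 2 [ETX] saksla"))
        {
            Console.WriteLine("'" + p + "'");
        }
    }
}
```

Note that the captured text keeps any spaces next to the tags, so "some string 2 " comes back with its trailing space.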

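EDIT: to fit your updated requirements, use the pattern below. Its greedy .* skips every [STX] on a line except the last one that still has an [ETX] after it. Be aware that it therefore yields at most one match per line, so for a line with two complete pairs (such as the 4.1/4.2 line in the example) only the last pair is returned: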

string pattern = @"(?<=\[STX])?.*\[STX]\s*(.+?)\s*\[ETX].*?";

Here's a complete example:

string input = @"[STX]some string 1[ETX]
sajksajsk [STX]some string 2 [ETX] saksla
[ETX] dksldkls [STX]some string 3 [ETX]ds ds
dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd
[STX] djkdsj [STX]dskd1[ETX] dsnds[ETX]
[STX] djkdsj [STX]dskd1[ETX] ddd
kdsj [STX]dskd1[ETX] dsnds[ETX] 
[STX] djk[STX]dsj [STX]dskd2[ETX] ddd";

string pattern = @"(?<=\[STX])?.*\[STX]\s*(.+?)\s*\[ETX].*?";

foreach(Match m in Regex.Matches(input, pattern))
{
    // result will be in first group
    Console.WriteLine(m.Groups[1].Value);
}

I also added the \s* between the grouping to eliminate extra whitespace. By doing so you no longer need to use Trim() as I suggested in my earlier response below.

PREVIOUS RESPONSE

This pattern should fit: "\[STX](.+?)\[ETX]"

Notice that the opening bracket, [, must be escaped to prevent it from being interpreted as a character class in regex. The closing bracket, ] need not be escaped. The (.+?) is a capturing group (due to the parentheses) and matches at least one character in a non-greedy fashion (via the ?). By being non-greedy it prevents the regex engine from greedily matching multiple occurrences and content till the last "[ETX]" occurrence. Remove the ? and you'll see what I mean in your str4 example. Since your last example has multiple occurrences you can use the Matches method.

string[] inputs =
{
    "[STX]some string 1[ETX]",
    "sajksajsk [STX]some string 2 [ETX] saksla",
    "[ETX] dksldkls [STX]some string 3 [ETX]ds ds",
    "dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd"
};

string pattern = @"\[STX](.+?)\[ETX]";

foreach (string input in inputs)
{
    Console.WriteLine("Input: " + input);
    foreach(Match m in Regex.Matches(input, pattern))
    {
        // result will be in first group
        Console.WriteLine(m.Groups[1].Value);
    }

    Console.WriteLine();
}
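
To see the greedy behaviour described above, the sketch below runs both the non-greedy pattern and a greedy variant (with the ? removed) against the str4 input and returns the first captured group. The class name GreedyVsLazy and the helper FirstGroup are hypothetical names for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Returns capture group 1 of the first match (empty string if no match).
    public static string FirstGroup(string input, string pattern)
    {
        return Regex.Match(input, pattern).Groups[1].Value;
    }

    public static void Main()
    {
        string str4 = "dksldkls [STX]some string 4.1[ETX]ds ds [STX] some string 4.2[ETX] jdskjd";

        // Non-greedy: stops at the first [ETX] after the [STX].
        Console.WriteLine(FirstGroup(str4, @"\[STX](.+?)\[ETX]"));
        // prints: some string 4.1

        // Greedy: runs on to the last [ETX] in the input.
        Console.WriteLine(FirstGroup(str4, @"\[STX](.+)\[ETX]"));
        // prints: some string 4.1[ETX]ds ds [STX] some string 4.2
    }
}
```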

You might consider using a Trim() to trim any excess spaces (m.Groups[1].Value.Trim()). It's possible to achieve in the pattern but complicates it unnecessarily. Use the overload that accepts RegexOptions.IgnoreCase if you need to ignore the case of the "STX" and "ETX" text (if they aren't always in uppercase form).
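
A small sketch combining both suggestions, Trim() on the captured group and the RegexOptions.IgnoreCase overload of Regex.Matches. The class name TrimmedExtractor and the helper Extract are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TrimmedExtractor
{
    // Case-insensitive match on the tags; each payload is trimmed.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\[STX](.+?)\[ETX]", RegexOptions.IgnoreCase))
        {
            results.Add(m.Groups[1].Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        foreach (string s in Extract("sajksajsk [stx] some string 2 [Etx] saksla"))
        {
            Console.WriteLine(s); // prints "some string 2"
        }
    }
}
```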
