Regular expression to cherry pick a multiline component of a paragraph sitting between tags (Not html)

Posted 2023-02-08 01:56

In the following text, I need a regular expression that captures the part between <tagstart> and </tagstart>.

Please note this is not html.

* real time results: shows results as you type 
* code hinting: roll over your expression to see info on specific elements 
* detailed results: roll over a match to see details & view group info below 
* built in regex guide: doub<tagstart>le click entries to insert them into your expression 
* online & desktop: regexr.com or download the desktop version for Mac, Windows, or Linux 
* save your expressions: My Saved expr</tagstart>essions are saved locally 
* search Community expressions and add your own

Thanks

EDIT: As @Kobi correctly points out in the comments, the much simpler version of the original post below is of course:

<(tagstart)>(.*?)</\1>

Since the original version also works and all the other statements remain true, I'll leave it as it is.
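As a rough illustration, here is the simpler pattern applied to a shortened copy of the sample text from the question, using Python's re module. The variable names (text, pattern, captured) are made up for this sketch, and re.DOTALL is assumed so that . can cross line breaks.

```python
import re

# Shortened copy of the sample text from the question.
text = (
    "* built in regex guide: doub<tagstart>le click entries\n"
    "* online & desktop: regexr.com\n"
    "* save your expressions: My Saved expr</tagstart>essions are saved locally\n"
)

# Lazy .*? stops at the first closing tag; DOTALL lets . match newlines.
pattern = re.compile(r"<(tagstart)>(.*?)</\1>", re.DOTALL)

match = pattern.search(text)
captured = match.group(2) if match else None
print(captured)
```

Group 2 holds the text between the tags, including the line breaks.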

If (and only if) the tags cannot be nested:

<(tagstart)>((?:(?!</\1>).)*)</\1>

Explanation:

<(tagstart)>      # matches "<tagstart>" and stores "tagstart" in group 1
(                 # begin group 2
  (?:             #   begin non-capturing group
    (?!           #     begin negative look-ahead (... not followed by)
      </\1>       #       a closing tag with the same name as group 1
    )             #     end negative look-ahead
    .             #     if ok, match the next character
  )*              #   end non-capturing group, repeat
)                 # end group 2 (stores everything between the tags)
</\1>             # a closing tag with the same name as group 1
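The commented layout above can be kept almost verbatim in flavors that support a "verbose" or "extended" mode. A minimal sketch in Python, assuming re.VERBOSE for the comments and re.DOTALL so that . matches newlines; TEMPERED, sample and inner are names invented for this example.

```python
import re

# Verbose mode ignores whitespace and treats "#" as a comment marker,
# so the explanation can live inside the pattern itself.
TEMPERED = re.compile(
    r"""
    <(tagstart)>      # opening tag; "tagstart" is stored in group 1
    (                 # group 2: everything between the tags
      (?:
        (?!</\1>)     # only continue if the closing tag does not start here
        .             # consume one character
      )*
    )
    </\1>             # closing tag with the same name as group 1
    """,
    re.VERBOSE | re.DOTALL,
)

sample = "doub<tagstart>le click\nMy Saved expr</tagstart>essions"
m = TEMPERED.search(sample)
inner = m.group(2) if m else None
print(repr(inner))
```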

The regex needs to be applied in "single line" mode (sometimes called "dotall" mode) so that . also matches newlines. Alternatively, replace the . with [\s\S].
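The effect of the mode is easy to check. The sketch below (Python, with an invented two-line sample) runs the same pattern without the flag, with re.DOTALL, and with [\s\S] in place of the dot.

```python
import re

sample = "doub<tagstart>le click\nMy Saved expr</tagstart>essions"

dot_pattern = r"<(tagstart)>((?:(?!</\1>).)*)</\1>"
class_pattern = r"<(tagstart)>((?:(?!</\1>)[\s\S])*)</\1>"

# Without DOTALL, . stops at the newline, so the closing tag is never reached.
no_flag = re.search(dot_pattern, sample)

# With DOTALL, . matches newlines too.
with_flag = re.search(dot_pattern, sample, re.DOTALL)

# [\s\S] matches any character regardless of mode.
char_class = re.search(class_pattern, sample)

print(no_flag, with_flag.group(2) == char_class.group(2))
```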

To generically match text between any two equally named tags, use <(\w+)> instead of <(tagstart)>.
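A small sketch of the generic form, again in Python. The tag names note and tip and the variable names are hypothetical, chosen only to show that each closing tag must repeat the name of its opening tag.

```python
import re

# Two differently named tag pairs; the first one spans a line break.
sample = "a<note>one\ntwo</note>b<tip>three</tip>c"

generic = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

# Each match yields (tag name, text between the tags).
pairs = [(m.group(1), m.group(2)) for m in generic.finditer(sample)]
print(pairs)
```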

Depending on your regex flavor, some things may work differently, like $1 instead of \1 for back-references, or meta-characters that need additional escaping.
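To make one such difference concrete: Python's re module uses \1 inside the pattern, and \2 or \g<2> (not $2) in a replacement string. The following is only a sketch with an invented sample; other flavors will differ.

```python
import re

sample = "doub<tagstart>le click\nMy Saved expr</tagstart>essions"

# \1 in the pattern is the back-reference; \g<2> in the replacement
# refers to group 2 (Python does not use $2 here).
replaced = re.sub(
    r"<(tagstart)>(.*?)</\1>",
    r"[\g<2>]",
    sample,
    flags=re.DOTALL,
)
print(replaced)
```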

See a Rubular demo.

Maybe this regexp would help you: /(\<tagstart\>)(.+)(\<\/tagstart\>)/s. The second capture group holds the text you are looking for. See demo for details.
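One caveat with this pattern: .+ is greedy, so if the input contains more than one tag pair it runs to the last closing tag. A short Python sketch with a made-up sample shows the difference from the lazy .+? form.

```python
import re

# Hypothetical input with two separate tag pairs.
sample = "a<tagstart>one</tagstart>b<tagstart>two</tagstart>c"

# Greedy: group 2 runs up to the LAST closing tag.
greedy = re.search(r"(<tagstart>)(.+)(</tagstart>)", sample, re.DOTALL)

# Lazy: group 2 stops at the FIRST closing tag.
lazy = re.search(r"(<tagstart>)(.+?)(</tagstart>)", sample, re.DOTALL)

print(greedy.group(2))
print(lazy.group(2))
```

With a single tag pair, as in the question, both forms capture the same text.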

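#!/usr/bin/perl
use strict;
use warnings;

undef $/;          # slurp mode: read the whole input in one go

my $text = <>;     # from a file named on the command line, or STDIN

# /s lets . match newlines; \1 repeats the tag name captured in group 1.
# Print only when the match succeeds, so $2 is never stale or undefined.
print $2 if $text =~ m|<(.*?)>(.*)</\1>|s;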

If you really need just <tagstart>, replace <(.*?)> with <tagstart> and </\1> with </tagstart>. The undef $/ line lets you slurp the whole input in a single read, and $2 selects the second capture group. The s at the end of the regex makes . match newline characters as well.
