Regular expression to cherry-pick a multiline part of a paragraph sitting between tags (not HTML)
In the following text I need a regex to capture the part between <tagstart> and </tagstart>.
Please note this is not HTML.
* real time results: shows results as you type
* code hinting: roll over your expression to see info on specific elements
* detailed results: roll over a match to see details & view group info below
* built in regex guide: doub<tagstart>le click entries to insert them into your expression
* online & desktop: regexr.com or download the desktop version for Mac, Windows, or Linux
* save your expressions: My Saved expr</tagstart>essions are saved locally
* search Community expressions and add your own
Thanks
EDIT: As @Kobi correctly points out in the comments, the much simpler version of the original post below is of course:
<(tagstart)>(.*?)</\1>
Since the original version also works and all the other statements remain true, I'll leave it as it is.
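As a quick check, here is a minimal sketch of the simpler pattern applied to the sample from the question. Python's re module is my choice for illustration only (the question does not name a language), and the variable names are made up.

```python
import re

# Sample text from the question; the tags fall in the middle of words.
text = """* built in regex guide: doub<tagstart>le click entries to insert them into your expression
* online & desktop: regexr.com or download the desktop version for Mac, Windows, or Linux
* save your expressions: My Saved expr</tagstart>essions are saved locally"""

# re.DOTALL lets "." match newlines, so the lazy group can span several lines.
pattern = re.compile(r"<(tagstart)>(.*?)</\1>", re.DOTALL)

match = pattern.search(text)
inner = match.group(2) if match else None
print(inner)
```

Group 2 holds the text between the tags; group 1 is only there so the closing tag can refer back to the name.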
If (and only if) the tags cannot be nested:
<(tagstart)>((?:(?!</\1>).)*)</\1>
Explanation:
<(tagstart)>    # matches "<tagstart>" and stores "tagstart" in group 1
(               # begin group 2
  (?:           # begin non-capturing group
    (?!         # begin negative look-ahead (... not followed by)
      </\1>     # a closing tag with the same name as group 1
    )           # end negative look-ahead
    .           # if ok, match the next character
  )*            # end non-capturing group, repeat
)               # end group 2 (stores everything between the tags)
</\1>           # a closing tag with the same name as group 1
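The annotated pattern above can be written almost verbatim in a flavor that supports free-spacing mode. A minimal sketch in Python, assuming re.VERBOSE for the comments and re.DOTALL so that . also matches newlines; the sample string and names are invented for the demonstration.

```python
import re

tempered = re.compile(
    r"""
    <(tagstart)>      # opening tag; the name goes into group 1
    (                 # group 2: the text between the tags
      (?:
        (?!</\1>)     # stop as soon as the closing tag comes next
        .             # otherwise consume one character
      )*
    )
    </\1>             # closing tag with the same name as group 1
    """,
    re.VERBOSE | re.DOTALL,
)

sample = "doub<tagstart>le click\nMy Saved expr</tagstart>essions"
m = tempered.search(sample)
between = m.group(2) if m else None
print(between)
```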
The regex needs to be applied in "single line" mode (sometimes called "dotall" mode), so that . also matches newlines. Alternatively, replace the . with [\s\S].
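To see the difference the mode makes, here is a small sketch in Python, where "dotall" is spelled re.DOTALL. The test string is made up.

```python
import re

text = "a<tagstart>line one\nline two</tagstart>b"

# Without the flag, "." refuses to cross the newline, so nothing matches.
no_flag = re.search(r"<(tagstart)>(.*?)</\1>", text)

# With DOTALL, "." matches the newline as well.
dotall = re.search(r"<(tagstart)>(.*?)</\1>", text, re.DOTALL)

# [\s\S] matches any character, so no flag is needed.
char_class = re.search(r"<(tagstart)>([\s\S]*?)</\1>", text)

print(no_flag, dotall.group(2) == char_class.group(2))
```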
To generically match text between any two equally named tags, use <(\w+)> instead of <(tagstart)>.
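A sketch of the generic variant in Python, collecting every tag name together with its content. The tag names in the sample string are invented, and the tags are assumed not to be nested.

```python
import re

# \w+ captures any tag name; \1 requires the closing tag to carry the same name.
generic = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

text = "x<note>first\nnote</note>y<tagstart>second</tagstart>z"

# One (name, content) pair per matched tag pair, in order of appearance.
pairs = [(m.group(1), m.group(2)) for m in generic.finditer(text)]
print(pairs)
```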
Depending on your regex flavor, some things may work differently, such as $1 instead of \1 for back-references, or meta-characters that need additional escaping.
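As one example of such a flavor difference, Python's re module uses \1 inside the pattern and \1 or \g<1> in a replacement string, and does not use the $1 form at all. A minimal sketch with a made-up input:

```python
import re

text = "a<b>hi</b>c"
pattern = r"<(\w+)>(.*?)</\1>"   # \1 inside the pattern refers back to group 1

# In the replacement string, Python accepts \1 or the unambiguous \g<1>.
short_form = re.sub(pattern, r"[\1: \2]", text, flags=re.DOTALL)
long_form = re.sub(pattern, r"[\g<1>: \g<2>]", text, flags=re.DOTALL)

print(short_form, long_form)
```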
See a Rubular demo.
Maybe this regexp would help you: /(\<tagstart\>)(.+)(\<\/tagstart\>)/s
The second capture group is what you are searching for. See demo for details.
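One caveat worth checking: .+ is greedy, so if the text contains more than one pair of tags, it runs up to the last closing tag. A minimal sketch in Python with an invented two-pair string, comparing it against the lazy .+? form:

```python
import re

text = "<tagstart>one</tagstart> gap <tagstart>two</tagstart>"

# Greedy: group 2 extends to the last </tagstart> in the text.
greedy = re.search(r"(<tagstart>)(.+)(</tagstart>)", text, re.DOTALL).group(2)

# Lazy: group 2 stops at the first </tagstart>.
lazy = re.search(r"(<tagstart>)(.+?)(</tagstart>)", text, re.DOTALL).group(2)

print(repr(greedy), repr(lazy))
```

With a single pair of tags, as in the question, both forms give the same result.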
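#!/usr/bin/perl -w
undef $/;    # no input record separator: read the whole input at once
$_ = <>;     # slurp STDIN or the files named on the command line
# group 1 = tag name, group 2 = everything up to the matching closing tag
print $2 if m|<(.*?)>(.*)</\1>|s;    # print only when the match succeeded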
If you really need just <tagstart>, replace the bits like <(.*?)> with <tagstart>, and similarly </\1> with </tagstart> (the content is then in $1 rather than $2). The undef $/ bit lets you slurp the whole input in a single read, and $2 selects the second match group. The s at the end of the regex asks for . to match even newline characters.