Regular expression, find a word between two words
I have this string
<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>
What I am trying to do is extract all the "p" tags within the "li" tags, but not the "p" tags outside of them.
So far I'm only able to extract all the "li" tags with
\<li\>(.*?)\</li\>
I'm lost on how to extract the "p" tag within it.
Any pointers are greatly appreciated!
It is a lot more reliable to use an HTML parser instead of a regex. Use HTML Agility Pack:
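using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>");
// Collect every <p> node that is a descendant of an <li> node.
IEnumerable<HtmlNode> result = doc.DocumentNode
    .Descendants("li")
    .SelectMany(x => x.Descendants("p"));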
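For readers outside .NET, the same parser-based idea can be sketched with Python's standard-library html.parser. This is only an illustration under assumptions: the class name PInsideLiCollector and the helper find_p_inside_li are made up for this example, and none of it is the HTML Agility Pack API.

```python
from html.parser import HTMLParser


class PInsideLiCollector(HTMLParser):
    """Collects the source text of every <p> tag seen while inside an <li>."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0   # how many <li> elements are currently open
        self.found = []     # raw text of each <p> tag found inside an <li>

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <p/>, because the
        # default handle_startendtag calls handle_starttag first.
        if tag == "li":
            self.li_depth += 1
        elif tag == "p" and self.li_depth > 0:
            self.found.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1


def find_p_inside_li(html):
    # Hypothetical helper: parse the whole string and return the collected tags.
    parser = PInsideLiCollector()
    parser.feed(html)
    parser.close()
    return parser.found


html = "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>"
print(find_p_inside_li(html))
```

Tracking the nesting depth rather than a simple flag means the "p" tags before and after the list are skipped, while those inside any "li" are kept.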
Try this. It uses a lookbehind and a lookahead so that the <li> tags are not part of the match.
(?<=<li>)(.*?<p/?>.*?)(?=</li>)
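A quick way to see what this pattern actually matches is to run it over the string from the question. The sketch below uses Python's re module purely for illustration (the thread does not say which regex engine is in use, and the variable names are my own), so treat it as a check under those assumptions rather than a definitive answer.

```python
import re

# Pattern from the answer: lookbehind for <li>, lookahead for </li>,
# so neither tag is included in the matched text.
pattern = re.compile(r"(?<=<li>)(.*?<p/?>.*?)(?=</li>)")

html = "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>"
matches = [m.group(0) for m in pattern.finditer(html)]
print(matches)  # ['test1<p/>', 'test2<p/>']

# Caveat: .*? can run across a closing </li> when an item has no <p> tag.
tricky = "<ul><li>a</li><li>b<p/></li></ul>"
spill = [m.group(0) for m in pattern.finditer(tricky)]
print(spill)  # ['a</li><li>b<p/>']
```

The second result shows the pattern is only safe when every "li" contains a "p" tag. Otherwise a match can spill from one list item into the next.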
P.S. You also need to fix your HTML, because the way you have the <p> tags is not right. The regex works on the HTML below:
<ul><li><p>test1<p/></li><li><p>test2<p/></li></ul>
<li>(.*?<p/?>.*?)</li>
Will match all content between <li> tags which also contains a <p/>. If you just want to match the <p/>, then:
(?<=<li>).*?(<p/?>).*?(?=</li>)
Will have group 1 match the <p/> tag.
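To make the difference between the two patterns concrete, here is a small Python sketch (Python chosen only for illustration; the variable names are my own) that runs both against the string from the question and shows what group 1 captures in each case.

```python
import re

html = "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>"

# First pattern: group 1 is everything between <li> and </li>.
whole_item = re.compile(r"<li>(.*?<p/?>.*?)</li>")
# Second pattern: group 1 is only the <p/> tag itself.
only_p = re.compile(r"(?<=<li>).*?(<p/?>).*?(?=</li>)")

# With exactly one capturing group, findall returns the group 1 text per match.
item_contents = whole_item.findall(html)
p_tags = only_p.findall(html)

print(item_contents)  # ['test1<p/>', 'test2<p/>']
print(p_tags)         # ['<p/>', '<p/>']
```

Neither pattern returns the <p/> tags that sit outside the list, since both require a surrounding <li> ... </li> pair.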