
Regular expression, find a word between two words

I have this string

<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>

What I'm trying to do is extract all the "p" tags inside the "li" tags, but not the "p" tags outside them.

So far I've only managed to extract all the "li" tags, with

\<li\>(.*?)\</li\>

I'm lost on how to extract the "p" tags within them.

Any pointers are greatly appreciated!
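For reference, here is a minimal sketch of where the question stands and one way to finish it. It is written in Python only because the question does not name a language (an assumption). It runs the question's own pattern, then, as a hypothetical second pass, searches each captured <li> body for <p> or <p/>:

```python
import re

# The string from the question.
source = "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>"

# Pass 1: the question's own pattern, capturing the body of each <li>...</li>.
li_bodies = re.findall(r"\<li\>(.*?)\</li\>", source)
print(li_bodies)  # ['test1<p/>', 'test2<p/>']

# Pass 2 (hypothetical): look for <p> or <p/> only inside the captured bodies,
# so the <p/> tags outside the list are never considered.
p_tags = [tag for body in li_bodies for tag in re.findall(r"<p/?>", body)]
print(p_tags)  # ['<p/>', '<p/>']
```

Nested lists would break the first pass, since the lazy (.*?) stops at the first </li> it sees.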


It is much more reliable to use an HTML parser than a regex. For example, with the HTML Agility Pack in C#:

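using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>");
// Every <p> element that is a descendant of some <li> element
IEnumerable<HtmlNode> result = doc.DocumentNode
                                  .Descendants("li")
                                  .SelectMany(x => x.Descendants("p"));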
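Readers not on .NET can sketch the same parser-based idea with Python's standard-library html.parser module. This is a swapped-in illustration, not HTML Agility Pack, and the class name PInsideLi is hypothetical. It counts how many <li> elements are currently open and records each <p> start tag seen while that count is above zero:

```python
from html.parser import HTMLParser


class PInsideLi(HTMLParser):
    """Hypothetical helper: collects the raw text of every <p> start tag inside an <li>."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0  # number of <li> elements currently open
        self.found = []    # raw start-tag text of each <p> seen inside an <li>

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "p" and self.li_depth > 0:
            self.found.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1


source = "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>"
parser = PInsideLi()
parser.feed(source)
parser.close()
print(parser.found)  # ['<p/>', '<p/>']
```

Because HTMLParser reports <p/> through handle_startendtag, whose default implementation calls handle_starttag and then handle_endtag, the self-closing form is picked up without extra code.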


Try this. It uses a lookbehind and a lookahead so that the <li> and </li> tags are not part of the match.

(?<=<li>)(.*?<p/?>.*?)(?=</li>)

P.S. You should also fix your HTML, because <p/> is not a valid way to write a paragraph in HTML. The regex works on the corrected HTML below as well:

<ul><li><p>test1</p></li><li><p>test2</p></li></ul>
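A quick way to check this answer's pattern is Python's re module (an assumption for illustration; the answer doesn't name a language). Python supports fixed-width lookbehinds such as (?<=<li>), so the pattern can be used unchanged. The sketch runs it on the question's original string and confirms that the <li> tags stay out of the match:

```python
import re

source = "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>"

# (?<=<li>) is a lookbehind and (?=</li>) a lookahead: both must match,
# but neither contributes any text to the result.
pattern = re.compile(r"(?<=<li>)(.*?<p/?>.*?)(?=</li>)")

matches = pattern.findall(source)
print(matches)  # ['test1<p/>', 'test2<p/>']

# The whole match (group 0) is identical to group 1, with no <li> or </li> in it.
whole = [m.group(0) for m in pattern.finditer(source)]
print(whole)  # ['test1<p/>', 'test2<p/>']
```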


<li>(.*?<p/?>.*?)</li>

This matches the content of each <li>...</li> that also contains a <p/> (or <p>), captured in group 1. If you just want to match the <p/> itself, use:

(?<=<li>).*?(<p/?>).*?(?=</li>)

Group 1 will then contain the <p/> tag.
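Here is a minimal Python sketch of both patterns from this answer, run on the question's string. Python is an assumption, and the patterns are used unchanged. The sketch also shows a caveat: when an <li> has no <p>, the lazy .*? can cross its </li> and merge it with the next item. The "tempered" variant at the end is a swapped-in workaround, not something from the answers above, and it still does not handle nested lists.

```python
import re

source = "<p/><ul><li>test1<p/></li><li>test2<p/></li></ul><p/>"

# First pattern: group 1 is the content of an <li> that contains <p> or <p/>.
li_with_p = re.compile(r"<li>(.*?<p/?>.*?)</li>")
# Second pattern: group 1 is just the <p/> tag; <li> and </li> sit in lookarounds.
p_only = re.compile(r"(?<=<li>).*?(<p/?>).*?(?=</li>)")

print(li_with_p.findall(source))  # ['test1<p/>', 'test2<p/>']
print(p_only.findall(source))     # ['<p/>', '<p/>']

# Caveat: an <li> without a <p> gets merged with the following item,
# because the lazy .*? is still allowed to step over a </li>.
tricky = "<ul><li>no para</li><li>yes<p/></li></ul>"
print(li_with_p.findall(tricky))  # ['no para</li><li>yes<p/>']

# Swapped-in workaround: a "tempered" dot, (?:(?!</li>).), that refuses
# to consume a character where a </li> begins.
tempered = re.compile(r"<li>((?:(?!</li>).)*?<p/?>(?:(?!</li>).)*?)</li>")
print(tempered.findall(tricky))   # ['yes<p/>']
print(tempered.findall(source))   # ['test1<p/>', 'test2<p/>']
```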
