Regular Expression to capture the first <p> of HTML

2023-01-01 22:46

I have the following regular expression:

(?:<(?<tag>\w*)>(?<text>.*)</\k<tag>>)

I want it to grab the text within the first HTML element.

For example:

<p>This should capture</p>This shouldn't

Works, but ...

<p>This should capture</p><p>This shouldn't</p>

Doesn't work. As you'd expect, it returns:

This should capture</p><p>This shouldn't

I'm racking my brains here. How can I just have it select the FIRST inner text?

(I'm trying to be tag-agnostic, so <strong>This should match</strong> is equally appropriate, etc.)

You should use the HTML Agility Pack.

For example:

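var doc = new HtmlDocument(); doc.LoadHtml(html); // html holds the markup; needs using HtmlAgilityPack; and using System.Linq;
var text = doc.DocumentNode.Descendants("p").First().InnerText;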

Stop. Just stop. If you are parsing HTML, use an HTML parser (or XML if you're dealing with valid XHTML). See this answer for more info.

To make the * quantifier non-greedy (lazy), add a ? after it, so that it matches as few characters as possible and stops at the first closing tag:

(?:<(?<tag>\w*)>(?<text>.*?)</\k<tag>>)
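
As a rough illustration of the lazy pattern in use, here is a minimal C# sketch built on System.Text.RegularExpressions. The class name FirstElementText and the method name Extract are made up for this example and do not come from the thread; the pattern itself is the one shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstElementText
{
    // Pattern from the answer above: the lazy quantifier .*? stops at the
    // first closing tag whose name equals the captured opening tag name.
    private const string LazyPattern = @"(?:<(?<tag>\w*)>(?<text>.*?)</\k<tag>>)";

    // Hypothetical helper: returns the inner text of the first matched
    // element, or null when nothing matches.
    public static string Extract(string html)
    {
        Match m = Regex.Match(html, LazyPattern);
        return m.Success ? m.Groups["text"].Value : null;
    }

    public static void Main()
    {
        // Prints: This should capture
        Console.WriteLine(Extract("<p>This should capture</p><p>This shouldn't</p>"));
        // Prints: This should match
        Console.WriteLine(Extract("<strong>This should match</strong>"));
    }
}
```

Note that . does not match newlines unless RegexOptions.Singleline is passed, and that a lazy match still breaks on nested elements with the same tag name (for <div><div>a</div></div> it captures <div>a), which is one reason the other answers recommend a real HTML parser.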
