Regex to get the tags

I have some HTML like this:

<h1> Headhing </h>
<font name="arial">some text</font></br>
some other text

In C#, I want to get the output below, i.e. just the content between the font start tag and end tag (tags included):

<font name="arial">some text</font>


First off, your HTML is wrong: you should close an <h1> with </h1>, not </h>. Mistakes like this are exactly why regex is inappropriate for parsing tags.

Second, there are hundreds of questions on SO about parsing HTML with regex. The answer is: don't. Use something like the HTML Agility Pack.


I wouldn't recommend trying this with regex.

I use the HTML Agility Pack to parse HTML and get what I want. It's a lovely HTML parser that is commonly recommended for this. It takes malformed HTML, massages it into XHTML, and then exposes a traversable DOM, much like the XML classes. That makes it very useful for the kind of markup you find in the wild.
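
As a rough illustration of that approach, here is a minimal sketch that pulls the font element out of the question's markup. It assumes the third-party HtmlAgilityPack NuGet package is referenced; the class and method names (FontTagExtractor, ExtractArialFontTags) are made up for this example.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class FontTagExtractor
{
    // Hypothetical helper: returns the outer HTML of every <font name="arial"> element.
    public static string[] ExtractArialFontTags(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // the parser tolerates malformed markup such as </h>

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//font[@name='arial']");
        if (nodes == null)
            return new string[0];

        var result = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
            result[i] = nodes[i].OuterHtml; // start tag, content, and end tag
        return result;
    }

    public static void Main()
    {
        string html = "<h1> Headhing </h>\n"
                    + "<font name=\"arial\">some text</font></br>\n"
                    + "some other text";

        foreach (string tag in ExtractArialFontTags(html))
            Console.WriteLine(tag);
    }
}
```

If only the text is needed rather than the whole element, InnerText on each node can be used in place of OuterHtml.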

There's also an HTML parser from Microsoft, MSHTML, but I haven't tried it.


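 // Lazy .*? stops at the first </font> instead of running on to the last one.
 Regex regExfont = new Regex(@"<font name=""arial""[^>]*>.*?</font>");
 MatchCollection rows = regExfont.Matches(html); // html: the string holding the markup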

A good site for testing regular expressions is http://www.regexlib.com/RETester.aspx
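
One detail worth care with a pattern like this is greediness: a greedy `.*` runs on to the last `</font>` in the input, while a lazy `.*?` stops at the first one. The sketch below is a self-contained version using only System.Text.RegularExpressions; the class and method names (FontRegexDemo, FindFontTags) and the IgnoreCase/Singleline options are choices made for this example, not part of the original answer. It still assumes well-formed, non-nested font tags with the attribute written exactly as `name="arial"`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FontRegexDemo
{
    // Lazy ".*?" stops at the first </font>; Singleline lets "." span newlines.
    private static readonly Regex FontTag = new Regex(
        @"<font name=""arial""[^>]*>.*?</font>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns every matched font element as a string.
    public static string[] FindFontTags(string html)
    {
        MatchCollection matches = FontTag.Matches(html);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            result[i] = matches[i].Value;
        return result;
    }

    public static void Main()
    {
        string html = "<font name=\"arial\">one</font> and <font name=\"arial\">two</font>";

        // Each element is printed on its own line; a greedy ".*" would
        // instead produce a single match spanning both elements.
        foreach (string tag in FindFontTags(html))
            Console.WriteLine(tag);
    }
}
```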
