
Matching content between tags in web source

I was wondering what the fastest and easiest way would be to grab the text between tags in a string.

For example, I have this string: Lorem ipsum <a>dolor sit amet</a>, <b>consectetur</b> adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

I need to find the text between the <a> </a> tags and between the <b> </b> tags.

Thank you.


Parsing HTML is very difficult, because real-world web pages are rarely well-formed: you'll find lots of mismatched tags and other oddities.

Use HTMLAgilityPack if this is for real-world pages.
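HTMLAgilityPack is a .NET library, and its API is not shown here. As a rough illustration of the same "use a parser, not a regex" idea, the sketch below swaps in Python's standard-library html.parser instead. TagTextCollector and extract_tag_text are hypothetical names. Because HTMLParser does not raise on mismatched markup, the end-tag handler just closes open tags up to the matching one, which keeps the sketch tolerant of sloppy pages.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside the requested tags."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.stack = []  # wanted tags that are currently open
        self.results = {tag: [] for tag in self.wanted}

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.stack.append(tag)
            self.results[tag].append("")  # start a new text bucket

    def handle_endtag(self, tag):
        # Tolerate mismatched markup: ignore stray end tags, and when the
        # tag is open, close everything opened after it as well.
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break

    def handle_data(self, data):
        # Add the text to the latest bucket of every wanted tag still open.
        for tag in set(self.stack):
            self.results[tag][-1] += data


def extract_tag_text(html, tags=("a", "b")):
    parser = TagTextCollector(tags)
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return parser.results
```

With the question's string this yields a dict mapping "a" to ["dolor sit amet"] and "b" to ["consectetur"]. With the malformed input "<a>one <b>two</a> three" it still returns "one two" for "a" and "two" for "b" instead of failing.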


<a>(.*)</a>.*<b>(.*)</b> will work in this particular case, but in general it is not a good idea to parse HTML with regex. Use an HTML/XML parser instead.
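A minimal Python sketch of the regex from this answer, applied to the question's string. It also shows one reason the pattern breaks in general: .* is greedy, so once a tag appears twice the first group swallows everything up to the last closing tag. The non-greedy .*? variant is a common mitigation added here for illustration, not part of the original answer, and it still does not handle nesting, attributes, or malformed markup.

```python
import re

text = ("Lorem ipsum <a>dolor sit amet</a>, <b>consectetur</b> adipisicing elit, "
        "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.")

# Pattern from the answer: group 1 = text inside <a>, group 2 = text inside <b>.
pattern = re.compile(r"<a>(.*)</a>.*<b>(.*)</b>")
m = pattern.search(text)
a_text, b_text = m.group(1), m.group(2)
# a_text == "dolor sit amet", b_text == "consectetur"

# Greedy .* misbehaves as soon as a tag appears more than once:
two_links = "<a>first</a> and <a>second</a>"
greedy = re.search(r"<a>(.*)</a>", two_links).group(1)
# greedy == "first</a> and <a>second"

# Non-greedy .*? stops at the nearest closing tag instead.
lazy = re.findall(r"<a>(.*?)</a>", two_links)
# lazy == ["first", "second"]
```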

Try HTMLAgilityPack: This SO post explains how to use it.


.+<a>(.+)</a>.+<b>(.+)</b>.+

The first match group will contain the text between the <a> tags, and the second group the text between the <b> tags.
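A small Python sketch of how the two groups of this pattern are read out. One side effect worth knowing: the leading and trailing .+ require at least one character before <a> and after </b>, so a string that starts with <a> or ends with </b> does not match at all. RELAXED and a_and_b_text are hypothetical names; the relaxed pattern simply drops the outer .+ and is an illustration, not part of the original answer.

```python
import re

# Pattern from the answer: group 1 = text between the <a> tags,
# group 2 = text between the <b> tags. The outer .+ parts are not captured.
STRICT = re.compile(r".+<a>(.+)</a>.+<b>(.+)</b>.+")

# Hypothetical relaxed variant: without the leading/trailing .+,
# the tags may also sit at the very start or end of the string.
RELAXED = re.compile(r"<a>(.+)</a>.+<b>(.+)</b>")


def a_and_b_text(s, pattern=STRICT):
    """Return (group 1, group 2), or None if the pattern does not match."""
    m = pattern.search(s)
    if m is None:
        return None
    return m.group(1), m.group(2)


question = "Lorem ipsum <a>dolor sit amet</a>, <b>consectetur</b> adipisicing elit."
print(a_and_b_text(question))           # ('dolor sit amet', 'consectetur')

edge = "<a>dolor</a>, <b>consectetur</b>"  # nothing before <a> or after </b>
print(a_and_b_text(edge))               # None
print(a_and_b_text(edge, RELAXED))      # ('dolor', 'consectetur')
```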
