Matching content between tags in web source
I was wondering what would be the fastest and easiest way to grab the text that sits between tags in a string.
For example, I have this string: Lorem ipsum <a>dolor sit amet</a>, <b>consectetur</b> adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
I need to find the text that is between the <a> </a> tags and between the <b> </b> tags.
Thank you.
Parsing HTML is very difficult, because web pages are rarely well-formed: you will find plenty of mismatched tags and other oddities.
Use the HTMLAgilityPack if this is for real-world pages.
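To illustrate the parser-based approach, here is a minimal sketch. It does not use HTMLAgilityPack (a .NET library); it uses Python's standard `html.parser` module as a stand-in, and the names `TagTextCollector` and `text_between_tags` are hypothetical helpers made up for this example.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collects the text found inside a chosen set of tags."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = []       # [tag, text] entries in document order
        self.open_stack = []  # indices into self.found for tags still open

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.found.append([tag, ""])
            self.open_stack.append(len(self.found) - 1)

    def handle_endtag(self, tag):
        # Close only when the end tag matches the innermost open wanted tag;
        # stray or mismatched end tags are ignored.
        if self.open_stack and self.found[self.open_stack[-1]][0] == tag:
            self.open_stack.pop()

    def handle_data(self, data):
        # Text belongs to every wanted tag that is currently open.
        for index in self.open_stack:
            self.found[index][1] += data


def text_between_tags(html, wanted=("a", "b")):
    """Return (tag, text) pairs for each wanted tag, in document order."""
    parser = TagTextCollector(wanted)
    parser.feed(html)
    parser.close()
    return [(tag, text) for tag, text in parser.found]


sample = ("Lorem ipsum <a>dolor sit amet</a>, <b>consectetur</b> adipisicing "
          "elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.")
print(text_between_tags(sample))
```

Unlike a regex, this keeps working when the tags carry attributes or appear several times, since the parser tracks opening and closing tags itself.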
<a>(.*)</a>.*<b>(.*)</b>
will work in this particular case, but in general it is not a good idea to parse HTML with regex: the greedy .* over-matches as soon as a tag appears more than once. Use an HTML/XML parser instead.
Try HTMLAgilityPack: This SO post explains how to use it.
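As a rough illustration of why the pattern is fine here but fragile in general, the sketch below runs it with Python's `re` module (an assumption, since the question names no language) and compares it with a lazy variant. The names `GREEDY`, `LAZY`, and `between_a_and_b` are invented for the example.

```python
import re

SAMPLE = ("Lorem ipsum <a>dolor sit amet</a>, <b>consectetur</b> adipisicing "
          "elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.")
REPEATED = "<a>one</a> <b>two</b> <a>three</a> <b>four</b>"

# The pattern from the answer: every .* is greedy.
GREEDY = re.compile(r"<a>(.*)</a>.*<b>(.*)</b>")
# Same pattern with lazy quantifiers, which stop at the first closing tag.
LAZY = re.compile(r"<a>(.*?)</a>.*?<b>(.*?)</b>")


def between_a_and_b(pattern, text):
    """Return the two captured groups, or None when nothing matches."""
    match = pattern.search(text)
    return match.groups() if match else None


print(between_a_and_b(GREEDY, SAMPLE))    # fine on the question's string
print(between_a_and_b(GREEDY, REPEATED))  # first group swallows other tags
print(between_a_and_b(LAZY, REPEATED))    # lazy version stops early
```

The lazy version handles repeated tags, but it still breaks on attributes, nesting, or line breaks inside the tags, which is where a real parser earns its keep.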
.+<a>(.+)</a>.+<b>(.+)</b>.+
The first match group will contain the text between the <a> tags, and the second group the text between the <b> tags.
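A quick way to check the two groups is sketched below with Python's `re` module (an assumption; the answer does not name a language). `PATTERN` and `extract_groups` are hypothetical names. Note that every `.+` needs at least one character, so this pattern only matches when there is text before `<a>`, between the two tag pairs, and after `</b>`.

```python
import re

PATTERN = re.compile(r".+<a>(.+)</a>.+<b>(.+)</b>.+")

SAMPLE = ("Lorem ipsum <a>dolor sit amet</a>, <b>consectetur</b> adipisicing "
          "elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.")


def extract_groups(text):
    """Return (a_text, b_text), or None when the pattern does not match."""
    match = PATTERN.search(text)
    return match.groups() if match else None


print(extract_groups(SAMPLE))
# No text before <a> or after </b>, so the leading and trailing .+ cannot match.
print(extract_groups("<a>x</a>, <b>y</b>"))
```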