Regular expression to isolate text from some sample html?
I'm looking for a C# regular expression that extracts the following:
<a id=sector href="?catid=us-58211593" >Financial</a>
... from this html string:
<div class="g-unit g-first">Sector: <a id=sector href="?catid=us-58211593" >Financial</a> > Industry: <a href="?catid=us-64965887" >Misc. Financial Services</a>
The href="?catid=us-58211593" attribute is not relevant, so the pattern should match on the "a" tag and the id=sector attribute.
Update
Indeed, regex is just not the right tool for the job. It only took 3 lines of code with the HTML Agility Pack to achieve the required result:
HtmlWeb hw = new HtmlWeb();
HtmlDocument myDoc = hw.Load("http://www.google.com/finance?q=IBM");
var etc = myDoc.GetElementbyId("sector").InnerText;
Don't use Regex to parse HTML. There are better solutions, such as HTML Agility Pack.
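That said, if a regex is unavoidable for this one fixed snippet, something like the sketch below might do. It is only an illustration, not a general HTML parser: the SectorExtractor class and its Find method are hypothetical names, and the pattern assumes the anchor contains no nested tags and that no attribute value contains a ">" character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; brittle by design.
public static class SectorExtractor
{
    // Matches an <a ...> element whose id attribute is "sector"
    // (quoted or unquoted), regardless of other attributes such as href.
    // Group 1 captures the text between the opening and closing tags.
    private static readonly Regex SectorAnchor = new Regex(
        @"<a\b[^>]*\bid\s*=\s*[""']?sector\b[""']?[^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static Match Find(string html) => SectorAnchor.Match(html);

    public static void Main()
    {
        string html =
            "<div class=\"g-unit g-first\">Sector: " +
            "<a id=sector href=\"?catid=us-58211593\" >Financial</a>" +
            " > Industry: <a href=\"?catid=us-64965887\" >Misc. Financial Services</a>";

        Match m = Find(html);
        if (m.Success)
        {
            Console.WriteLine(m.Value);           // the whole anchor element
            Console.WriteLine(m.Groups[1].Value); // just the inner text
        }
        else
        {
            Console.WriteLine("No anchor with id=sector found.");
        }
    }
}
```

Because the pattern keys on id=sector rather than on the href value, a different catid should not affect the match. Any change in the page's markup (nested tags inside the link, an attribute such as data-id=sector) can still break it, which is why a real parser remains the safer choice.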