Regular expression to isolate text from some sample HTML?

I'm looking for a C# regular expression that extracts the following:

<a id=sector href="?catid=us-58211593" >Financial</a>

... from this html string:

<div class="g-unit g-first">Sector: <a id=sector href="?catid=us-58211593" >Financial</a> &gt; Industry: <a href="?catid=us-64965887" >Misc. Financial Services</a> 

The attribute href="?catid=us-58211593" is not relevant, so the pattern should match only on the "a" tag and its "id=sector" attribute.

Update

Indeed, regex is just not the right tool for the job. It took only 3 lines of code with the HTML Agility Pack to achieve the required result:

// Requires: using HtmlAgilityPack;
HtmlWeb hw = new HtmlWeb();
HtmlDocument myDoc = hw.Load("http://www.google.com/finance?q=IBM");
// InnerText is the link text only ("Financial" in the sample above).
var etc = myDoc.GetElementbyId("sector").InnerText;


Don't use Regex to parse HTML. There are better solutions, such as HTML Agility Pack.
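
For the exact fragment in the question, here is a minimal sketch of the same idea applied to a string instead of a live page. It assumes the HtmlAgilityPack NuGet package is referenced; the class name SectorDemo and the helper FindSector are made up for illustration. OuterHtml gives the whole anchor element the question asks for, while InnerText gives only the link text.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET base library

public static class SectorDemo
{
    // The HTML fragment from the question, as a verbatim string literal.
    public const string Sample =
        @"<div class=""g-unit g-first"">Sector: <a id=sector href=""?catid=us-58211593"" >Financial</a> &gt; Industry: <a href=""?catid=us-64965887"" >Misc. Financial Services</a>";

    // Hypothetical helper: returns the element whose id is "sector",
    // or null when no such element exists.
    public static HtmlNode FindSector(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string; no network access
        return doc.GetElementbyId("sector");
    }

    public static void Main()
    {
        HtmlNode sector = FindSector(Sample);
        if (sector == null)
        {
            Console.WriteLine("No element with id=sector found.");
            return;
        }

        Console.WriteLine(sector.OuterHtml); // the whole <a ...>...</a> element
        Console.WriteLine(sector.InnerText); // only the link text
    }
}
```

The null check matters in practice: GetElementbyId returns null when the id is missing, so chaining .InnerText directly onto it would throw on a page that lacks the element.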
