HTML Agility Pack Question (Attempting to parse string from source)

2023-02-23 04:35 问答作者：

I am attempting to use the Agility pack to parse certain bits of info from various pages. I am kind of worried that using this might be overkill for what I need, if that is case feel free to let me know. Anyway, I am attempting t开发者_如何转开发o parse a page from motley fool to get the name of a company based on the ticker. I will be parsing several pages to get stock info in a similar way.

The HTML that I want to parse looks like:

<h1 class="subHead"> 
    Microsoft Corp <span>(NASDAQ:MSFT)</span>
</h1>

Also, the page I want to parse is: http://caps.fool.com/Ticker/MSFT.aspx

So, I guess my question is how do I simply get the Microsoft Corp from the html and should I even be using the agility pack to do things like this?

Edit: Current code

public String getStockName(String ticker)
{
    String text ="";
    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load("http://caps.fool.com/Ticker/" + ticker + ".aspx");

    var node = doc.DocumentNode.SelectSingleNode("/h1[@class='subHead']");
    text = node.FirstChild.InnerText.Trim();
    return text;
}

This would give you a list of all stock names, for your sample Html just of Microsoft:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("test.html");

var nodes = doc.DocumentNode.SelectNodes("//h1[@class='subHead']");
foreach (var node in nodes)
{
    string text = node.FirstChild.InnerText; //output: "Microsoft Corp"
    string textAll = node.InnerText; //output: "Microsoft Corp (NASDAQ:MSFT)"
}

Edit based on updated question - this should work for you:

string text = "";
HtmlWeb web = new HtmlWeb();

string url = string.Format("http://caps.fool.com/Ticker/{0}.aspx", ticker);
HtmlAgilityPack.HtmlDocument doc = web.Load(url);

var node = doc.DocumentNode.SelectSingleNode("//h1[@class='subHead']");
text = node.FirstChild.InnerText.Trim();
return text;

Use an xpath expression to select the element then pickup the text.

 foreach (var element in doc.DocumentNode.SelectNodes("//h1[@clsss='subHead']/span"))
 {
    Console.WriteLine (element.InnerText);
 }

继续阅读：html-agility-pack html-parsing

HTML Agility Pack Question (Attempting to parse string from source)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？