C# - XmlNodeList - Getting inner xml/text between description tags without HTML

Right now I've got a list view that shows the article titles and URLs of an RSS feed. Extracting the titles and URLs was no problem, but now I'm trying to show the description in a rich text box whenever an article title is selected in the list. I can get the description to show up in the text box, but it's always followed by a bunch of extra HTML. Example:

There's a silly rumor exploding on the Internet this weekend, alleging that Facebook is shutting down on March 15 because CEO Mark Zuckerberg "wants his old life back," and desires to "put an end to all the madness."<div class="feedflare">
<a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=at7OdUE16Y0:jsXll_RkIzI:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=yIl2AUoC8zA" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=at7OdUE16Y0:jsXll_RkIzI:7Q72WNTAKBA"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=7Q72WNTAKBA" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=at7OdUE16Y0:jsXll_RkIzI:V_sGLiPBpWU"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?i=at7OdUE16Y0:jsXll_RkIzI:V_sGLiPBpWU" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=at7OdUE16Y0:jsXll_RkIzI:qj6IDK7rITs"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?d=qj6IDK7rITs" border="0"></img></a> <a href="http://rss.cnn.com/~ff/rss/cnn_topstories?a=at7OdUE16Y0:jsXll_RkIzI:gIN9vFwOqvQ"><img src="http://feeds.feedburner.com/~ff/rss/cnn_topstories?i=at7OdUE16Y0:jsXll_RkIzI:gIN9vFwOqvQ" border="0"></img></a>
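In this example the unwanted markup starts at the FeedBurner `<div class="feedflare">` block that follows the article text. For this particular feed, one option is to cut the description at that marker. The sketch below assumes the marker always looks exactly like that; `CutAtFeedFlare` and the sample string are hypothetical.

```csharp
using System;

// Hypothetical helper: keep only the text before the FeedBurner "feedflare" block.
// Assumes the feed appends its share links as <div class="feedflare">...</div>.
static string CutAtFeedFlare(string description)
{
    const string marker = "<div class=\"feedflare\">";
    int index = description.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
    // If the marker is missing, return the description unchanged
    return index >= 0 ? description.Substring(0, index).TrimEnd() : description;
}

string sample = "There's a silly rumor exploding on the Internet this weekend." +
    "<div class=\"feedflare\"><a href=\"http://example.com/\"><img src=\"http://example.com/a.png\" border=\"0\"></img></a></div>";
Console.WriteLine(CutAtFeedFlare(sample));
```

This is tied to one feed's format; other feeds put different markup in their descriptions, so a general tag-stripping step is still useful.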

Code:

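// Form fields: rowNews is a ListView, txtUrl a TextBox, txtDesc a RichTextBox.
// Requires: using System.Xml;
private XmlNodeList descList; // descriptions, index-aligned with the rows in rowNews

private void button1_Click(object sender, EventArgs e)
{
    XmlDocument rssDoc = new XmlDocument();
    using (XmlTextReader rssReader = new XmlTextReader(txtUrl.Text))
    {
        rssDoc.Load(rssReader);
    }

    // Select only the children of <item>, so the channel-level <title>, <link>
    // and <description> (and the <image> children) don't shift the indexes.
    XmlNodeList titleList = rssDoc.SelectNodes("/rss/channel/item/title");
    XmlNodeList urlList = rssDoc.SelectNodes("/rss/channel/item/link");
    descList = rssDoc.SelectNodes("/rss/channel/item/description");

    for (int i = 0; i < titleList.Count; i++)
    {
        ListViewItem lvi = rowNews.Items.Add(titleList[i].InnerText);
        lvi.SubItems.Add(urlList[i].InnerText);
    }
}

private void rowNews_SelectedIndexChanged(object sender, EventArgs e)
{
    if (rowNews.SelectedIndices.Count <= 0)
    {
        return;
    }
    int intselectedindex = rowNews.SelectedIndices[0]; // Index of the selected article title

    // The description with the same index belongs to the selected article
    txtDesc.Text = descList[intselectedindex].InnerText;
}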
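Parallel title/link/description lists stay aligned only if every item has all three elements. A sketch that reads each `<item>` once instead, assuming an RSS 2.0 layout (`rss/channel/item`); `ReadItems` and the sample feed are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Read each <item> once so its title, link and description stay together.
static List<(string Title, string Link, string Description)> ReadItems(XmlDocument doc)
{
    var items = new List<(string Title, string Link, string Description)>();
    foreach (XmlNode item in doc.SelectNodes("/rss/channel/item"))
    {
        // SelectSingleNode returns null when the item lacks that child element
        string title = item.SelectSingleNode("title")?.InnerText ?? "";
        string link = item.SelectSingleNode("link")?.InnerText ?? "";
        string description = item.SelectSingleNode("description")?.InnerText ?? "";
        items.Add((title, link, description));
    }
    return items;
}

var feed = new XmlDocument();
feed.LoadXml(
    "<rss version=\"2.0\"><channel>" +
    "<title>Example feed</title><link>http://example.com/</link><description>Feed description</description>" +
    "<item><title>First</title><link>http://example.com/1</link><description>One</description></item>" +
    "<item><title>Second</title><link>http://example.com/2</link></item>" +
    "</channel></rss>");

foreach (var entry in ReadItems(feed))
{
    Console.WriteLine($"{entry.Title} | {entry.Link} | {entry.Description}");
}
```

In the form, the resulting list could back both the ListView rows and the description lookup, so the selected index always refers to the same item.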


You can strip the HTML using the approach from the question "Using C# regular expressions to remove HTML tags".
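A minimal sketch of that idea, assuming a simple tag-matching regex is good enough for these descriptions: remove anything that looks like a tag, then decode entities such as `&quot;` and `&amp;`. `StripHtml` is a hypothetical name, and this is a heuristic rather than a full HTML parser (a `>` inside an attribute value, for example, can confuse it).

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Heuristic tag stripper: delete <...> runs, then decode HTML entities.
static string StripHtml(string html)
{
    string noTags = Regex.Replace(html, "<[^>]*>", string.Empty);
    return WebUtility.HtmlDecode(noTags).Trim();
}

string description = "CEO Mark Zuckerberg &quot;wants his old life back.&quot;" +
    "<div class=\"feedflare\"><a href=\"http://example.com/\"><img src=\"http://example.com/a.png\" border=\"0\"></img></a></div>";
Console.WriteLine(StripHtml(description));
```

In the question's code this would be applied as `txtDesc.Text = StripHtml(descList[intselectedindex].InnerText);`.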


You can use InnerText instead of InnerXml. It returns only the text content of the child nodes, without any markup.
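`InnerText` drops markup only when that markup consists of real XML child elements. RSS feeds usually escape the HTML inside `<description>` (or wrap it in CDATA), so it is a single text node and `InnerText` returns the tags as plain text, which matches what the question shows. The sketch below contrasts the two cases and, as one hedged option, parses the escaped text a second time; `TextOfEmbeddedMarkup` is hypothetical and only works when the embedded HTML happens to be well-formed XML.

```csharp
using System;
using System.Xml;

// Case 1: the markup is made of real XML child elements.
var structured = new XmlDocument();
structured.LoadXml("<description>Hello <b>world</b></description>");
Console.WriteLine(structured.DocumentElement.InnerText);

// Case 2: the markup is escaped text, as RSS descriptions usually are.
var escaped = new XmlDocument();
escaped.LoadXml("<description>Hello &lt;b&gt;world&lt;/b&gt;</description>");
Console.WriteLine(escaped.DocumentElement.InnerText);

// Parse the decoded text again inside a wrapper element and take its InnerText.
static string TextOfEmbeddedMarkup(string markup)
{
    var inner = new XmlDocument();
    try
    {
        inner.LoadXml("<root>" + markup + "</root>");
        return inner.DocumentElement.InnerText;
    }
    catch (XmlException)
    {
        return markup; // not well-formed XML: fall back to another approach
    }
}

Console.WriteLine(TextOfEmbeddedMarkup(escaped.DocumentElement.InnerText));
```

HTML such as `<br>` or `&nbsp;` is not valid XML, so for arbitrary feeds a tag-stripping fallback is still needed.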
