Extract Specific Text from an HTML Page

The HTML page looks like this:

<tr>
<th rowspan="4" scope="row">General</th>
<td class="ttl"><a href="network-bands.php3">2G Network</a></td>
<td class="nfo">GSM 850 / 900 / 1800 / 1900 </td>
</tr><tr>
<td class="ttl"><a href="network-bands.php3">3G Network</a></td>
<td class="nfo">HSDPA 900 / 1900 / 2100 </td>
</tr>

To extract the text from the "nfo" cells, I am trying to use:

var text = document.getElementsByClassName("nfo")[0].innerHTML;

(This line was provided by Alex.)

But I am getting this error:

Error 2 The name 'document' does not exist in the current context C:\Users\Nabi Javid\Documents\Visual Studio 2008\Projects\WpfApplication2\WpfApplication2\Window1.xaml.cs 30 22 WpfApplication2

Am I missing a library or something?

Currently my code looks like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Data;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Navigation;
using System.Windows.Shapes;

namespace WpfApplication1
{
    /// <summary>
    /// Interaction logic for Window1.xaml
    /// </summary>
    public partial class Window1 : Window
    {
        public Window1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, RoutedEventArgs e)
        {
            HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
            htmlDoc.Load("nokia_c5_03-3578.html");
            var text = document.getElementsByClassName("nfo")[0].innerHTML;

        }
    }

}


You are mixing C# code with JavaScript code.

Instead of this:

var text = document.getElementsByClassName("nfo")[0].innerHTML;

type this:

var text = htmlDoc.DocumentNode.SelectNodes("//td[@class='nfo']")[0].InnerHtml;

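To keep it simple, I have left out error handling. Note that SelectNodes returns null when nothing matches, so indexing with [0] throws a NullReferenceException if the page contains no such cell.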
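For completeness, here is a minimal sketch of the same lookup with the null check added. The class name NfoExtractor and the method FirstNfoText are hypothetical names chosen for illustration, and the sketch assumes the markup is passed in as a string (via LoadHtml) rather than loaded from the file used in the question:

```csharp
using System;
using HtmlAgilityPack;

public static class NfoExtractor
{
    // Hypothetical helper: returns the trimmed text of the first
    // <td class="nfo"> cell, or null when the markup has no such cell.
    public static string FirstNfoText(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches,
        // so the result is checked before it is used.
        HtmlNode node = htmlDoc.DocumentNode.SelectSingleNode("//td[@class='nfo']");
        return node == null ? null : node.InnerText.Trim();
    }
}
```

InnerText is used here instead of InnerHtml because the goal is the cell's text; for the sample rows both give the same result apart from the trailing space.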


I'm not very deep into .NET, but it looks like you are trying to mix JavaScript code

var text = document.getElementsByClassName("nfo")[0].innerHTML;

with your .NET code...?


In your case you must call methods on the htmlDoc variable. Note, however, that the HtmlDocument class does not have a method named getElementsByClassName. Try to see if you can find another match for your needs in this list.

As the error says, the document variable does not exist in your code.


Do you want

var text = htmlDoc.getElementsByClassName("nfo")[0].innerHTML;

? I'm not familiar with HTML Agility Pack, but that would seem to make sense.


You can get elements by class name using the following method, which also matches elements that list several classes in a single class attribute:

private HtmlNodeCollection GetElementsByClassName(HtmlDocument htmlDocument, string className)
{
    string xpath =
        String.Format(
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' {0} ')]",
            className);
    return htmlDocument.DocumentNode.SelectNodes(xpath);
}
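
As a rough usage sketch of the same XPath expression, the snippet below collects the text of every matching element. ClassNameQuery and TextByClassName are hypothetical names, the markup is assumed to arrive as a string, and an empty list is returned when nothing matches (SelectNodes itself returns null in that case):

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ClassNameQuery
{
    // Hypothetical helper: returns the trimmed text of every element whose
    // class attribute contains className as a whole, space-separated token.
    public static List<string> TextByClassName(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Padding both sides with spaces makes "nfo" match class="nfo wide"
        // but not class="nfobar".
        string xpath = String.Format(
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' {0} ')]",
            className);

        var results = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return results; // no matches
        }

        foreach (HtmlNode node in nodes)
        {
            results.Add(node.InnerText.Trim());
        }
        return results;
    }
}
```

The class name is inserted into the XPath string unescaped, so this sketch is only suitable for trusted class names without quotes.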