
WebBrowser Control - Document.Body.InnerText Problem

Can somebody please help me fix my code? I can't see where I'm going wrong; it just doesn't do what it should.

It should read a file line by line (each line contains one URL). For each URL it should visit the page, extract the title, URL, and body text, and append them to a file. Instead it doesn't do anything. The only error I get is "Object reference not set to an instance of an object", which points to this line:

    u = w.Document.Body.InnerText;

Here's the full code:

    OpenFileDialog of = new OpenFileDialog();
    of.Title = "app name - Select File";
    using (of)
    {
        try
        {
            Cursor = Cursors.WaitCursor;
            if (of.ShowDialog() == DialogResult.OK)
            {
                string[] file = File.ReadAllLines(of.FileName);

                foreach (string line in file)
                {
                    w.Navigate(line);
                    string t, d, u, path = @"file.txt";

                    t = w.DocumentTitle;
                    u = w.Document.Body.InnerText;
                    d = w.Url.AbsolutePath;
                    t = t.Substring(0, 250);
                    t = t.Replace("\"", "\\\"");

                    a.Text += "\n" + u;

                    File.AppendAllText(path,
                        "s[" + an + "] = \"" + t + "^" + u + "^" +
                        url1 + u + url2 + d + "\";" +
                        Environment.NewLine);
                    an++;
                }
            }
            Cursor = Cursors.Default;
        }
        catch (Exception exception)
        {
            MessageBox.Show(exception.Message);
        }
    }

I'd appreciate any suggestions or help. Thank you :)

jase


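WebBrowser.Navigate is, IIRC, asynchronous: it returns before the page has finished loading, so on the very next line `w.Document` (or its `Body`) can still be null, hence the NullReferenceException. It might be better here to use WebClient.DownloadString, or the HTML Agility Pack and its Load method.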
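A minimal sketch of the WebClient.DownloadString route, assuming one URL per line in the input file. DownloadString blocks until the whole response has arrived, so there is no "document not loaded yet" window. Because it returns raw HTML rather than a parsed DOM, the title and body text are pulled out here with crude regular expressions; the HTML Agility Pack would handle real-world pages far more reliably. All names (`PageScraper`, `ExtractTitle`, `ExtractText`, `CleanTitle`, `BuildRecord`, `Scrape`) are hypothetical. The record format is simplified: `url1`, `url2` and the running counter `an` from the question are not shown there, so they are left out. The sketch also guards the `Substring(0, 250)` call, which throws ArgumentOutOfRangeException for titles shorter than 250 characters. Note that newer .NET versions mark WebClient obsolete in favour of HttpClient; it still compiles, with a warning.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: fetches pages synchronously with WebClient.DownloadString
// instead of WebBrowser.Navigate, so the HTML is available on the next line.
static class PageScraper
{
    const int MaxTitleLength = 250;

    // Crude <title> extraction; returns "" when the page has no title element.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value).Trim() : "";
    }

    // Rough stand-in for Document.Body.InnerText: drops <script>/<style> blocks
    // and all remaining tags, decodes entities and collapses whitespace.
    public static string ExtractText(string html)
    {
        string s = Regex.Replace(html, @"<(script|style)[^>]*>.*?</\1>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        s = Regex.Replace(s, @"<[^>]+>", " ");
        s = WebUtility.HtmlDecode(s);
        return Regex.Replace(s, @"\s+", " ").Trim();
    }

    // Escapes double quotes the same way the original code does.
    public static string EscapeQuotes(string s)
    {
        return s.Replace("\"", "\\\"");
    }

    // Truncates only when the title is longer than the limit, then escapes quotes.
    public static string CleanTitle(string title)
    {
        if (title.Length > MaxTitleLength)
            title = title.Substring(0, MaxTitleLength);
        return EscapeQuotes(title);
    }

    // Simplified version of the question's s[n] = "title^text^path"; line.
    public static string BuildRecord(int index, string title, string text, string path)
    {
        return "s[" + index + "] = \"" + title + "^" + text + "^" + path + "\";";
    }

    // Reads one URL per line, downloads each page and appends one record per page.
    public static void Scrape(string urlFile, string outputFile)
    {
        using (var client = new WebClient())
        {
            int index = 0;
            foreach (string line in File.ReadAllLines(urlFile))
            {
                string url = line.Trim();
                if (url.Length == 0) continue; // skip blank lines

                string html = client.DownloadString(url); // blocks until the page arrives
                string title = CleanTitle(ExtractTitle(html));
                string text = EscapeQuotes(ExtractText(html));
                string path = new Uri(url).AbsolutePath;

                File.AppendAllText(outputFile,
                    BuildRecord(index, title, text, path) + Environment.NewLine);
                index++;
            }
        }
    }
}
```

From the button handler this could be called after the dialog returns OK, e.g. `PageScraper.Scrape(of.FileName, @"file.txt");`. It runs on the UI thread, so the form will freeze while pages download; for long URL lists a background thread is worth considering.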
