Extract data from a web page

Folks, I'm trying to extract data from a web page using C#. For the moment I read the Stream from the WebResponse and parse it as one big string. It's long and painful. Does anyone know a better way to extract data from a web page? I saw WinHTTP, but it isn't for C#.


To download data from a web page it is easier to use WebClient:

string data;
using (var client = new WebClient())
{
    data = client.DownloadString("http://www.google.com");
}

For parsing downloaded data, provided that it is HTML, you could use the excellent Html Agility Pack library.

And here's a complete example extracting all the links from a given page:

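using System;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main(string[] args)
    {
        using (var client = new WebClient())
        {
            // Download the raw HTML as a string.
            string data = client.DownloadString("http://www.google.com");
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(data);

            // SelectNodes returns null (not an empty collection) when nothing matches.
            var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
            if (nodes == null)
            {
                return;
            }
            foreach (HtmlNode link in nodes)
            {
                HtmlAttribute att = link.Attributes["href"];
                Console.WriteLine(att.Value);
            }
        }
    }
}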


If the web page is valid XHTML, you can load it into an XPathDocument and use XPath to go straight to the data you want. If it's not valid XHTML, there are HTML parsers you can use instead.
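A minimal sketch of the XPathDocument approach, assuming the markup is well-formed XHTML. The inline string and the example.com URLs are made up for illustration; in practice the text would come from the downloaded page. XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so the XPath expression needs a prefix bound to it (the prefix "x" here is an arbitrary choice).

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XhtmlLinks
{
    static void Main()
    {
        // Hypothetical well-formed XHTML standing in for a downloaded page.
        string xhtml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>" +
            "<a href=\"http://example.com/a\">A</a>" +
            "<a href=\"http://example.com/b\">B</a>" +
            "</body></html>";

        // XPathDocument is a read-only, XPath-optimized view of the XML.
        var doc = new XPathDocument(new StringReader(xhtml));
        XPathNavigator nav = doc.CreateNavigator();

        // Bind a prefix to the XHTML namespace so the query can match elements.
        var ns = new XmlNamespaceManager(nav.NameTable);
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");

        // Select the href attribute of every anchor element.
        XPathNodeIterator links = nav.Select("//x:a/@href", ns);
        while (links.MoveNext())
        {
            Console.WriteLine(links.Current.Value);
        }
    }
}
```

If the page is not well-formed XML, the XPathDocument constructor throws an XmlException, which is the case where a tolerant HTML parser is the better fit.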

Found a similar question with an answer that should help: "Looking for C# HTML parser".
