Need help extracting label from HTML page in C#

I want to load one label's value from a remote HTML page. I have done that by loading the whole page and then using a regex. I found the desired result, but this method is very slow. I want it to quickly load only the label's value, not the whole web page. Any suggestions?

This is what I'm doing at the moment:

using (var client = new WebClient())
{
    string result = client.DownloadString("http://web.archive.org/http://profiles.yahoo.com/italy_");
    var regex = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                          RegexOptions.Compiled);
    var s = result;
    foreach (Match email in regex.Matches(s))
    {
        // Console.WriteLine(email.Value);
        label2.Text = email.Value;
    }
}


You must load the whole page - that's the way HTTP requests generally work.

Maybe your regex could be improved? Not my area of expertise though, sorry.
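
Before tuning the regex, it may be worth checking which step is actually slow. The following is only a rough sketch: it reuses the pattern from the question, and the names EmailProbe, FirstEmail and TimedFetch are made up for illustration. It times the download and the match separately with a Stopwatch. Note that it stops at the first match, whereas the loop in the question ends up showing the last one, so adjust that if it matters.

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Text.RegularExpressions;

static class EmailProbe
{
    // Same pattern as in the question, compiled once and reused.
    static readonly Regex EmailRegex = new Regex(
        @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        RegexOptions.Compiled);

    // Returns the first e-mail-like match in the text, or null if there is none.
    public static string FirstEmail(string html)
    {
        Match m = EmailRegex.Match(html);
        return m.Success ? m.Value : null;
    }

    // Downloads the page and prints how long the download and the match each took.
    public static string TimedFetch(string url)
    {
        var sw = Stopwatch.StartNew();
        string html;
        using (var client = new WebClient())
        {
            html = client.DownloadString(url);
        }
        Console.WriteLine("download: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        string email = FirstEmail(html);
        Console.WriteLine("regex:    {0} ms", sw.ElapsedMilliseconds);
        return email;
    }
}
```

Calling EmailProbe.TimedFetch with the URL from the question and comparing the two printed numbers should show whether the regex is worth optimising at all.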


I found the desired result, but this method is very slow. I want it to quickly load only the label's value, not the whole web page.

Couple of thoughts:

  • Archive.org is usually very slow in my experience. My guess is that's your bottleneck.

  • No, there is not a way to only make a partial request to a third-party page unless they have a response mechanism capable of returning more specific data (for example, a JSON-enabled web service that returns little snippets of HTML used on the page).

  • You will usually have better luck with parsing by loading data into some kind of HTML parser rather than using a regex.
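
To illustrate the last point, here is a minimal sketch using Html Agility Pack, a third-party NuGet package (an assumption on my part; any HTML parser would do). The class name LabelExtractor, the method GetTextById and the element id used below are hypothetical: you would need to inspect the real page to find which element holds the value. The whole page still has to be downloaded first; this only replaces the regex step.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

static class LabelExtractor
{
    // Returns the trimmed text of the element with the given id,
    // or null if no such element exists in the markup.
    // Assumes the id contains no single quote (it is spliced into the XPath).
    public static string GetTextById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='" + id + "']");
        if (node == null)
        {
            return null;
        }

        // Decode entities such as &amp; before handing the text back.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

With the page already downloaded into result as in the question, usage would look something like label2.Text = LabelExtractor.GetTextById(result, "email"); where "email" stands in for whatever id the page really uses.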
