Need help extracting a label from an HTML page in C#
I want to load one label's value from a remote HTML page. I have done that by loading the whole page and then using a regex. I get the desired result, but this method is very slow. I want it to quickly load only the label's value, not the whole web page. Any suggestions?
This is what I'm doing at the moment:
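// Requires: using System.Net; using System.Text.RegularExpressions;
using (var client = new WebClient())
{
    // Download the entire archived profile page as a string.
    string result = client.DownloadString("http://web.archive.org/http://profiles.yahoo.com/italy_");

    // Matches e-mail-like tokens anywhere in the raw HTML.
    var regex = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        RegexOptions.Compiled);

    foreach (Match email in regex.Matches(result))
    {
        // Console.WriteLine(email.Value);
        // Each iteration overwrites the label, so the last match is what remains.
        label2.Text = email.Value;
    }
}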
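Because the loop assigns `label2.Text` on every iteration, only the last match ends up in the label. If a single address is all that is wanted, a minimal sketch (assuming the first e-mail-like token on the page is the right one) can ask for just one match with `Regex.Match`, which stops scanning at the first hit. `EmailExtractor` and `ExtractFirstEmail` are hypothetical names, not part of the original code.

```csharp
using System.Text.RegularExpressions;

public static class EmailExtractor
{
    // Same pattern as in the question; compiled once and reused across calls.
    private static readonly Regex EmailRegex = new Regex(
        @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        RegexOptions.Compiled);

    // Returns the first e-mail-like token in the HTML, or null if there is none.
    // Regex.Match stops at the first hit instead of enumerating every match.
    public static string ExtractFirstEmail(string html)
    {
        Match match = EmailRegex.Match(html);
        return match.Success ? match.Value : null;
    }
}
```

In the form, this could be used as `label2.Text = EmailExtractor.ExtractFirstEmail(result) ?? "not found";`. It saves a little regex work, but the download is still the same size.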
You have to load the whole page; that's how HTTP requests generally work.
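Even when the whole response has to come over the wire, the client does not have to buffer all of it before searching. A sketch, assuming the wanted value never spans a line break in the HTML, reads the body line by line and stops at the first match. `StreamingSearch` and `FindFirstMatch` are hypothetical names. This avoids reading and matching the tail of the page, but it does not make the server send less, so it is unlikely to fix a slow download on its own.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class StreamingSearch
{
    // Reads the text one line at a time and returns the first regex match,
    // or null if the reader is exhausted without finding one.
    // Assumes the value being searched for never spans a line break.
    public static string FindFirstMatch(TextReader reader, Regex regex)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match match = regex.Match(line);
            if (match.Success)
            {
                // Stop here; the remaining lines are never read.
                return match.Value;
            }
        }
        return null;
    }
}
```

With `HttpClient`, the reader could be built as `new StreamReader(await httpClient.GetStreamAsync(url))`; in a test, a `StringReader` works the same way.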
Maybe your regex could be improved? Not my area of expertise though, sorry.
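If you do want to tune the pattern, one low-risk change (a sketch, not a measured fix) is to make the groups non-capturing with `(?:...)`, since the code only reads `Match.Value` and never the groups, and to pass a match timeout so an unexpected input cannot stall the UI thread. Expect any gain to be small next to the network time. `TunedEmailRegex` is a hypothetical name, and the one-second timeout is an arbitrary choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TunedEmailRegex
{
    // Same structure as the question's pattern, but with non-capturing groups,
    // so the engine does not record captures that are never used.
    // If a match runs longer than the timeout, RegexMatchTimeoutException is thrown.
    public static readonly Regex Pattern = new Regex(
        @"\w+(?:[-+.]\w+)*@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*",
        RegexOptions.Compiled,
        TimeSpan.FromSeconds(1));
}
```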
"I found the desired result, but this method is very slow. I want it to quickly load only the label's value, not the whole web page."
A couple of thoughts:
Archive.org is usually very slow in my experience. My guess is that's your bottleneck.
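To check that guess rather than assume it, you can time the download and the regex separately. A minimal sketch using `System.Diagnostics.Stopwatch`; `Timing.Measure` is a hypothetical helper, not an existing API.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the given function and returns its result together with
    // the elapsed wall-clock time.
    public static (T Result, TimeSpan Elapsed) Measure<T>(Func<T> action)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = action();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed);
    }
}
```

For example, `var (html, downloadTime) = Timing.Measure(() => client.DownloadString(url));` followed by `var (email, regexTime) = Timing.Measure(() => regex.Match(html).Value);` shows which step dominates.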
No, there is no way to request only part of a third-party page unless the site offers a mechanism that returns more specific data (for example, a JSON web service that returns small snippets of the HTML used on the page).
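If a site did offer such a service, the client would download only a small payload and read one field from it. The sketch below assumes a hypothetical endpoint returning something like `{"email": "..."}`; neither the endpoint nor the `email` field is known to exist for the page in the question. It uses `System.Text.Json`, which ships with .NET Core 3.0 and later. `ProfileJson` and `ReadEmail` are hypothetical names.

```csharp
using System.Text.Json;

public static class ProfileJson
{
    // Parses a small JSON object and returns its "email" property, or null if absent.
    // Assumes the payload is a JSON object; the "email" field name is illustrative only.
    public static string ReadEmail(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.TryGetProperty("email", out JsonElement email)
            ? email.GetString()
            : null;
    }
}
```

The JSON string itself would come from something like `await httpClient.GetStringAsync(url)` against whatever endpoint the site actually provides.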
You will usually have better luck loading the data into an HTML parser than matching it with a regex.
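As an illustration of the parser approach, here is a sketch using the third-party HtmlAgilityPack library (a NuGet package, not part of the .NET base library). It assumes the address appears in a `mailto:` link, which is a guess about the page's markup; adjust the XPath to whatever element actually holds the value. In a WinForms project, `HtmlDocument` may clash with `System.Windows.Forms.HtmlDocument`, so qualify it if needed. `HtmlEmailReader` is a hypothetical name.

```csharp
using HtmlAgilityPack;

public static class HtmlEmailReader
{
    // Parses the HTML into a DOM and returns the address from the first
    // <a href="mailto:..."> link, or null if there is none.
    // Query strings such as "?subject=..." are not stripped in this sketch.
    public static string ReadMailtoAddress(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[starts-with(@href, 'mailto:')]");
        if (link == null)
        {
            return null;
        }

        string href = link.GetAttributeValue("href", string.Empty);
        return href.Substring("mailto:".Length);
    }
}
```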