Open a webpage programmatically and retrieve its HTML content as a string
I have a Facebook account and I would like to extract my friends' photos and personal details such as "Date of birth", "Studied at" and so on. I am able to extract the address of the first Facebook page for each of my friends' accounts, but I don't know how to programmatically open the webpage for each friend's first page and save the HTML content as a string so that I can extract their personal details and photos. Please help! Thanks in advance!
You have three options:
1- Using a WebClient object.
WebClient webClient = new WebClient();
webClient.Credentials = new System.Net.NetworkCredential("UserName", "Password", "Domain");
string pageHTML = webClient.DownloadString("http://url");
2- Using a WebRequest. This is the best solution because it gives you more control over your request.
WebRequest myWebRequest = WebRequest.Create("http://URL");
WebResponse myWebResponse = myWebRequest.GetResponse();
Stream receiveStream = myWebResponse.GetResponseStream();
Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
StreamReader readStream = new StreamReader(receiveStream, encode);
string strResponse = readStream.ReadToEnd();
// Optionally write the HTML to disk; strFilePath is a file path of your choosing
StreamWriter oSw = new StreamWriter(strFilePath);
oSw.WriteLine(strResponse);
oSw.Close();
readStream.Close();
myWebResponse.Close();
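The "more control" mostly comes from casting to HttpWebRequest, which exposes HTTP-specific options such as headers, cookies and timeouts. A rough sketch, assuming you want a custom user agent and session cookies (needs System.Net and System.IO; the URL and user-agent string are placeholders):
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://URL");
request.UserAgent = "Mozilla/5.0";                 // some sites reject requests without a user agent
request.CookieContainer = new CookieContainer();   // keeps session cookies across requests
request.Timeout = 10000;                           // milliseconds

using (WebResponse response = request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string html = reader.ReadToEnd();
}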
3- Using a WebBrowser (I bet you don't wanna do that)
WebBrowser wb = new WebBrowser();
string pageHTML = "";
// Subscribe before navigating so the event isn't missed
wb.DocumentCompleted += (sender, e) => pageHTML = wb.DocumentText;
wb.Navigate("http://URL");
Excuse me if I mistyped any code; I wrote it from memory and don't have a syntax checker at hand, but it should be close.
EDIT: For Facebook pages, you may consider using the Facebook Graph API:
http://developers.facebook.com/docs/reference/api/
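A minimal sketch of fetching a user's profile through the Graph API with the WebClient from option 1; the user id and access token here are placeholders you would obtain through Facebook's authorization flow:
// Placeholders: replace with a real user id and a valid access token
string userId = "me";
string accessToken = "YOUR_ACCESS_TOKEN";
string url = "https://graph.facebook.com/" + userId + "?access_token=" + accessToken;

// Returns JSON with fields such as name, birthday and education
string json = new WebClient().DownloadString(url);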
Try this:
var html = new WebClient()
.DownloadString("the facebook account url goes here");
Also, once you have downloaded the HTML as a string I would highly recommend that you use the Html Agility Pack to parse it.
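For example, here is a small sketch of pulling image URLs out of the downloaded HTML with the Html Agility Pack (the XPath is only an illustration; Facebook's real markup will differ):
// Requires the HtmlAgilityPack NuGet package (HtmlAgilityPack namespace)
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);  // html is the string downloaded above

// Grab every <img> src on the page; adjust the XPath to the elements you need
var imageNodes = doc.DocumentNode.SelectNodes("//img[@src]");
if (imageNodes != null)
{
    foreach (var img in imageNodes)
    {
        string src = img.GetAttributeValue("src", "");
        Console.WriteLine(src);
    }
}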
In general there are two things you can do here. The first is called web scraping: you download the HTML source of the page with code like the following:
string stringResponse;
var request = WebRequest.Create("http://example.com");
var response = request.GetResponse();
using (Stream responseStream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(responseStream))
{
    stringResponse = reader.ReadToEnd();
}
stringResponse then contains the HTML source of the website http://example.com.
However, this is probably not what you want to do. Facebook has an SDK that you can use to retrieve this kind of information. You can read about it on the following page:
http://developers.facebook.com/docs/reference/api/user/
If you want to use the Facebook API, I think it's worth editing your question or asking a new one about it, since it's quite a bit more complicated and requires authorization and some extra coding. However, it's the best approach, because your code is unlikely to ever break and it respects the privacy of the people you want information about.
For example, if you query me with the API, you get the following string:
{
    "id": "1089655429",
    "name": "Timo Willemsen",
    "birthday": "08/29/1989",
    "education": [
        {
            "school": {
                "id": "115091211836927",
                "name": "Stedelijk Gymnasium Arnhem"
            },
            "year": {
                "id": "127668947248449",
                "name": "2001"
            },
            "type": "High School"
        }
    ]
}
You can see that I'm Timo Willemsen, 21 years old, and studied at Stedelijk Gymnasium Arnhem in 2001.
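On the C# side you could then read the fields you care about with a JSON library such as Json.NET (Newtonsoft.Json); a rough sketch, assuming the Graph API response has already been downloaded into a string called json:
// Requires the Newtonsoft.Json package (Newtonsoft.Json.Linq namespace)
JObject user = JObject.Parse(json);

string name = (string)user["name"];          // "Timo Willemsen"
string birthday = (string)user["birthday"];  // "08/29/1989"

// "education" is an array of objects with school/year/type
string school = (string)user["education"][0]["school"]["name"];
Console.WriteLine(name + " studied at " + school);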
Use Selenium 2.0 for C#: http://seleniumhq.org/download/
// Requires the Selenium WebDriver bindings (OpenQA.Selenium, OpenQA.Selenium.Firefox)
var driver = new FirefoxDriver();
driver.Navigate().GoToUrl("http://www.google.com");
string pageSource = driver.PageSource;  // full HTML of the rendered page
driver.Quit();  // close the browser when done