Google search result HTML code in C#
I have to create an app that, for a given search term, downloads the links of the first 10 results from the Google search page. However, there is a problem:
if I download the page source with WebClient, I get JavaScript code instead of HTML.
I get the same result if I view the page source in Google Chrome (Ctrl+U), but if I inspect the element with the built-in developer tools I can see the real HTML code.
Does anyone know how I can download the real HTML code so I can extract the links?
You should use the Google Custom Search API:
http://code.google.com/apis/customsearch/v1/overview.html
Here is an example that shows the first 10 results of the search "cars":
<html>
<head>
<title>JSON/Atom Custom Search API Example</title>
</head>
<body>
<div id="content"></div>
<script>
function hndlr(response) {
for (var i = 0; i < response.items.length; i++) {
var item = response.items[i];
// in production code, item.htmlTitle should have the HTML entities escaped.
document.getElementById("content").innerHTML += "<br>" + item.htmlTitle;
}
}
</script>
<script src="https://www.googleapis.com/customsearch/v1?key=YOUR-KEY&cx=017576662512468239146:omuauf_lfve&q=cars&callback=hndlr">
</script>
</body>
</html>
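Since the question asks for links rather than titles, here is a minimal sketch of pulling the result URLs out of the same response object. It assumes each entry in response.items carries a link field alongside htmlTitle; extractLinks and the sample object are hypothetical names made up for illustration, not part of the API.

```javascript
// Collect up to `limit` result URLs from a Custom Search JSON response.
// `extractLinks` is a hypothetical helper, not part of any Google library.
function extractLinks(response, limit) {
  // `items` can be missing when a search returns no results.
  var items = (response && response.items) || [];
  return items.slice(0, limit).map(function (item) {
    return item.link;
  });
}

// Hand-written sample shaped like a response, for illustration only.
var sample = {
  items: [
    { htmlTitle: "<b>Cars</b> A", link: "https://example.com/a" },
    { htmlTitle: "<b>Cars</b> B", link: "https://example.com/b" }
  ]
};

console.log(extractLinks(sample, 10));
```

Inside the hndlr callback above, the same helper could be called as extractLinks(response, 10) instead of appending titles to the page.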
You can write a Perl script to extract only the data you want. Even if the page contains a lot of JavaScript, the document is valid HTML, so you can use an HTML parser, or convert it to XHTML and use XML::Simple or XML::Twig.
This is some code that I've used for getting search results from Google, using the API:
using System;
using System.IO;
using System.Linq;
using System.Net;
using Newtonsoft.Json; // Json.NET, provides JsonConvert

string googleUriPattern =
    "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&safe=off&rsz=large&userip={0}&q={1}";
var requestUri = new Uri(
    string.Format(
        googleUriPattern,
        "A valid IP address",
        Uri.EscapeDataString("query") // escape the search term for use in a URL
    ));
var httpWebRequest = (HttpWebRequest)WebRequest.Create(requestUri);
httpWebRequest.Timeout = 5000; // milliseconds
using (var webResponse = httpWebRequest.GetResponse())
using (var sr = new StreamReader(webResponse.GetResponseStream()))
{
    // Convert the JSON reply to XML so it can be queried with LINQ to XML.
    var result = JsonConvert.DeserializeXNode(sr.ReadToEnd(), "responseData");
    var searchResultCount = Convert.ToInt32((string)result.Descendants("estimatedResultCount").FirstOrDefault());
}
As you can see, my goal was to determine Google's estimated result count for the query, but you get the entire reply, which you can read the results from if you wish.
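To illustrate reading the results out of that reply, here is a rough sketch in JavaScript, matching the earlier example in this thread. It assumes the parsed reply nests an array at responseData.results and that each entry exposes its address as unescapedUrl; that shape is an assumption about this API's reply, and extractLegacyUrls and the sample object are hypothetical.

```javascript
// Collect up to `limit` result URLs from a parsed reply.
// Assumed shape: { responseData: { results: [ { unescapedUrl: "..." }, ... ] } }
// `extractLegacyUrls` is a hypothetical helper written for this sketch.
function extractLegacyUrls(reply, limit) {
  var data = reply && reply.responseData;
  // Fall back to an empty list if the reply carries no results.
  var results = (data && data.results) || [];
  return results.slice(0, limit).map(function (result) {
    return result.unescapedUrl;
  });
}

// Hand-written sample in the assumed shape, for illustration only.
var sampleReply = {
  responseData: {
    results: [
      { unescapedUrl: "https://example.com/one" },
      { unescapedUrl: "https://example.com/two" },
      { unescapedUrl: "https://example.com/three" }
    ]
  }
};

console.log(extractLegacyUrls(sampleReply, 2));
```

The same traversal applies in C#: after deserializing, walk the results elements and read the URL field from each one.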