Getting Google search result HTML code in C#

I have to create an app that, for a given search term, downloads the links of the first 10 results from the Google search page. However, there is a problem.

If I download the page source with WebClient, I get JS code instead of HTML.

I get the same result if I view the page source in Google Chrome (Ctrl+U), but if I inspect an element with the built-in developer tools, I can see the real HTML.

Does anyone know how I can download the real HTML so I can extract the links?


You should use the Google Custom Search API:

http://code.google.com/apis/customsearch/v1/overview.html

Here is an example that shows the first 10 results for the search "cars":

<html>
  <head>
    <title>JSON/Atom Custom Search API Example</title>
  </head>
  <body>
    <div id="content"></div>
    <script>
      function hndlr(response) {
        for (var i = 0; i < response.items.length; i++) {
          var item = response.items[i];
          // in production code, item.htmlTitle should have the HTML entities escaped.
          document.getElementById("content").innerHTML += "<br>" + item.htmlTitle;
        }
      }
    </script>
    <script src="https://www.googleapis.com/customsearch/v1?key=YOUR-KEY&cx=017576662512468239146:omuauf_lfve&q=cars&callback=hndlr">
    </script>
  </body>
</html>
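
Since the question asks for links rather than titles, here is a minimal sketch of pulling the result URLs out of the same response object. It assumes each entry in `items` carries a `link` field and that `items` may be missing when a query returns nothing. The helper name `extractLinks` and the sample data are made up for illustration.

```javascript
// Collect result URLs from a Custom Search JSON API response object.
// `extractLinks` is a hypothetical helper name, not part of any Google library.
function extractLinks(response, limit = 10) {
  // Fall back to an empty array if the response has no `items` array.
  const items = Array.isArray(response && response.items) ? response.items : [];
  // Keep at most `limit` entries and return only their URLs.
  return items.slice(0, limit).map((item) => item.link);
}

// Hand-written sample shaped like an API response (not real search data).
const sample = {
  items: [
    { title: "Example A", link: "https://example.com/a" },
    { title: "Example B", link: "https://example.org/b" },
  ],
};

console.log(extractLinks(sample));
```

In the page above, the `hndlr` callback could call `extractLinks(response)` instead of appending titles to the `content` div.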


You can write a Perl script to extract only the data you want. Even though the page contains a lot of JavaScript, the document is still valid HTML, so you can use an HTML parser, or convert it to XHTML and use XML::Simple or XML::Twig.


This is some code that I've used for getting search results from Google, using the API:

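using System;
using System.IO;
using System.Linq;
using System.Net;
using Newtonsoft.Json; // Json.NET, provides JsonConvert

// {0} = IP address of the end user, {1} = URL-encoded search query
string googleUriPattern =
        "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&safe=off&rsz=large&userip={0}&q={1}";
var requestUri = new Uri(
    string.Format(
        googleUriPattern,
        "A valid IP address",
        Uri.EscapeDataString("query") // escape spaces and special characters in the query
    ));

var httpWebRequest = (HttpWebRequest)WebRequest.Create(requestUri);
httpWebRequest.Timeout = 5000; // milliseconds

using (var webResponse = httpWebRequest.GetResponse())
using (var sr = new StreamReader(webResponse.GetResponseStream()))
{
    // Convert the JSON reply to XML, wrapped in a root element, so it can be queried with LINQ to XML.
    var result = JsonConvert.DeserializeXNode(sr.ReadToEnd(), "responseData");
    // A missing element yields null, which Convert.ToInt32 turns into 0.
    var searchResultCount = Convert.ToInt32((string)result.Descendants("estimatedResultCount").FirstOrDefault());
}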

As you can see, my use case was to determine Google's estimated result count for the query, but you get the entire reply, so you can also read the individual results from it if you wish.
