Google Search from a Python App
I'm trying to run a google search query from a python app. Is there any python interface out there that would let me do this? If there isn't does anyone know which Goo开发者_运维知识库gle API will enable me to do this. Thanks.
There's a simple example here (peculiarly missing some quotes;-). Most of what you'll see on the web is Python interfaces to the old, discontinued SOAP API -- the example I'm pointing to uses the newer and supported AJAX API, that's definitely the one you want!-)
Edit: here's a more complete Python 2.6 example with all the needed quotes &c;-)...:
#!/usr/bin/python
import json
import urllib
def showsome(searchfor):
query = urllib.urlencode({'q': searchfor})
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
search_response = urllib.urlopen(url)
search_results = search_response.read()
results = json.loads(search_results)
data = results['responseData']
print 'Total results: %s' % data['cursor']['estimatedResultCount']
hits = data['results']
print 'Top %d hits:' % len(hits)
for h in hits: print ' ', h['url']
print 'For more results, see %s' % data['cursor']['moreResultsUrl']
showsome('ermanno olmi')
Here is Alex's answer ported to Python3
#!/usr/bin/python3
import json
import urllib.request, urllib.parse
def showsome(searchfor):
query = urllib.parse.urlencode({'q': searchfor})
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
search_response = urllib.request.urlopen(url)
search_results = search_response.read().decode("utf8")
results = json.loads(search_results)
data = results['responseData']
print('Total results: %s' % data['cursor']['estimatedResultCount'])
hits = data['results']
print('Top %d hits:' % len(hits))
for h in hits: print(' ', h['url'])
print('For more results, see %s' % data['cursor']['moreResultsUrl'])
showsome('ermanno olmi')
Here's my approach to this: http://breakingcode.wordpress.com/2010/06/29/google-search-python/
A couple code examples:
# Get the first 20 hits for: "Breaking Code" WordPress blog
from google import search
for url in search('"Breaking Code" WordPress blog', stop=20):
print(url)
# Get the first 20 hits for "Mariposa botnet" in Google Spain
from google import search
for url in search('Mariposa botnet', tld='es', lang='es', stop=20):
print(url)
Note that this code does NOT use the Google API, and is still working to date (January 2012).
I am new in python and I was investigating how to do this. None of the provided examples are working properly for me. Some are blocked by google if you make many (few) requests, some are outdated. Parsing the google search html (adding the header in the request) will work until google changes the html structure again. You can use the same logic to search in any other search engine, looking into the html (view-source).
import urllib2
def getgoogleurl(search,siteurl=False):
if siteurl==False:
return 'http://www.google.com/search?q='+urllib2.quote(search)
else:
return 'http://www.google.com/search?q=site:'+urllib2.quote(siteurl)+'%20'+urllib2.quote(search)
def getgooglelinks(search,siteurl=False):
#google returns 403 without user agent
headers = {'User-agent':'Mozilla/11.0'}
req = urllib2.Request(getgoogleurl(search,siteurl),None,headers)
site = urllib2.urlopen(req)
data = site.read()
site.close()
#no beatifulsoup because google html is generated with javascript
start = data.find('<div id="res">')
end = data.find('<div id="foot">')
if data[start:end]=='':
#error, no links to find
return False
else:
links =[]
data = data[start:end]
start = 0
end = 0
while start>-1 and end>-1:
#get only results of the provided site
if siteurl==False:
start = data.find('<a href="/url?q=')
else:
start = data.find('<a href="/url?q='+str(siteurl))
data = data[start+len('<a href="/url?q='):]
end = data.find('&sa=U&ei=')
if start>-1 and end>-1:
link = urllib2.unquote(data[0:end])
data = data[end:len(data)]
if link.find('http')==0:
links.append(link)
return links
Usage:
links = getgooglelinks('python','http://www.stackoverflow.com/')
for link in links:
print link
(Edit 1: Adding a parameter to narrow the google search to a specific site)
(Edit 2: When I added this answer I was coding a Python script to search subtitles. I recently uploaded it to Github: Subseek)
精彩评论