Convert Google search results into JSON in Python 3.1
I am writing a Python program that feeds a search term to Google using the Google search API and downloads the first 10 results. I was able to do this in Python 2.6 as follows:
query = urllib.urlencode({'q': 'searchterm', 'start': k}, doseq=False)
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' \
% (query)
results = urllib.urlopen(url)
resultsjson = json.loads(results.read())
betterResults += resultsjson["responseData"]["results"]
Google's search API returns the results as JSON, so I used the above code to load the results into a JSON object of my own and collect them in a list (betterResults).
When I switched over to Python 3, my program began throwing exceptions. Apparently, in Python 2.6 the object returned by urlopen() is a file-like object whose contents can be loaded as JSON. In Python 3.1, the object returned is an HTTPResponse object, which does have a read() method, but that method returns a bytes object rather than a string. I was therefore unable to access the information as I had in 2.6.
Is there any way to access the JSON returned by Google? How can I get the results in Python 3 and still select which fields I want, as I was able to do before?
Thank you very much, bsg
You'll need to decode the bytes object if you want to use it with json.loads:
resultjson = json.loads(results.read().decode())
The docs also suggest passing an encoding parameter to the loads function:
json.loads(results.read(), encoding=<encoding-type>)
I think Lennart has an explanation of how to get the encoding-type.
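As a minimal sketch of the decode step, the snippet below uses io.BytesIO as a hypothetical stand-in for the response object, so it runs without a network connection. The payload is made up for illustration and only mimics the responseData/results layout from the question; UTF-8 is an assumption here, not something the API is known to guarantee.

```python
import io
import json

# Hypothetical stand-in for the object returned by urlopen():
# like an HTTPResponse, its read() method returns bytes.
fake_response = io.BytesIO(
    b'{"responseData": {"results": [{"title": "example"}]}}'
)

raw = fake_response.read()    # bytes, not str
text = raw.decode("utf-8")    # assumes the body is UTF-8 encoded
resultsjson = json.loads(text)

# Fields can then be selected as with any dict/list structure.
titles = [r["title"] for r in resultsjson["responseData"]["results"]]
```

Once the bytes are decoded, the parsed result is an ordinary dict, so the field access from the 2.6 version carries over unchanged.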
The object returned by urlopen() is file-like, so you are mistaken on that point. The problem is that you call json.loads(), which expects a string; json.load() is the one that expects a file-like object.
However, json.load() expects read() to return a string, and here read() returns bytes, so either way you need to decode the bytes to a string first.
So, something like this:
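import json
import urllib.parse
import urllib.request

# k (the result offset) and betterResults come from the surrounding program
query = urllib.parse.urlencode({'q': 'searchterm', 'start': k}, doseq=False)
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
results = urllib.request.urlopen(url)
# Take the charset from a header such as "text/javascript; charset=utf-8"
encoding = results.getheader('content-type').split('=')[-1]
resultsjson = json.loads(results.read().decode(encoding))
betterResults += resultsjson["responseData"]["results"]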
Might work. (I didn't test it).
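One caveat with splitting the header on '=' is that it returns the whole header value when no charset is present. A slightly more defensive sketch is shown below; the helper names charset_from_content_type and decode_json_body are hypothetical, and the UTF-8 fallback is an assumption rather than documented API behaviour. It relies on the standard library's email.message.Message to parse the Content-Type value.

```python
import json
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or a default."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset parameter exists.
    return msg.get_content_charset() or default


def decode_json_body(raw, content_type):
    """Decode a bytes body using the header's charset, then parse it."""
    encoding = charset_from_content_type(content_type)
    return json.loads(raw.decode(encoding))


# Made-up payload for illustration only.
body = '{"responseData": {"results": [{"url": "http://example.com"}]}}'
parsed = decode_json_body(body.encode("utf-8"), "text/javascript; charset=utf-8")
fallback = charset_from_content_type("application/json")
```

With a real response this would be called as decode_json_body(results.read(), results.getheader('content-type')), assuming the server sends a Content-Type header.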