xgoogle python library is not working any more?
I was using the xgoogle python library for one of my projects. It was wo开发者_如何学Pythonrking fine till recently. I am not getting the resultset that I used to get before. If anyone who has used this library written by Peter Krummins, faced a similar situation, can you please suggest a work around ?
The presence of BeautifulSoup.py
hints that this library uses web scraping to get its result.
A common problem with this is that it will easily break when the design/layout of the page being scraped changes. And the problem you see seems to coincide with the new search results layout that Google introduced just recently.
Another problem is that it often is against the terms of service of the site being scraped. And according to point 5.3 of the Google Terms Of Service it actually is:
You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) [...]
A better idea would be to use the Custom Search API.
Peter Krumin's product xgoogle looks to be extremely useful both to me and I image many others. https://github.com/pkrumins/xgoogle
For me the current version is 1.3 is not working. I tried a new install from GitHub, ran the examples and nothing is returned.
Adding a debugger to the source code and tracing the data captured in a query to its disappearance the problem occurs in a routine called search.py subroutine "_extract_results" at a parser command
results = soup.findAll('li', {'class': 'g'})
The soup object has material in it but the "findAll" fails to return anything.
Looks like its searching for lists and if there are none it returns nothing. I am unsure what html you would try to match to get a result. If anyone knows how to make the is work I am very interested.
A little more googling and it appears xgoogle is no longer supported or works. Part of the trouble is that Google changes the layout of its results pages every so often and so any scraping software that assumes some standard layout is in time doomed to failure.
There are however other search engines that are locally installed and thus provide a results layout that are less likely change with upgrades and will not change at all if you don't upgrade.
I am currently investigating Yacy. Easy to install and can be pointed at specific sites if you want.
精彩评论