Parsing Google's search results

Posted: 2023-02-20 13:06

I'm "working" on a data mining project and I've chosen to parse Google search results. Before I actually start, I want to consult you, the experienced folks. I did a bit of research on how Google delivers results and analyzed the structure of a result page. That part is fine: I've already figured out the regexes and data structures I'll use.

Along the way I ran into their CAPTCHA because I was searching too fast; oh, the irony. I've also discovered that they cap results at 1000. Is there any way to avoid these obstacles? For the first one, I could slow the rate of URL fetching, or detect the CAPTCHA and pause until I give input; that might do it. But what about the second one? Does Google provide some kind of API that I can use as a workaround? I couldn't find one on their code.* page.

There is a Custom Search API.

It returns results in JSON or XML, so you won't even need regexes. However, you do need to pay for more than 100 searches a day.
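To give a rough idea of what that looks like, here is a minimal Python sketch against the JSON endpoint. The helper names (`build_search_url`, `extract_results`, `search`) are made up for illustration, and you would need your own API key and search engine ID (`cx`); treat it as a starting point, not a finished client.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, query, start=1, num=10):
    """Build the request URL. `start` is the 1-based index of the first result."""
    params = {
        "key": api_key,    # your API key (placeholder in the usage below)
        "cx": engine_id,   # your search engine ID (placeholder in the usage below)
        "q": query,
        "start": start,
        "num": num,        # the API returns at most 10 results per request
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_results(payload):
    """Pull (title, link) pairs out of a decoded JSON response."""
    return [(item.get("title"), item.get("link"))
            for item in payload.get("items", [])]


def search(api_key, engine_id, query, start=1, num=10):
    """Perform one request and return the extracted results (hits the network)."""
    url = build_search_url(api_key, engine_id, query, start, num)
    with urllib.request.urlopen(url) as response:
        return extract_results(json.load(response))
```

Calling `search("YOUR_KEY", "YOUR_CX", "data mining", start=11)` would ask for the second page of ten results. Each call counts against the daily quota mentioned above.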

What exactly are you trying to do? Maybe there is a better way to accomplish it.
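As for the idea in the question of slowing down and pausing when something goes wrong, a generic throttle with exponential backoff might look like the sketch below. Everything here is hypothetical: `fetch` and `is_blocked` are callables you would supply yourself (how to recognise a block page is left entirely to you), and the delays are arbitrary defaults, not values known to satisfy any particular site.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None     # time of the previous call, None before the first one

    def wait(self):
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def fetch_with_backoff(fetch, url, is_blocked, throttle,
                       max_retries=3, base_delay=60.0, sleep=time.sleep):
    """Fetch `url`, backing off exponentially while `is_blocked(body)` is true.

    `fetch` and `is_blocked` are caller-supplied (hypothetical) callables.
    """
    for attempt in range(max_retries + 1):
        throttle.wait()
        body = fetch(url)
        if not is_blocked(body):
            return body
        if attempt < max_retries:
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("still blocked after %d retries" % max_retries)
```

Passing the clock and sleep functions in as parameters keeps the logic testable without real delays. Instead of sleeping on a block, you could equally raise an exception and wait for manual input, as the question suggests.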

Always look on CPAN first!

https://metacpan.org/pod/REST::Google

If someone hasn't already solved your problem, chances are it's a weird one :-)

