How do I extract the target URL from a Google search result?
I am trying to extract URLs from Google search results. I use Indy IdHTTP to get HTML results from Google, and I use Achmad Z's code for getting the link hrefs from the page. How can I get the real link target for each URL instead of the one that goes through Google's redirector?
I tried th开发者_开发知识库at but I get an "Operand no applicable" error in this part of the code:
function ToUTF8Encode(str: string): string;
var
b: Byte;
begin
for b in BytesOf(UTF8Encode(str)) do
begin
Result := Format('%s%s%.2x', [Result, '%', b]);
end;
end;
I use Delphi 7 with Indy 9.00.10. Maybe indy update will help ?
If I get it right you are trying to fetch the Google search results using TIdHTTP.Get
method. If so, then
you should definitely focus on some Google Search API implementation because
- it's impossible to fetch the results this way because you don't have any access to the document inside the iframe in which the search results are, so you won't get any search results by using HTTP GET in this case (or at least I haven't heard about the request which can do that)
- it's against Google policies and you should use proper Google Search API instead, for instance
Google SOAP Search API
, there are available also several types of Google Search API's for various purposes
You can find e.g. here
the Delphi wrapper with the demo
code for Google Search API. I've tested it with Delphi 2009 on Windows 7/64 and it works fine for me.
In the previous post here I've tried to explain why you should use Google Search API, in this one I'll try to provide you an example with a hope it will work in your Delphi 7.
You need to have the SuperObject
(JSON parser for Delphi), I've used this version
(latest at this time). Then you need Indy; the best would be to upgrade to the latest version if possible. I've used the one shipped with Delphi 2009, but I think the TIdHTTP.Get
method is so important that it must work fine also in your 9.00.10 version.
Now you need a list box and a button on your form, the following piece of code and a bit of luck (for compatibility :)
The URL request building you can see for instance in the DxGoogleSearchApi.pas mentioned before but the best is to follow the Google Web Search API reference
. In DxGoogleSearchApi.pas you can take the inspiration e.g. how to fetch several pages.
So take this as an inspiration
uses
IdHTTP, IdURI, SuperObject;
const
GSA_Version = '1.0';
GSA_BaseURL = 'http://ajax.googleapis.com/ajax/services/search/';
procedure TForm1.GoogleSearch(const Text: string);
var
I: Integer;
RequestURL: string;
HTTPObject: TIdHTTP;
HTTPStream: TMemoryStream;
JSONResult: ISuperObject;
JSONResponse: ISuperObject;
begin
RequestURL := TIdURI.URLEncode(GSA_BaseURL + 'web?v=' + GSA_Version + '&q=' + Text);
HTTPObject := TIdHTTP.Create(nil);
HTTPStream := TMemoryStream.Create;
try
HTTPObject.Get(RequestURL, HTTPStream);
JSONResponse := TSuperObject.ParseStream(HTTPStream, True);
if JSONResponse.I['responseStatus'] = 200 then
begin
ListBox1.Items.Add('Search time: ' + JSONResponse.S['responseData.cursor.searchResultTime']);
ListBox1.Items.Add('Fetched count: ' + IntToStr(JSONResponse['responseData.results'].AsArray.Length));
ListBox1.Items.Add('Total count: ' + JSONResponse.S['responseData.cursor.resultCount']);
ListBox1.Items.Add('');
for I := 0 to JSONResponse['responseData.results'].AsArray.Length - 1 do
begin
JSONResult := JSONResponse['responseData.results'].AsArray[I];
ListBox1.Items.Add(JSONResult.S['unescapedUrl']);
end;
end;
finally
HTTPObject.Free;
HTTPStream.Free;
end;
end;
procedure TForm1.Button1Click(Sender: TObject);
begin
GoogleSearch('Delphi');
end;
Answer to my question , maybe it can help someone: Fetching web page :
memo1.Lines.Text := idhttp1.Get('http://ajax.googleapis.com/aja...tart=1&rsz=large&q=max');
extracting URL's :
function ExtractText(const Str, Delim1, Delim2: string; PosStart: integer; var PosEnd: integer): string;
var
pos1, pos2: integer;
begin
Result := '';
pos1 := PosEx(Delim1, Str, PosStart);
if pos1 > 0 then
begin
pos2 := PosEx(Delim2, Str, pos1 + Length(Delim1));
if pos2 > 0 then
begin
PosEnd := pos2 + Length(Delim2);
Result := Copy(Str, pos1 + Length(Delim1), pos2 - (pos1 + Length(Delim1)));
end;
end;
end;
And on Button1 just put :
procedure TForm1.Button1Click(Sender: TObject);
var Pos: integer;
sText: string;
begin
sText := ExtractText(Memo1.Lines.Text, '"url":"', '","visibleUrl"', 1, Pos);
while sText <> '' do
begin
Memo2.Lines.Add(sText);
sText := ExtractText(Memo1.Lines.Text, '"url":"', '","visibleUrl"', Pos, Pos);
end;
end;
www.delphi.about.com has nice documentation on string manipulation , Zarko Gajic does great job on that site :) NOTE: if google changes it's source this will be useless.
精彩评论