Retrieve old searches from Google web history
I want to retrieve old Google searches that I made months or years ago and that are still present in Google Web History. How can I programmatically retrieve them all?
https://www.google.com/history/?output=rss only provides recent Google searches, but not all of them.
Also, the question "How can I retrieve my Google search history?" doesn't answer my question.
You can pass month, day and year as parameters to obtain history of a specific day.
E.g. https://www.google.com/history/lookup?month=12&day=1&yr=2010&output=rss for Dec 1, 2010.
There is no way to obtain the history for a full month or year, let alone the entire history. But knowing these parameters should at least let you obtain the entire history in a loop that goes one day further back in time on every iteration. Be careful that you don't leech too much in too short a time.
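The day-by-day loop described above could be sketched roughly as follows. This is only a minimal sketch: the names HistoryUrls and BuildDailyUrls, the example date range, and the two-second pause are illustrative assumptions. The only part taken from the answer is the lookup URL with its month, day, yr and output parameters. Fetching each URL (with whatever authentication Google requires) is left as a comment.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class HistoryUrls
{
    // Hypothetical helper: yields one per-day lookup URL,
    // walking backwards from 'newest' to 'oldest' (both inclusive).
    public static IEnumerable<string> BuildDailyUrls(DateTime newest, DateTime oldest)
    {
        for (DateTime d = newest.Date; d >= oldest.Date; d = d.AddDays(-1))
        {
            yield return "https://www.google.com/history/lookup?month=" + d.Month
                + "&day=" + d.Day + "&yr=" + d.Year + "&output=rss";
        }
    }
}

static class Program
{
    static void Main()
    {
        // Example range (assumed): Dec 3, 2010 back to Dec 1, 2010.
        var newest = new DateTime(2010, 12, 3);
        var oldest = new DateTime(2010, 12, 1);

        foreach (string url in HistoryUrls.BuildDailyUrls(newest, oldest))
        {
            Console.WriteLine(url);
            // Download 'url' here, then pause so the requests are spread out.
            // The delay length is an arbitrary choice, not a documented limit.
            Thread.Sleep(2000);
        }
    }
}
```

Keeping the URL generation separate from the download makes it easy to check the date arithmetic (month and year roll-overs) without sending any requests.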
You really need to parse the HTML page by page and then extract your data, because I don't think there is any alternative!
I think this will be very difficult.
I know this doesn't answer your question completely, but at least the web pages may be preserved. There are organizations and tools that allow you to recreate web pages from past dates; see for example http://www.mementoweb.org/.
UPDATE: I have just learnt that Memento has won a digital preservation award (http://www.dpconline.org/newsroom)
I know you're not looking to go back through every page, but you don't really need to parse the whole page; just look for the HTML that always precedes an entry. From starting up Google Web History and doing some simple searches, I found that on a history page each string you've searched for follows:
<td style="padding:3px 0"><table id=bkmk_view_ class=noborder ><tr><td><table class="elem noborder"><tr><td class="grey" nowrap>Searched for </td><td nowrap><a title="http://www.google.com/search?q=
and is followed by & (ampersand). This preceding HTML sequence is unique on the page, only occurring where historical search terms are listed.
If you search for two terms, you get a + between them. There are other conventions for different search modes; I didn't go through them all.
It looks like if you use BalusC's method of passing parameters, you could retrieve the HTML, search the document for the string I mentioned (be sure to escape \" and other special characters), then copy the text that follows until you reach a & character. Then all you need to parse is your search term, not the whole page. Go through the source until you reach the end, then move on to the next iteration of the loop.
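The marker-based extraction described above might look something like this. It is a hedged sketch, not a tested scraper: the names HistoryParser and ExtractSearchTerms are made up for illustration, the marker is only the tail of the HTML sequence quoted above (from "Searched for" onwards), and the sample HTML in Main is a hand-written fragment rather than real output from Google.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class HistoryParser
{
    // Tail of the HTML sequence quoted above; the quotes are escaped for C#.
    const string Marker =
        "<td class=\"grey\" nowrap>Searched for </td><td nowrap><a title=\"http://www.google.com/search?q=";

    // Hypothetical helper: returns every search term found after the marker.
    public static List<string> ExtractSearchTerms(string html)
    {
        var terms = new List<string>();
        int pos = 0;
        while ((pos = html.IndexOf(Marker, pos, StringComparison.Ordinal)) >= 0)
        {
            int start = pos + Marker.Length;
            int end = html.IndexOf('&', start);   // the term runs up to the next '&'
            if (end < 0) break;                   // truncated page: stop scanning
            // UrlDecode turns '+' into a space and resolves %XX escapes.
            terms.Add(WebUtility.UrlDecode(html.Substring(start, end - start)));
            pos = end;
        }
        return terms;
    }
}

static class Program
{
    static void Main()
    {
        // Hand-written sample fragment (an assumption, not captured output).
        string sample =
            "<td class=\"grey\" nowrap>Searched for </td><td nowrap>" +
            "<a title=\"http://www.google.com/search?q=hello+world&hl=en\">hello world</a>";

        foreach (string term in HistoryParser.ExtractSearchTerms(sample))
        {
            Console.WriteLine(term);   // prints "hello world"
        }
    }
}
```

Decoding the term with WebUtility.UrlDecode handles the + separator mentioned above; other search modes may need extra handling.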
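using System;
using System.Net;
using System.Xml;
using Google.GData.Client;

static XmlDocument GetGoogleWebHistory(int month, int day, int yr, string UserName, string Pass)
{
    // Per-day RSS lookup URL built from the month/day/yr parameters.
    string iURL = "http://www.google.com/history/lookup?month=" + month + "&day=" + day + "&yr=" + yr + "&output=rss";
    WebClient client = new WebClient();
    // Note: gdc and rs are built but never attached to the WebClient, so the request is unauthenticated as written.
    GDataCredentials gdc = new GDataCredentials(UserName, Pass);
    RequestSettings rs = new RequestSettings(Guid.NewGuid().ToString(), gdc);
    XmlDocument XDoc = new XmlDocument();
    XDoc.LoadXml(client.DownloadString(iURL));
    return XDoc; // return the parsed feed instead of discarding it
}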