How to "manually" go back with a WebBrowser?
I'm working on a web scraper that sometimes needs to remember a particular page, then go to some other pages and then go back to that page. Currently I just save the URL of the page, but that doesn't work for pages like Google Maps, where the URL is always the same.
I can see that the GoBack method does go back to the previous page, so somehow the WebBrowser remembers what the previous page was. How can I do this manually? I could count how many pages have been visited since the page I want to go back to and then call GoBack as many times as necessary, but that's unreliable and inelegant. So I wonder how I could implement a GoBackToAParticularPage method.
There is one thing I think would get me closer to a solution: saving the URLs of all frames and then putting them back when going back to that page. I think that would solve at least the Google Maps problem. I have not tested it yet, and I don't know exactly what the proper way to do this would be. I would need to wait for the frames to exist before setting their URLs.
You can use
webBrowser1.Document.Window.History.Go(x);
where x is an int signifying the relative position in the browser's history.
x=-2 would navigate two pages back.
Update: More info on HtmlHistory.Go()
try this!
javascript:history.go(-1)
I know a few things have already been said, so I won't repeat them. However, if you really want to use a JavaScript method (i.e. the JavaScript history object instead of the WebBrowser control's history object) and are wondering how, there are ways to do this. You can use .InvokeScript in the .NET WB control, or, if you want something that works in both pre-.NET and .NET versions of the WB control, you can use .execScript. It also lets you choose the language of the script you want to execute, i.e. "JScript" or "VBScript". Here is the one-liner:
WebBrowser1.Document.parentWindow.execScript "alert('hello world');", "JScript"
The good thing about using the JavaScript history object is this: if you suppress history information in the WebBrowser control by passing the flag 2 (navNoHistory) to the .Navigate method, going back to the page where history was suppressed will not work through the WB control's history, but it will work through JavaScript's history object.
Once again, this is just a backwards-compatible supplement to the ideas already discussed in this post, plus a few tidbits not mentioned elsewhere.
Let me know if I can be of further help, since an answer has already been accepted.
You can achieve your task with the JavaScript history object.
<FORM><INPUT TYPE="BUTTON" VALUE="Go Back"
ONCLICK="history.go(-1)"></FORM>
Also check the JavaScript History Object reference for more on the history information.
Browser history, by design, is opaque; otherwise it opens a security hole: Do you really want every page you visit to have visibility as to what pages/sites you've been visiting? Probably not.
To do what you want, you'll need to implement your own stack of URIs, tracking what needs to be revisited.
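As a rough illustration of such a stack, here is a minimal sketch in JavaScript (the scripting language other answers in this thread already use). The class name PageTrail and its methods are hypothetical, invented for this example; the sketch assumes the scraper calls visit() after every navigation.

```javascript
// Hypothetical helper: records visited URIs in order and lets the scraper
// bookmark one of them so it can later find its way back.
class PageTrail {
  constructor() {
    this.visited = [];           // URIs in visit order
    this.bookmarks = new Map();  // bookmark name -> index into `visited`
  }

  // Call after every navigation.
  visit(uri) {
    this.visited.push(uri);
  }

  // Remember the page we are currently on under a name.
  bookmark(name) {
    if (this.visited.length === 0) throw new Error("nothing visited yet");
    this.bookmarks.set(name, this.visited.length - 1);
  }

  // Returns the bookmarked URI plus how many steps back it lies,
  // and drops everything that was visited after it.
  rewindTo(name) {
    const index = this.bookmarks.get(name);
    if (index === undefined) throw new Error(`unknown bookmark: ${name}`);
    const stepsBack = this.visited.length - 1 - index;
    this.visited.length = index + 1;
    for (const [key, i] of this.bookmarks) {
      if (i > index) this.bookmarks.delete(key); // bookmarks past the target are gone
    }
    return { uri: this.visited[index], stepsBack };
  }
}
```

The stepsBack value could be fed to history.go(-stepsBack) for pages whose URI alone does not identify them, with the stored URI as a fallback.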
You don't want to use history.go(-1) because it is unreliable. But you can't use the URL either, because there are pages like Google Maps where the URL is always the same.
If the URL is the same but the content is different, then it means that values to determine the page's content are being pulled from somewhere other than the URL.
Where could this be?
Your most likely suspect is the posted form-collection, but data could also be coming from the cookie.
I think it makes a lot more sense to index the absolute location than a relative location, because as you noted, relative locations can be unreliable. The problem is that you need to get all the data that is being sent to the web server, to understand what its actual absolute location is (because the URI is not sufficient).
The way to do this is to create a local copy of the page, and replace the submission url (this could be in a link, a form or in the javascript), with a URL on your server. Then when you click something on the GoogleMaps page to trigger a change (that seems not to affect the URL), you will receive that data on your server, and will be able to determine the actual location.
Think about it like a querystring.
If I have
<form action="http://myhost.com/page.html" method="get">
<input type="hidden" name="secret_location_parameter" value="mrbigglesworth" />
<input type="submit" />
</form>
and I click the submit button, I get taken to the url
http://myhost.com/page.html?secret_location_parameter=mrbigglesworth
However, If I have
<form action="http://myhost.com/page.html" method="post">
<input type="hidden" name="secret_location_parameter" value="mrbigglesworth" />
<input type="submit" />
</form>
and I click the submit button, then I get taken to the url
http://myhost.com/page.html
The server still receives secret_location_parameter=mrbigglesworth, but it gets it as a form value instead of a querystring value, so it isn't visible in the URL. The server might render a different page depending on the secret_location_parameter value without changing the URL, so if the post method is used, it will appear that multiple pages reside at the same URL.
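The difference between the two forms can be sketched in a few lines of JavaScript. This is only an illustration of where the field ends up, not what any particular browser does internally; buildRequest is a hypothetical helper, and the example assumes a runtime with the standard URL and URLSearchParams classes (modern browsers or Node.js).

```javascript
// Hypothetical helper: shows where the form fields travel for each method.
function buildRequest(action, method, fields) {
  const encoded = new URLSearchParams(fields).toString();
  if (method.toLowerCase() === "get") {
    // GET: the fields become the query string, so they show up in the URL.
    const url = new URL(action);
    url.search = encoded;
    return { url: url.toString(), body: null };
  }
  // POST: the URL stays untouched and the fields travel in the request body.
  return { url: action, body: encoded };
}

const fields = { secret_location_parameter: "mrbigglesworth" };
const viaGet = buildRequest("http://myhost.com/page.html", "get", fields);
const viaPost = buildRequest("http://myhost.com/page.html", "post", fields);

console.log(viaGet.url);   // URL carries the parameter
console.log(viaPost.url);  // URL is the bare page address
console.log(viaPost.body); // the parameter is here instead
```

Saving only the URL therefore loses exactly the part of the POST request that determines which page is rendered.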
My point is that you may be addressing the problem from the wrong angle, because you didn't understand what was going on under the hood. I am certainly making assumptions, but based on the way you asked your question I think this may be helpful for you.
If you don't need to visually see what's going on, there are probably more elegant ways of navigating and parsing URLs with the WebClient classes. Perhaps elaborating on your particular program would yield clearer results.
This assumes that you have a WebBrowser control on a form and are trying to implement going back. (If that assumption is wrong, please correct me.)
Add a WebBrowser, a TextBox, and a Button named btnBack.
The History variable also stores the URL of each navigation, but it is not currently used.
C# solution
using System;
using System.Collections.Generic;
using System.Windows.Forms;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        // URLs of every navigation, most recent on top (recorded but not used yet).
        Stack<string> History = new Stack<string>();

        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            WebBrowser1.Url = new Uri("http://maps.google.com");
        }

        // Fires before each navigation: show the URL and record it.
        private void WebBrowser1_Navigating(object sender, WebBrowserNavigatingEventArgs e)
        {
            TextBox1.Text = e.Url.ToString();
            History.Push(e.Url.ToString());
        }

        // Go back one page using the control's own history.
        private void btnBack_Click(object sender, EventArgs e)
        {
            if (WebBrowser1.CanGoBack)
            {
                WebBrowser1.GoBack();
            }
        }
    }
}
VB solution
Public Class Form1
    ' URLs of every navigation, most recent on top (recorded but not used yet).
    Dim History As New Stack(Of String)

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        WebBrowser1.Url = New Uri("http://maps.google.com")
    End Sub

    ' Fires before each navigation: show the URL and record it.
    Private Sub WebBrowser1_Navigating(ByVal sender As Object, ByVal e As System.Windows.Forms.WebBrowserNavigatingEventArgs) Handles WebBrowser1.Navigating
        TextBox1.Text = e.Url.ToString
        History.Push(e.Url.ToString)
    End Sub

    ' Go back one page using the control's own history.
    Private Sub btnBack_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnBack.Click
        If WebBrowser1.CanGoBack Then
            WebBrowser1.GoBack()
        End If
    End Sub
End Class
Programmatically add a marker element to the DOM for those pages you will later want to go back to. When backtracking through the browser history, check for that marker after each history.go(-1) and stop when you encounter it. This might prove unreliable in some cases, in which case remembering the depth level may serve as a backup approach.
You may need to experiment with the right time to insert the element, to make sure it is properly recorded in the history.
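The backtracking loop might look roughly like the sketch below. The browser argument is a hypothetical adapter, not a real API: goBack() is assumed to perform history.go(-1) and wait for the page to load, hasMarker() to check the DOM for the marker element, and canGoBack() to report whether an earlier entry exists. The in-memory fake only stands in for a real browser so the loop can be tried out.

```javascript
// Steps back through history until the marker is found.
// Returns the number of steps taken, or -1 if the marker was not
// found within maxSteps (the browser is then left where the search stopped).
function goBackToMarker(browser, maxSteps) {
  for (let steps = 0; steps <= maxSteps; steps++) {
    if (browser.hasMarker()) return steps;
    if (steps === maxSteps || !browser.canGoBack()) break;
    browser.goBack();
  }
  return -1;
}

// Hypothetical in-memory stand-in for a browser, for demonstration only.
function makeFakeBrowser(pages, position) {
  return {
    hasMarker: () => pages[position].marked === true,
    canGoBack: () => position > 0,
    goBack: () => { position -= 1; },
  };
}

const demoPages = [{ marked: false }, { marked: true }, { marked: false }, { marked: false }];
const demoSteps = goBackToMarker(makeFakeBrowser(demoPages, 3), 10);
console.log(demoSteps);
```

A return value of -1 is the point at which the remembered depth level would be used as the backup.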
In case anyone else can benefit from it, here is how I ended up doing it. The only caveat is that if the travel log has too many pages in between, the entry might not exist any more. There is probably a way to increase the history size, but since there has to be some limit, I use the TravelLog.GetTravelLogEntries method to see whether the entry still exists and, if not, use the URL instead.
Most of this code came from PInvoke.
using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;
using System.Collections.Generic;

namespace TravelLogUtils
{
    [ComVisible(true), ComImport()]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    [GuidAttribute("7EBFDD87-AD18-11d3-A4C5-00C04F72D6B8")]
    public interface ITravelLogEntry
    {
        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int GetTitle([Out] out IntPtr ppszTitle); //LPOLESTR LPWSTR

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int GetURL([Out] out IntPtr ppszURL); //LPOLESTR LPWSTR
    }

    [ComVisible(true), ComImport()]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    [GuidAttribute("7EBFDD85-AD18-11d3-A4C5-00C04F72D6B8")]
    public interface IEnumTravelLogEntry
    {
        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int Next(
            [In, MarshalAs(UnmanagedType.U4)] int celt,
            [Out] out ITravelLogEntry rgelt,
            [Out, MarshalAs(UnmanagedType.U4)] out int pceltFetched);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int Skip([In, MarshalAs(UnmanagedType.U4)] int celt);

        void Reset();

        // Clone returns a copy of the enumerator, not a single entry.
        void Clone([Out] out IEnumTravelLogEntry ppenum);
    }

    public enum TLMENUF
    {
        /// <summary>
        /// Enumeration should include the current travel log entry.
        /// </summary>
        TLEF_RELATIVE_INCLUDE_CURRENT = 0x00000001,
        /// <summary>
        /// Enumeration should include entries before the current entry.
        /// </summary>
        TLEF_RELATIVE_BACK = 0x00000010,
        /// <summary>
        /// Enumeration should include entries after the current entry.
        /// </summary>
        TLEF_RELATIVE_FORE = 0x00000020,
        /// <summary>
        /// Enumeration should include entries which cannot be navigated to.
        /// </summary>
        TLEF_INCLUDE_UNINVOKEABLE = 0x00000040,
        /// <summary>
        /// Enumeration should include all invokable entries.
        /// </summary>
        TLEF_ABSOLUTE = 0x00000031
    }

    [ComVisible(true), ComImport()]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    [GuidAttribute("7EBFDD80-AD18-11d3-A4C5-00C04F72D6B8")]
    public interface ITravelLogStg
    {
        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int CreateEntry([In, MarshalAs(UnmanagedType.LPWStr)] string pszUrl,
            [In, MarshalAs(UnmanagedType.LPWStr)] string pszTitle,
            [In] ITravelLogEntry ptleRelativeTo,
            [In, MarshalAs(UnmanagedType.Bool)] bool fPrepend,
            [Out] out ITravelLogEntry pptle);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int TravelTo([In] ITravelLogEntry ptle);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int EnumEntries([In] int TLENUMF_flags, [Out] out IEnumTravelLogEntry ppenum);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int FindEntries([In] int TLENUMF_flags,
            [In, MarshalAs(UnmanagedType.LPWStr)] string pszUrl,
            [Out] out IEnumTravelLogEntry ppenum);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int GetCount([In] int TLENUMF_flags, [Out] out int pcEntries);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int RemoveEntry([In] ITravelLogEntry ptle);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int GetRelativeEntry([In] int iOffset, [Out] out ITravelLogEntry ptle);
    }

    [ComImport, ComVisible(true)]
    [Guid("6d5140c1-7436-11ce-8034-00aa006009fa")]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    public interface IServiceProvider
    {
        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int QueryService(
            [In] ref Guid guidService,
            [In] ref Guid riid,
            [Out] out IntPtr ppvObject);
    }
    public class TravelLog
    {
        public static Guid IID_ITravelLogStg = new Guid("7EBFDD80-AD18-11d3-A4C5-00C04F72D6B8");
        public static Guid SID_STravelLogCursor = new Guid("7EBFDD80-AD18-11d3-A4C5-00C04F72D6B8");

        private const int HRESULT_OK = 0;

        // Queries the browser's ActiveX instance for its travel log storage.
        // The caller releases the returned object with Marshal.ReleaseComObject.
        private static ITravelLogStg GetTravelLogStg(WebBrowser webBrowser)
        {
            SHDocVw.IWebBrowser2 axWebBrowser = (SHDocVw.IWebBrowser2)webBrowser.ActiveXInstance;
            IServiceProvider psp = axWebBrowser as IServiceProvider;
            if (psp == null) throw new Exception("Could not get IServiceProvider.");
            IntPtr oret = IntPtr.Zero;
            int hr = psp.QueryService(ref SID_STravelLogCursor, ref IID_ITravelLogStg, out oret);
            if ((oret == IntPtr.Zero) || (hr != HRESULT_OK)) throw new Exception("Failed to query service.");
            ITravelLogStg tlstg = Marshal.GetObjectForIUnknown(oret) as ITravelLogStg;
            Marshal.Release(oret); // the runtime wrapper holds its own reference
            if (null == tlstg) throw new Exception("Failed to get ITravelLogStg");
            return tlstg;
        }

        // Returns the travel log entry for the page currently displayed.
        public static ITravelLogEntry GetTravelLogEntry(WebBrowser webBrowser)
        {
            ITravelLogStg tlstg = GetTravelLogStg(webBrowser);
            ITravelLogEntry ptle = null;
            int hr = tlstg.GetRelativeEntry(0, out ptle);
            if (hr != HRESULT_OK) throw new Exception("Failed to get travel log entry with error " + hr.ToString("X"));
            Marshal.ReleaseComObject(tlstg);
            return ptle;
        }

        // Navigates the browser to a previously saved travel log entry.
        public static void TravelToTravelLogEntry(WebBrowser webBrowser, ITravelLogEntry travelLogEntry)
        {
            ITravelLogStg tlstg = GetTravelLogStg(webBrowser);
            int hr = tlstg.TravelTo(travelLogEntry);
            if (hr != HRESULT_OK) throw new Exception("Failed to travel to log entry with error " + hr.ToString("X"));
            Marshal.ReleaseComObject(tlstg);
        }

        // Enumerates every entry still present in the travel log.
        public static HashSet<ITravelLogEntry> GetTravelLogEntries(WebBrowser webBrowser)
        {
            ITravelLogStg tlstg = GetTravelLogStg(webBrowser);

            IEnumTravelLogEntry penumtle = null;
            int hr = tlstg.EnumEntries((int)TLMENUF.TLEF_ABSOLUTE, out penumtle);
            Marshal.ThrowExceptionForHR(hr);

            const int MAX_FETCH_COUNT = 1;
            HashSet<ITravelLogEntry> results = new HashSet<ITravelLogEntry>();
            while (true)
            {
                ITravelLogEntry ptle = null;
                int fetched = 0;
                hr = penumtle.Next(MAX_FETCH_COUNT, out ptle, out fetched);
                Marshal.ThrowExceptionForHR(hr);
                if (hr != HRESULT_OK) break; // S_FALSE: no more entries
                if (ptle != null) results.Add(ptle);
            }

            Marshal.ReleaseComObject(penumtle);
            Marshal.ReleaseComObject(tlstg);
            return results;
        }
    }
}