Problem with MSHTML COM clicking on submit button

I'm having a problem screen-scraping some data from this website using the MSHTML COM component. I have a WebBrowser control on my WPF form. The code that retrieves the HTML elements is in the WebBrowser's LoadCompleted event handler. After I set the value on the HTMLInputElement and call the click method on the HTMLInputButtonElement, it refuses to submit the request and display the next page.

I analysed the HTML for the button's onclick attribute: it calls a JavaScript function that processes my request. That makes me wonder whether calling the JavaScript function is what causes the problem. Oddly enough, when I take my code out of the LoadCompleted method and put it inside a button click event handler, it does take me to the next page, whereas the LoadCompleted method did not. But clicking a button by hand defeats the point of trying to screen-scrape the page automatically.

Another thought: when the code was inside the LoadCompleted method, perhaps the HTMLInputButtonElement was not yet fully rendered on the page, which would result in the click event not firing. Yet when I inspected the object at run time, it did hold the submit button element and its state said it was complete, which baffles me even more.

Here is the code I used inside the LoadCompleted method and the click method on the button:

private void browser_LoadCompleted(object sender, NavigationEventArgs e)
{
    HTMLDocument dom = (HTMLDocument)browser.Document;

    // Find the text box by its name attribute.
    IHTMLElementCollection elementCollection = dom.getElementsByName("PCL_NO_FROM.PARCEL_RANGE.XTRACKING.1-1-1.");
    HTMLInputElement inputBox = null;
    foreach (HTMLInputElement element in elementCollection)
    {
        if (element.name.Equals("PCL_NO_FROM.PARCEL_RANGE.XTRACKING.1-1-1."))
        {
            inputBox = element;
        }
    }
    if (inputBox == null)
    {
        // LoadCompleted also fires for pages that do not contain the form.
        return;
    }
    inputBox.value = "Test";

    // Find the submit button by its name attribute.
    elementCollection = dom.getElementsByName("SUBMIT.DUM_CONTROLS.XTRACKING.1-1.");
    HTMLInputButtonElement submitButton = null;
    foreach (HTMLInputButtonElement element in elementCollection)
    {
        if (element.name.Equals("SUBMIT.DUM_CONTROLS.XTRACKING.1-1."))
        {
            submitButton = element;
        }
    }
    if (submitButton == null)
    {
        return;
    }
    submitButton.click();
}

FYI: This is the URL of the web page I'm trying to access using MSHTML: http://track.dhl.co.uk/tracking/wrd/run/wt_xtrack_pw.entrypoint.


There are many possibilities:

  • You could try putting your code in a different event handler, such as the navigation-complete or download-complete events.

  • You may need to explicitly evaluate the onclick handler after calling click().

  • Using the MS WebBrowser control is easier than using the MSHTML COM.

  • To make life easier, you could use a web-scraping library such as the IRobotSoft ActiveX control to automate the entire process.
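
One way to act on the first two suggestions, given that the same code works from a button click handler, is to queue the fill-and-click step on the dispatcher so that it runs after LoadCompleted has returned. The following is only a sketch under assumptions: the `submitted` flag and the `FillAndSubmit` and `FirstByName` helpers are hypothetical names, `browser` is assumed to be the WPF WebBrowser declared in XAML, and it is not confirmed that deferring the click resolves the issue on this particular page.

```csharp
using System;
using System.Windows;
using System.Windows.Navigation;
using System.Windows.Threading;
using mshtml;

public partial class MainWindow : Window
{
    // Hypothetical guard: the result page raises LoadCompleted as well.
    private bool submitted;

    private void browser_LoadCompleted(object sender, NavigationEventArgs e)
    {
        if (submitted) return;
        submitted = true;

        // Queue the work so it runs once the dispatcher is idle,
        // after this event handler has returned.
        Dispatcher.BeginInvoke(DispatcherPriority.ApplicationIdle,
                               new Action(FillAndSubmit));
    }

    private void FillAndSubmit()
    {
        var dom = (HTMLDocument)browser.Document;
        var input = FirstByName<IHTMLInputElement>(dom, "PCL_NO_FROM.PARCEL_RANGE.XTRACKING.1-1-1.");
        var button = FirstByName<IHTMLElement>(dom, "SUBMIT.DUM_CONTROLS.XTRACKING.1-1.");
        if (input == null || button == null) return; // form not on this page

        input.value = "Test";
        button.click();
    }

    // Hypothetical helper: first element with the given name that
    // supports the requested MSHTML interface, or null if none does.
    private static T FirstByName<T>(HTMLDocument dom, string name) where T : class
    {
        foreach (object element in dom.getElementsByName(name))
        {
            var typed = element as T;
            if (typed != null) return typed;
        }
        return null;
    }
}
```

If the click still does nothing, the page's script function could be called directly with the WPF `WebBrowser.InvokeScript` method instead of `button.click()`. The function name and its arguments would have to be read from the button's onclick attribute; they are not assumed here.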


Delay in OnBeforeNavigate can cause click actions to fail.

We have noticed that with some submit actions OnBeforeNavigate is called twice, especially where onClick is used. The first call is before the onClick action is performed, the second is after it is complete.

Turn off your BHO, put a breakpoint on onClick, step over the submit action (return jsSubmit()), wait a bit, and you should be able to reproduce the same issue without your automation.

Any delay of more than 150 ms on the second call to OnBeforeNavigate causes some failure in the page load or navigation to the result page.

Edit:
Having tried our own automation of this DHL page we don't currently have an issue with the timing described above.
