开发者

How to click an element (hyperlink) on a webform from C#, when it does not have any ID and Name

For the last two weeks I have been kind of stuck on a problem.

I am developing some web scrapers using C# and I am using a WinForms WebBrowser control in my application. I am able to fill up the web form which is opened in my browser and submit it automatically by using the following code:

HtmlElement submitButton = document.GetElementById("Element_ID″);
submitButton.InvokeMember(“click”);

So far everything is fine, but the problem is that there is one another element in the web form that I want to click too, but this element does not have any id or name so I don't know how to click this one.

Please help me as soon as possible I need it for my master thesis.

(I want to click the next page arrow button in the give website: http://www.gelbeseiten.de/yp/11//subscriberlist_pageAction.yp?sessionDataString=H4sIAAAAAAAAAI2PQU8CMRCFfw0XSEmns9128k5KongwGjFeSZftIqILbhcVf70NSgg3X-pbyXjLfvCFpqsbbIMpwbVRRuaBELKm6iew5T4gLFUpdmKpewJAGD8xV7JaxalfpdZX6mP31bH4WQfZblJehXcd2tGvr0WwbunVIKbYIZjjKmoa3atct4RSh-pA/S912oY4qhWzyjJkLvPZV4P4JetNFHYWOG2OoCH4pZlyU-pjWdhjS/LY2sp7-p1lLCLOGXwTLqpT1XSqOiXcpE3Xzw-pncUtGSDNp0ZZwR0we92TxSHjIX0x-pIQM-p0AZuciLl7M/kGE-pmcGjIOsvEpTB-pADJS0suGAQAA&page=0&filterTrade=-&filterFunction=-开发者_高级运维&sortBy=sort_trade&availableLetters=ABCDEFGHIJKLMNOPQRSTUVW )


I've written many web-scrapers in the past using embedded WebBrowsers, so you've come to the right place.

When the element does not have a name you need to find it by either content, or another associated element that is named.

  • In the first instance we wrote helper methods to iterate the hierachy looking for a specific piece of content within an element.
  • For the second option you get the named element and use a specific index for the desired child.
  • A combination of both (find a specific parent then look for a child with the right content)

In your specific example webpage, the next page anchor has a class type of "arrow next" you can search for.


You could do

HtmlElement next_arrow =  document.GetElementsByTagName("a")
                               .Cast<HtmlElement>()
                               .Where(e => e.GetAttribute("class") == "arrow next")
                               .FirstOrDefault();
if (next_arrow != null)
{
     next_arrow.InvokeMember("click");
}


Here's a trick, not by InvokeMember("click") rather just "simulating the click" -

this is the link for the first page:

gelbeseiten.de/yp/11//subscriberlist_pageAction.yp?sessionDataString=H4sIAAAAAAAAAI2PQU8CMRCFfw0XSEmns9128k5KongwGjFeSZftIqILbhcVf70NSgg3X-pbyXjLfvCFpqsbbIMpwbVRRuaBELKm6iew5T4gLFUpdmKpewJAGD8xV7JaxalfpdZX6mP31bH4WQfZblJehXcd2tGvr0WwbunVIKbYIZjjKmoa3atct4RSh-pA/S912oY4qhWzyjJkLvPZV4P4JetNFHYWOG2OoCH4pZlyU-pjWdhjS/LY2sp7-p1lLCLOGXwTLqpT1XSqOiXcpE3Xzw-pncUtGSDNp0ZZwR0we92TxSHjIX0x-pIQM-p0AZuciLl7M/kGE-pmcGjIOsvEpTB-pADJS0suGAQAA&page=0&filterTrade=-&filterFunction=-&sortBy=sort_trade&availableLetters=ABCDEFGHIJKLMNOPQRSTUVW

as you see page=0; clicking next, gives the link -

gelbeseiten.de/yp/11//subscriberlist_pageAction.yp?sessionDataString=H4sIAAAAAAAAAI2PQU/DMAyFf00vmzLFdprE8gkmwTggEENcp3RNxxh0o-pmA8euJBlO1G0-p-pvCf58zNwUzW-pDKyQalSmckExl6DqJpKnPCEuVbDaYFUvBcEIFXgVu1Ws2nV6Xac-pZn89X5xFwoed2MvQbmI73rf1eL4L3SakFFsJOBpnzcJbte9W4hSI-pQ/S912oY4qhWz5LDSC992Dl/QR60ahPki2OZKeNfCgiba18oicmLV8lTcoS8t6BJ8zsHMo3yEU1VE1D1ZmWm7Tt-psXxtNwCMmjS4BhJ7oDAy72WR5CH/MT0l1HQEVa46QDK2Z/JsTyhcdIAWrZeGy8/k7LJ5YQBAAA-e&page=1&filterTrade=-&filterFunction=-&sortBy=sort_trade&availableLetters=ABCDEFGHIJKLMNOPQRSTUVW

now page=1

and so on... in general clicking next means page=(x+1) clicking prev means page=(x-1). so build a string according the requirements. this addresses ur problem, however there are some other data also sent with querystring, that u have to append to the string as well.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜