
How to get a table without an id (where getElementById() can't be used)

I can't think of an easier way to word the question, but it's not as complex as it seems. I have a small project going to help me move up at my workplace (I'm a tech support agent at the moment and looking to go part time in web dev; I'm hungry for code and tech support isn't satisfying).

So I said I'd make a small program that updates tech agents on problems or site issues as they arise. It takes its information from a small web page called outage (which is disastrous in my opinion: 177 errors on the W3C validator).

The web dev guys won't just give the table an id; apparently it's some sort of security hole? I don't know how, but I'm not going to question the guys above me. I'm trying to work with them, not against them.

The table itself doesn't have an id, but the spans inside its cells do (span id), e.g.:

<table width="100%" border="0">
<tbody>
<tr id="title">
    <td width="9%">Date/Time</td>
    <td width="24%">program/site</td>
    <td width="5%">Ticket</td>
    <td width="*">Issue</td>
    <td width="2%">More</td>
</tr>

<tr>
    <td><span id="date">2011-01-27 17:32</span></td>
    <td><span id="site"><a id="fus_00001"></a>sample area or program affected</span></td>
    <td><span id="site"><a href="https://sample php file i cant give you" target="_blank">12345671</a></span></td>
    <td><span id="issue">problem identified/ investiating</span></td> 
    <td><span id="ticket"></span></td>
</tr><tr>

I'm using Java for this and, for all intents and purposes, it draws and does everything I need it to. To parse the information I'm using HtmlUnit 2.8.

Here's the code that I'm using at the moment. I just don't know how to get that table without an id.

String update = "blank";

final WebClient webClient = new WebClient();
webClient.setJavaScriptEnabled(false);// javascript causes some serious problems.
webClient.setCssEnabled(false);

HtmlPage page;

try 
{
    URL outageURL = new URL("file:\\C:\\Users\\MYDRIVE\\Desktop\\version control\\OUTAGE\\Outages.htm"); //local drive at home

    page = webClient.getPage(outageURL);

    //final HtmlTable table = page.getHtmlElementById("outages"); // if the table had the id "outages", this would be perfect, but alas it doesn't

    final HtmlTable table = page.get//the cells in the table by some other means

    update = (table.getCellAt(1,0).asText() + "   " + table.getCellAt(1,1).asText() + "   " + table.getCellAt(1,2).asText() + "   " + table.getCellAt(1,3).asText());
// above code takes the cells and combines them
} catch and everything else

return update;

So, bottom line: has anyone got any ideas on how to access this table some other way, without the id? Maybe via the span ids? P.S. I've looked through the HtmlUnit API and I'm not really sure I can find anything useful.


final String stringHtmlTable = page.getPage().asXml();

If I were to do this, how would I use XPath to get to the desired cell, as per Mark's response? P.S. I'm not familiar with XML at all.
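For anyone else new to XPath, here is a minimal sketch of how an expression picks out a cell once the page is available as XML. It uses the JDK's built-in javax.xml.xpath classes rather than HtmlUnit, and SAMPLE_XML is a hypothetical, cleaned-up stand-in for the outage table, not the real asXml() output. The class and method names are made up for illustration. Real markup that isn't well-formed XML would likely fail to parse this way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OutageXPathSketch {

    // Hypothetical, well-formed stand-in for the outage table (assumption, not the real page).
    static final String SAMPLE_XML =
          "<table><tbody>"
        + "<tr id=\"title\"><td>Date/Time</td><td>program/site</td><td>Issue</td></tr>"
        + "<tr>"
        + "<td><span id=\"date\">2011-01-27 17:32</span></td>"
        + "<td><span id=\"site\">sample area or program affected</span></td>"
        + "<td><span id=\"issue\">problem identified</span></td>"
        + "</tr>"
        + "</tbody></table>";

    /** Returns the trimmed text of the first node the expression matches, or "" if nothing matches. */
    static String firstText(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // "//span" means "any span, anywhere"; [@id='date'] filters on the id attribute.
        System.out.println(firstText(SAMPLE_XML, "//span[@id='date']"));
        // Positional form: second row, third cell (XPath positions start at 1, not 0).
        System.out.println(firstText(SAMPLE_XML, "//table/tbody/tr[2]/td[3]"));
    }
}
```

The expression strings themselves are plain XPath 1.0, so the same strings should also be usable with page.getByXPath(...) in HtmlUnit.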


Finding a good example of XPath was absolutely ridiculously hard.

In the end, this got the details of each one by the span ids:

        Object[] dates = page.getByXPath("//span[@id='date']/text()").toArray();
        Object[] sites = page.getByXPath("//span[@id='site']/text()").toArray();
        Object[] issues = page.getByXPath("//span[@id='issue']/text()").toArray();

        System.out.println("" + dates[0].toString());
        System.out.println("" + sites[0].toString());
        System.out.println("" + issues[0].toString());

        update = (dates[0].toString() + "   " + sites[0].toString() + "   " +issues[0].toString());


If you can't get at the table tag itself directly (e.g. by ID), then you can dig deeper inside for something that is unique to just that table. For instance, if this is the only table on the page that has <td width="24%">program/site</td>, you can have XPath look for that cell, then walk back up to the enclosing <table> element, either in code (for example with getParentNode()) or in the XPath expression itself with the ancestor::table axis.
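A rough sketch of that idea follows, again using the JDK's own XPath support on a hypothetical well-formed sample rather than HtmlUnit and the real page. The sample markup, class name and method name are assumptions for illustration only. The expression finds the header cell by its text and then climbs to the nearest enclosing table.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FindTableByCellSketch {

    // Hypothetical page with two id-less tables; only the second has a "program/site" cell.
    static final String SAMPLE_XML =
          "<body>"
        + "<table><tr><td>Home</td></tr></table>"
        + "<table>"
        + "<tr><td>Date/Time</td><td>program/site</td></tr>"
        + "<tr><td>2011-01-27 17:32</td><td>sample area</td></tr>"
        + "</table>"
        + "</body>";

    // Find the cell by its text, then take the nearest <table> ancestor.
    static final String TABLE_EXPRESSION =
        "//td[normalize-space(.)='program/site']/ancestor::table[1]";

    /** Returns the table that contains the marker cell, or null if there is none. */
    static Element findOutageTable(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Element) xpath.evaluate(TABLE_EXPRESSION, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Element table = findOutageTable(SAMPLE_XML);
        // Count the rows of the table that was located.
        System.out.println(table.getElementsByTagName("tr").getLength());
    }
}
```

Matching on the cell text keeps the lookup independent of where the table sits on the page, at the cost of breaking if the web team ever rewords that header.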
