Using jsoup to extract data from an HTML table
I started using jsoup today for an Android app. I have a table that I need to extract data from, but from the looks of it, that is going to be tough. I need some help; the HTML for the table is below:
<TR BGCOLOR='#999999'>
<TD ALIGN='left'><span class='S09W80'><font color=#DDDDDD>CODE</span></TD>
<TD ALIGN='left'><span class='S09W80'><font color=#DDDDDD>SUBJECT NAME</span></TD>
<TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>PERIOD FROM</span></TD>
<TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>PERIOD TO</span></TD>
<TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>ENROL DATE</span></TD>
<TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>GRADE</span></TD>
</TR>
followed by repetitions of
<TR BGCOLOR='#FFFFFF'>
<TD ALIGN='left'><span class='S09W50'>IT142</span></TD>
<TD ALIGN='left'><span class='S09W50'>INTRODUCTION TO GRAPHICS DEVELOPMENT</span></TD>
<TD ALIGN='right'><span class='S09W50'>21-FEB-11</span></TD>
<TD ALIGN='right'><span class='S09W50'>17-JUN-11</span></TD>
<TD ALIGN='right'><span class='S09W50'>22-FEB-11</span></TD>
<TD ALIGN='center'><span class='S09W80'>B-</span></TD>
</TR>
But how do I use doc.select() here? What selector should I use?
This is not really an Android question but a CSS selector question. You can read more about selectors at http://www.w3.org/TR/CSS2/selector.html
Doing screen scraping like this is always tricky and there is no "right" solution.
You will need to perform multiple select steps.
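- Start with a selector like "table tr" and take the first element. This will give you the initial (header) TR element. A strict child selector such as "body > table > tr" may match nothing, because jsoup's HTML parser normally inserts a TBODY element between TABLE and TR.
- Validate that TR element: get its child elements and check that one of them has the text "SUBJECT NAME".
- Then the remaining TR elements can be processed in order.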
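As a rough illustration, the steps above could be sketched like this with jsoup. The class name GradeTableScraper and the method extractRows are made up for this example, and the sketch assumes the HTML string you pass in contains the enclosing TABLE element and that every data row has the same number of cells as the header row; adjust both assumptions to the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative only.
class GradeTableScraper {

    // Returns the cell texts of every data row that follows the header row.
    static List<List<String>> extractRows(String html) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();
        int headerCells = -1; // stays -1 until the header row has been found

        // Descendant selector, so it matches whether or not a TBODY is present.
        for (Element tr : doc.select("table tr")) {
            Elements cells = tr.children();

            if (headerCells < 0) {
                // Validate the header row by looking for a known column title.
                for (Element cell : cells) {
                    if ("SUBJECT NAME".equalsIgnoreCase(cell.text().trim())) {
                        headerCells = cells.size();
                        break;
                    }
                }
                continue; // skip the header row and anything before it
            }

            // Assumption: a data row has as many cells as the header row.
            if (cells.size() != headerCells) {
                continue;
            }

            List<String> row = new ArrayList<>();
            for (Element cell : cells) {
                row.add(cell.text().trim());
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Each inner list holds the cells in column order (CODE, SUBJECT NAME, PERIOD FROM, PERIOD TO, ENROL DATE, GRADE), so for the sample row above row.get(0) would be the subject code and row.get(5) the grade. Checking the header text first is a cheap guard: if the page layout changes, you get an empty result rather than data from the wrong table.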