Parse HTML in Java to create XML

<TBODY>
 <TR>
 <TD colSpan=4>Detail of your Trip</TD></TR>
 <TR></TR>
  <TR>
  <TD colSpan=4>Booking Ref. : XXX</TD></TR>
   <TR></TR>
  <TR>
  <TD>Client</TD>
    <TD colSpan=2>Ticket Number</TD>
    <TD>FOID</TD></TR>
    <TR>
     <TD>Person (ADT)</TD>
   <TD colSpan=2>000000</TD>
  <TD>XXXX</TD></TR>
  <TR></TR>
  <TR>
  <TD>From: Location 1</TD>
  <TD>To : Location 2</TD>
   <TD colSpan=2>Flight : LLL</TD></TR>
     <TR>
  <TD colSpan=2></TD>
   <TD colSpan=2>Departure : 14Aug, 15:55 Latest check-in time limit : 15:25 </TD></TR>
    <TR>
    <TD colSpan=2></TD>
   <TD colSpan=2>Arrival : 17:25</TD></TR>
   <TR>
   <TD colSpan=2></TD>
   <TD colSpan=2>Class N</TD></TR>
   <TR>
  <TD>From : Location 2</TD>
  <TD>To :Location1</TD>
  <TD colSpan=2>Flight : AF2585 Resa : OK</TD></TR>
   <TR>
   <TD colSpan=2></TD>
   <TD colSpan=2>Departure : "Time" Latest check-in time limit : "Time" </TD></TR>
  <TR>
  <TD colSpan=2></TD>
  <TR>
  <TD colSpan=2></TD>

I would like to parse this HTML, extract details such as the traveler name and the trip date, and create an XML document from them.


I have had good experience with HtmlCleaner (http://htmlcleaner.sourceforge.net/javause.php). It is simple to use and produces well-formed XML.


Because XSLT is something of a holy grail that solves nearly every problem of this kind, I recommend converting your HTML to XHTML with HTML Tidy (or with a Java library that can do the same conversion) and then using XSLT to extract the data you need.
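
As a rough illustration of the XSLT step, here is a minimal sketch that uses only the JDK's built-in `javax.xml.transform` API. It assumes the HTML has already been converted to well-formed, lowercase markup without a namespace; if your converter emits the XHTML namespace, the XPath expressions would need a prefix bound to it. The class name `TripExtractor`, the constants, and the output element names (`trip`, `client`, `ticket`, `departure`) are made up for this example, and the sample input is a cut-down version of the table in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TripExtractor {

    // Hypothetical stylesheet: picks the row after the "Client" header row
    // and every cell whose text starts with "Departure".
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/\">"
      + "<trip>"
      + "<client><xsl:value-of select=\"normalize-space("
      + "//tr[td[1]='Client']/following-sibling::tr[1]/td[1])\"/></client>"
      + "<ticket><xsl:value-of select=\"normalize-space("
      + "//tr[td[1]='Client']/following-sibling::tr[1]/td[2])\"/></ticket>"
      + "<xsl:for-each select=\"//td[starts-with(normalize-space(.), 'Departure')]\">"
      + "<departure><xsl:value-of select=\"normalize-space("
      + "substring-after(., ':'))\"/></departure>"
      + "</xsl:for-each>"
      + "</trip>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Assumed shape of the cleaned-up input (well-formed, no namespace).
    static final String SAMPLE_XHTML =
        "<table><tbody>"
      + "<tr><td>Client</td><td colspan=\"2\">Ticket Number</td><td>FOID</td></tr>"
      + "<tr><td>Person (ADT)</td><td colspan=\"2\">000000</td><td>XXXX</td></tr>"
      + "<tr><td colspan=\"2\"></td><td colspan=\"2\">Departure : 14Aug, 15:55 "
      + "Latest check-in time limit : 15:25 </td></tr>"
      + "</tbody></table>";

    // Applies the stylesheet to the given markup and returns the result as a string.
    static String transform(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_XHTML));
    }
}
```

Matching on the text of header cells such as `Client` is fragile if the page layout changes, so treat the XPath expressions as a starting point to adapt to the real document.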
