Parsing HTML into JSON
I've been tasked with getting all the SMS updates from this page and putting them into a JSON feed using Yahoo Pipes. I'm not entirely sure how I would get each update, since the updates are not wrapped in individual elements; each one is just a sequence of a title, a timestamp, a category and a message. Any shared wisdom would be much appreciated! One update looks like this:
<h1 id="blogtitle">SMS Update</h1>
<div class="blogposttime blogdetail">Left at 2nd January 2010 at 01:12</div>
<div class="blogcategories blogdetail">Recieved by SMS (Location: Pokhara - Nepal)</div>
<p class="blogpostmessage">
RACE DAY! We took the extra day off to pimp the rick some more, including a huge Australian flag. Quiet night at a pub with 6 other teams. Time for brekkie and then we're off to the rickshaw grounds for 8:30 for 10am start.
</p>
That seems like a fairly easy job for a DOM or XML parser.
Since the blocks are not wrapped in an enclosing element, you could look for an element that is present in every block. For example, <h1 id="blogtitle">SMS Update</h1> marks the start of each new block.
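Use your DOM parser to find all the elements with id blogtitle. An id is meant to be unique, so getElementById would return only the first one; select them by tag and attribute instead (for example with the CSS selector h1[id="blogtitle"] or the XPath //h1[@id='blogtitle']). From each of these headings, use the DOM's nextSibling property to step forward through the following nodes. Keep in mind that nextSibling also returns the whitespace text nodes between tags, so skip those (or use nextElementSibling where it is available). All you need are the three element siblings after each blogtitle heading: the post time, the category and the message.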
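Here is a minimal sketch of that sibling walk. It does not use Yahoo Pipes; it uses Python's standard-library xml.dom.minidom instead, and it assumes the markup is well-formed XML, as the snippet in the question is (a real page may need an HTML-tolerant parser). The `<root>` wrapper, the helper names `next_element`, `text_of` and `extract_updates`, and keying each field by its first class name are all hypothetical choices for illustration.

```python
from xml.dom import minidom

# The fragment from the question, wrapped in a hypothetical <root> element so
# that minidom (an XML parser) accepts more than one top-level element.
FRAGMENT = """<root>
<h1 id="blogtitle">SMS Update</h1>
<div class="blogposttime blogdetail">Left at 2nd January 2010 at 01:12</div>
<div class="blogcategories blogdetail">Recieved by SMS (Location: Pokhara - Nepal)</div>
<p class="blogpostmessage">
RACE DAY! We took the extra day off to pimp the rick some more, including a huge Australian flag. Quiet night at a pub with 6 other teams. Time for brekkie and then we're off to the rickshaw grounds for 8:30 for 10am start.
</p>
</root>"""


def next_element(node):
    """Follow nextSibling, skipping text nodes such as newlines between tags."""
    node = node.nextSibling
    while node is not None and node.nodeType != node.ELEMENT_NODE:
        node = node.nextSibling
    return node


def text_of(element):
    """Join the element's direct text children and trim surrounding whitespace."""
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()


def extract_updates(xml_text):
    doc = minidom.parseString(xml_text)
    updates = []
    for heading in doc.getElementsByTagName("h1"):
        # Each block starts with <h1 id="blogtitle">.
        if heading.getAttribute("id") != "blogtitle":
            continue
        update = {"title": text_of(heading)}
        sibling = heading
        # The three element siblings after the heading carry the details.
        for _ in range(3):
            sibling = next_element(sibling)
            if sibling is None:
                break
            # Key each field by the first word of its class attribute,
            # falling back to the tag name if there is no class.
            classes = sibling.getAttribute("class").split()
            key = classes[0] if classes else sibling.tagName
            update[key] = text_of(sibling)
        updates.append(update)
    return updates


print(extract_updates(FRAGMENT))
```

Skipping non-element nodes by hand is what makes nextSibling usable here; in a browser DOM, nextElementSibling does the same job directly.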
With a little work you can then turn each heading and its three siblings into one JSON object and collect those objects into an array for your feed.
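A minimal sketch of that last step, again in Python rather than Yahoo Pipes. It assumes each update has already been extracted into a dict keyed by the page's class names; the friendlier field names in `FIELD_NAMES`, the `to_feed` helper and the top-level "items" key are hypothetical, and the sample message is cut down to its first words.

```python
import json

# Hypothetical mapping from the class names used on the page to JSON field names.
FIELD_NAMES = {
    "blogposttime": "posted",
    "blogcategories": "category",
    "blogpostmessage": "message",
}


def to_feed(updates):
    """Serialize a list of per-update dicts into one JSON document."""
    items = [
        {FIELD_NAMES.get(key, key): value for key, value in update.items()}
        for update in updates
    ]
    # "items" is a hypothetical top-level key; use whatever your consumer expects.
    return json.dumps({"items": items}, indent=2)


# Example input shaped like one update from the snippet in the question.
sample = [{
    "title": "SMS Update",
    "blogposttime": "Left at 2nd January 2010 at 01:12",
    "blogcategories": "Recieved by SMS (Location: Pokhara - Nepal)",
    "blogpostmessage": "RACE DAY!",
}]
print(to_feed(sample))
```

Renaming keys in one place keeps the feed format independent of the page's CSS class names, so a markup change only touches the mapping.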