Parsing HTML into JSON

I've been tasked with getting all the SMS updates from this page and putting them into a JSON feed using Yahoo Pipes. I'm not entirely sure how I would get each update, as they are not individual elements, but just a collection of title, etc. Any shared wisdom would be much appreciated!


<h1 id="blogtitle">SMS Update</h1> 
<div class="blogposttime blogdetail">Left at 2nd January 2010 at 01:12</div> 
<div class="blogcategories blogdetail">Recieved by SMS (Location: Pokhara - Nepal)</div> 
<p class="blogpostmessage"> 
RACE DAY! We took the extra day off to pimp the rick some more, including a huge Australian flag. Quiet night at a pub with 6 other teams. Time for brekkie and then we're off to the rickshaw grounds for 8:30 for 10am start.
</p> 

That seems a fairly easy job for a DOM/XML parser.

Since each update is not wrapped in its own container element, you can look for an element that is present in every block. For example, <h1 id="blogtitle">SMS Update</h1> marks the start of a new block.

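Use your DOM parser to find all the elements with id blogtitle. From each one, follow nextSibling to reach the rest of the block: all you need is the 3 sibling elements after the blogtitle element (the post time, the category line, and the message). Keep in mind that nextSibling can also return the whitespace text nodes between tags, so skip anything that is not an element.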

With a little work you can easily use this logic to build your JSON object.
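As a rough illustration of the sibling-walking idea, here is a minimal Python sketch using the standard library's xml.dom.minidom. It runs outside Yahoo Pipes and assumes the fragment is well-formed enough to parse as XML once wrapped in a root element; a real page would probably need a more lenient HTML parser. The function names and the JSON keys (title, time, category, message) are made up for the example, and the sample message is shortened.

```python
import json
from xml.dom import minidom

# Sample fragment copied from the question (message shortened).
SAMPLE = """<h1 id="blogtitle">SMS Update</h1>
<div class="blogposttime blogdetail">Left at 2nd January 2010 at 01:12</div>
<div class="blogcategories blogdetail">Recieved by SMS (Location: Pokhara - Nepal)</div>
<p class="blogpostmessage">
RACE DAY! We took the extra day off to pimp the rick some more.
</p>
"""


def text_of(node):
    """Concatenate all text found beneath a node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_of(child))
    return "".join(parts)


def next_elements(node, count):
    """Return up to `count` element siblings after `node`, skipping text nodes."""
    found = []
    sibling = node.nextSibling
    while sibling is not None and len(found) < count:
        if sibling.nodeType == sibling.ELEMENT_NODE:
            found.append(sibling)
        sibling = sibling.nextSibling
    return found


def updates_to_json(fragment):
    """Build a JSON array with one object per blogtitle block (hypothetical keys)."""
    # Assumption: wrapping in <root> gives the parser a single top-level element.
    doc = minidom.parseString("<root>" + fragment + "</root>")
    updates = []
    for h1 in doc.getElementsByTagName("h1"):
        if h1.getAttribute("id") != "blogtitle":
            continue
        siblings = next_elements(h1, 3)
        if len(siblings) < 3:
            continue  # incomplete block, skip it
        time_el, category_el, message_el = siblings
        updates.append({
            "title": text_of(h1).strip(),
            "time": text_of(time_el).strip(),
            "category": text_of(category_el).strip(),
            "message": text_of(message_el).strip(),
        })
    return json.dumps(updates, indent=2)


if __name__ == "__main__":
    print(updates_to_json(SAMPLE))
```

Matching on the class attribute of each sibling instead of relying on position would be more robust if the page ever reorders or omits one of the three elements.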
