How to ignore extra HTML tags while parsing RSS XML in Objective-C/Xcode?

I just want the following text from the "description" tag, using Objective-C for iPhone programming:

Neither the government nor private sector in Nepal has off-site backup of data and applications at a distance that can be safe after a disaster at one location. Office of the Controller of Certification chief Rajan Raj Panta warned that as the ...

<description>
<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;">
<tr>
<td width="80" align="center" valign="top">
<font style="font-size:85%;font-family:arial,sans-serif"></font></td>
<td valign="top" class="j">
<font style="font-size:85%;font-family:arial,sans-serif">
<br />
<div style="padding-top:0.8em;">
<img alt="" height="1" width="1" /></div>
<div class="lh">
<a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;usg=AFQjCNG5gNh3aGY3uxIlUjnsJ_C4ugrnrg&amp;url=http://www.thehimalayantimes.com/fullNews.php?headline%3DJapan%2Bquake%2Ba%2Bwake-up%2Bcall%2Bfor%2BNepal%2BIT%2Bsector%26NewsID%3D280789">
<b>Japan quake a wake-up call for 
<b>Nepal</b> IT sector</b></a>
<br />
<font size="-1">
<b>
<font color="#6f6f6f">Himalayan Times</font></b></font>
<br />
<font size="-1">Neither the government nor private sector in 
<b>Nepal</b> has off-site backup of data and applications at a distance that can be safe after a disaster at one 
<b>location</b>. Office of the Controller of Certification chief Rajan Raj Panta warned that as the 
<b>...</b></font>
<br />
<font size="-1" class="p"></font>
<br />
<font class="p" size="-1">
<a class="p" href="http://news.google.com/news/more?pz=1&amp;ned=uk&amp;ncl=dxKbHaltcQfMZ4M">
<nobr>
<b></b></nobr></a></font></div></font></td></tr></table>
</description>

How do I ignore all those unwanted HTML tags and text?

Actually, I am using the Google News search RSS feed, like this: http://news.google.com/news?q=location:london&output=rss. Is there any other way to get location-based RSS news?


So you've done one parse of the raw XML, which gives you the text of everything inside the description tags. That content is escaped in the original feed, so the first parse won't have looked into it any more deeply. But they're sending HTML-formatted RSS feeds and you want plain text? Would it be acceptable to, say, extract all text within a font tag that has a size of -1? If so, then something like this might suffice:

// relevant class members are:
BOOL acceptText;
NSMutableString *totalText;

// when a new element starts, check if it's a 'font' tag, and if so,
// decide whether to accept subsequent text based on its size
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
    if([elementName isEqualToString:@"font"])
    {
        acceptText = [[attributeDict objectForKey:@"size"] intValue] == -1;
    }
}

// upon receiving new characters, copy them into the string only if
// that's what we're doing right now
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if(acceptText)
        [totalText appendString:string];
}

It's a bit of a dirty fix, best considered screen scraping: all it would take is for them to change their HTML layout and your scraping would break.
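To make the second parse concrete, here is a minimal sketch of how the delegate methods above might be wired up. It assumes ARC, assumes the first parse has already collected the unescaped contents of the description element into a string, and assumes that HTML happens to be well-formed XML (NSXMLParser will simply fail otherwise). The class name DescriptionTextExtractor and the method plainTextFromDescriptionHTML: are made up for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the original answer. Assumes ARC.
@interface DescriptionTextExtractor : NSObject <NSXMLParserDelegate>
// Returns the text found after <font size="-1"> tags, or nil if the
// HTML could not be parsed as XML.
+ (NSString *)plainTextFromDescriptionHTML:(NSString *)html;
@end

@implementation DescriptionTextExtractor
{
    BOOL acceptText;
    NSMutableString *totalText;
}

+ (NSString *)plainTextFromDescriptionHTML:(NSString *)html
{
    DescriptionTextExtractor *extractor = [[DescriptionTextExtractor alloc] init];
    extractor->totalText = [NSMutableString string];

    // second parse: treat the description's HTML as an XML document
    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    parser.delegate = extractor;

    // NSXMLParser is strict, so -parse returns NO on malformed markup
    if (![parser parse])
        return nil;

    // the markup is full of line breaks; collapse whitespace runs to single spaces
    NSArray *pieces = [extractor->totalText componentsSeparatedByCharactersInSet:
                          [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSArray *words = [pieces filteredArrayUsingPredicate:
                         [NSPredicate predicateWithFormat:@"length > 0"]];
    return [words componentsJoinedByString:@" "];
}

// same rule as above: a 'font' tag switches text collection on or off
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
    if ([elementName isEqualToString:@"font"])
    {
        acceptText = [[attributeDict objectForKey:@"size"] intValue] == -1;
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if (acceptText)
        [totalText appendString:string];
}
@end
```

You would call it with whatever string your first parser accumulated for the description element, for example `NSString *summary = [DescriptionTextExtractor plainTextFromDescriptionHTML:descriptionString];` where descriptionString is a placeholder name. Collapsing whitespace at the end is a design choice for this sketch: the feed's HTML breaks lines around the bold tags, so the raw collected text would otherwise contain stray newlines.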
