How to ignore extra HTML tags while parsing RSS XML in Objective-C/Xcode?

2023-02-22 23:02 Q&A author:

I just want the following text from the "description" tag, using Objective-C for iPhone programming:

Neither the government nor private sector in Nepal has off-site backup of data and applications at a distance that can be safe after a disaster at one location. Office of the Controller of Certification chief Rajan Raj Panta warned that as the ...

<description>
<table border="0" cellpadding="2" cellspacing="7" style="vertical-align:top;">
<tr>
<td width="80" align="center" valign="top">
<font style="font-size:85%;font-family:arial,sans-serif"></font></td>
<td valign="top" class="j">
<font style="font-size:85%;font-family:arial,sans-serif">
<br />
<div style="padding-top:0.8em;">
<img alt="" height="1" width="1" /></div>
<div class="lh">
<a href="http://news.google.com/news/url?sa=t&amp;fd=R&amp;usg=AFQjCNG5gNh3aGY3uxIlUjnsJ_C4ugrnrg&amp;url=http://www.thehimalayantimes.com/fullNews.php?headline%3DJapan%2Bquake%2Ba%2Bwake-up%2Bcall%2Bfor%2BNepal%2BIT%2Bsector%26NewsID%3D280789">
<b>Japan quake a wake-up call for 
<b>Nepal</b> IT sector</b></a>
<br />
<font size="-1">
<b>
<font color="#6f6f6f">Himalayan Times</font></b></font>
<br />
<font size="-1">Neither the government nor private sector in 
<b>Nepal</b> has off-site backup of data and applications at a distance that can be safe after a disaster at one 
<b>location</b>. Office of the Controller of Certification chief Rajan Raj Panta warned that as the 
<b>...</b></font>
<br />
<font size="-1" class="p"></font>
<br />
<font class="p" size="-1">
<a class="p" href="http://news.google.com/news/more?pz=1&amp;ned=uk&amp;ncl=dxKbHaltcQfMZ4M">
<nobr>
<b></b></nobr></a></font></div></font></td></tr></table>
</description>

How do I ignore all those unwanted HTML tags and text?

I'm actually using the Google News search RSS feed (for example http://news.google.com/news?q=location:london&output=rss). Is there any other way to get location-based RSS news?

So you've done one parse of the raw XML, which gives you the text inside the description tag (the HTML is escaped in the original feed, so that first parse won't have looked inside it), but they're sending HTML-formatted RSS and you want plain text? Would it be acceptable to, say, extract all text within a font tag whose size is -1? If so, then something like this might suffice:

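// relevant class members are:
BOOL acceptText;
NSMutableString *totalText; // allocate before parsing, e.g. totalText = [[NSMutableString alloc] init];

// when a new element starts, check if it's a 'font' tag, and if so,
// decide whether to accept subsequent text based on its size
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary *)attributeDict
{
    if([elementName isEqualToString:@"font"])
    {
        acceptText = [[attributeDict objectForKey:@"size"] intValue] == -1;
    }
}

// when a 'font' tag closes, stop accepting text; otherwise everything
// after the last size -1 block would still be appended
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if([elementName isEqualToString:@"font"])
        acceptText = NO;
}

// upon receiving new characters, copy them into the string only if
// that's what we're doing right now
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if(acceptText)
        [totalText appendString:string];
}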
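Put together as one self-contained piece, the delegate methods above and the second parse over the description string might look like the sketch below. It assumes the description's HTML has already been unescaped by the first parse and happens to be well-formed XML (the sample is, since it uses <br /> rather than <br>). SmallFontTextCollector and SmallFontTextFromDescription are hypothetical names for illustration. The delegate also sets acceptText back to NO whenever a font element closes, so text after a size -1 block is not picked up.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate: collects only the text inside <font size="-1">
// elements of an already-unescaped description (same filter as above).
@interface SmallFontTextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, assign) BOOL acceptText;
@property (nonatomic, strong) NSMutableString *totalText;
@end

@implementation SmallFontTextCollector

- (instancetype)init
{
    if ((self = [super init]))
        _totalText = [[NSMutableString alloc] init];
    return self;
}

// A font tag switches collection on only if its size attribute is -1
// (a missing attribute gives [nil intValue] == 0, i.e. off).
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary<NSString *, NSString *> *)attributeDict
{
    if ([elementName isEqualToString:@"font"])
        self.acceptText = [attributeDict[@"size"] intValue] == -1;
}

// Closing any font tag switches collection off again.
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"font"])
        self.acceptText = NO;
}

// Text (including whitespace between tags) is kept only while collecting.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    if (self.acceptText)
        [self.totalText appendString:string];
}

@end

// Hypothetical helper for the second pass: parses the description's HTML
// as XML and returns the collected text, or nil if parsing fails.
NSString *SmallFontTextFromDescription(NSString *descriptionHTML)
{
    NSData *data = [descriptionHTML dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    SmallFontTextCollector *collector = [[SmallFontTextCollector alloc] init];
    parser.delegate = collector; // delegate is not retained; collector stays alive in this scope
    if (![parser parse])
        return nil;
    return [collector.totalText copy];
}
```

On the description from the question this should give roughly the wanted paragraph, with the feed's line breaks and spacing between tags kept as-is (stringByTrimmingCharactersInSet: can tidy the ends). Anything that is not well-formed XML, such as an unclosed <br>, makes parse return NO, and the sketch then returns nil.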

It's a bit of a dirty fix, to be considered screen scraping at best: all it would take is for Google to change its HTML layout, and your scraping would break.
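The answer takes the first pass for granted. A minimal sketch of it, assuming an RSS 2.0 feed where each item's HTML sits in its <description> element, either entity-escaped (as in the Google News feed) or wrapped in CDATA, might look like this. RSSDescriptionCollector and RSSItemDescriptions are hypothetical names. Only descriptions inside <item> are collected, so the channel's own <description> is skipped.

```objc
#import <Foundation/Foundation.h>

// Hypothetical first-pass delegate: gathers each <item>'s <description> text.
// NSXMLParser turns &lt;...&gt; back into '<' and '>', so every collected
// string is the embedded HTML, ready for a second parse.
@interface RSSDescriptionCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableArray<NSString *> *descriptions;
@property (nonatomic, strong) NSMutableString *currentText; // nil unless inside an item's <description>
@property (nonatomic, assign) BOOL inItem;
@end

@implementation RSSDescriptionCollector

- (instancetype)init
{
    if ((self = [super init]))
        _descriptions = [[NSMutableArray alloc] init];
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qualifiedName attributes:(NSDictionary<NSString *, NSString *> *)attributeDict
{
    if ([elementName isEqualToString:@"item"])
        self.inItem = YES;
    else if (self.inItem && [elementName isEqualToString:@"description"])
        self.currentText = [[NSMutableString alloc] init];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
{
    if ([elementName isEqualToString:@"item"]) {
        self.inItem = NO;
    } else if (self.currentText && [elementName isEqualToString:@"description"]) {
        [self.descriptions addObject:[self.currentText copy]];
        self.currentText = nil;
    }
}

// Escaped HTML may arrive in several chunks; messaging nil is a no-op,
// so text outside a description is simply dropped.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string
{
    [self.currentText appendString:string];
}

// Some feeds wrap the HTML in <![CDATA[...]]> instead of escaping it
// (assumes the CDATA bytes are UTF-8).
- (void)parser:(NSXMLParser *)parser foundCDATA:(NSData *)CDATABlock
{
    NSString *text = [[NSString alloc] initWithData:CDATABlock encoding:NSUTF8StringEncoding];
    if (text)
        [self.currentText appendString:text];
}

@end

// Hypothetical helper: returns each item's description HTML, or nil if the
// feed could not be parsed.
NSArray<NSString *> *RSSItemDescriptions(NSData *rssData)
{
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:rssData];
    RSSDescriptionCollector *collector = [[RSSDescriptionCollector alloc] init];
    parser.delegate = collector;
    if (![parser parse])
        return nil;
    return [collector.descriptions copy];
}
```

Each returned string can then be handed to a second NSXMLParser whose delegate applies the font-size filter above. Keeping the two passes separate means only the second one depends on Google's HTML layout.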
