Parsing inner HTML iteratively using Hpple parser and NSXMLParser

I have been working on a school newspaper app for the iPad. I am using NSXMLParser to get the titles, brief descriptions, and links for each article. To get the HTML content from each parsed link, I decided to use the Hpple parser. I think I am parsing and storing the RSS items correctly, but when I try to parse the HTML from each parsed link in a for loop, I get an exception saying that an array is empty. However, I can print the contents of the RSS item holder to the console, so it is NOT empty. Below are the relevant portions of my code and the console output. Please help me out; the due date for this project is soon. Thanks in advance.

Here is how I start loading my RSS parser (articleParser):

- (void)loadData {
    [self loadInitData];

    //[self loadDataWithLink];

}

- (void)loadInitData {
    if (sections == nil) {
        [activityIndicator startAnimating];

        NSLog(@"STARTING ARTICLE PARSER FROM MAIN URL!!!");

        Parser *articleParser = [[Parser alloc] init];
        [articleParser parseRssFeed:@"http://theaggie.org/rss/headlines.xml" withDelegate:self];
        [articleParser release];
    } else {

    }

}

And below is how I store the received Article items in an NSMutableArray called "sections". I then use a for loop to iterate over the links of the parsed articles.

- (void)receivedArticleItems:(Article *)theArticle {
    if (sections == nil) {
        sections = [[NSMutableArray alloc] init];
    }
    [sections addObject:theArticle];

    NSLog(@"We recieved the article!");
    NSLog(@"Article: %@", theArticle);
    NSLog(@"What is in sections: %@", sections);

    for (int i = 1; i < 5; i++) {
        NSLog(@"articleItems: %@",[sections objectAtIndex:0]);
        NSLog(@"articleItems at index 0: %@",[[[sections objectAtIndex:0] articleItems] objectAtIndex:0]);

        [self loadDataWithLink:[[[[sections objectAtIndex:0] articleItems] objectAtIndex:0] objectForKey:@"link"]];
    }
    [activityIndicator stopAnimating];
}

Below is how I use the TFHpple parser to get the HTML content from each parsed link:

- (void)loadDataWithLink:(NSString *)urlString{

 NSData *htmlData = [NSData dataWithContentsOfURL:[NSURL URLWithString:urlString]];

 // Create parser
 TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlData];

 //Get all the cells main body
 htmlElements  = [xpathParser search:@"//div[@id='main']/div[@id='mainCol1']/div[@id='main-body']"];

 // Access the first cell
 TFHppleElement *htmlElement = [htmlElements objectAtIndex:0];

 // NSString *title = [htmlElement content];

 NSLog(@"What is in element: %@", htmlElement);

 [xpathParser release];
 //[htmlData release];
}

And this is what I am getting on the console:

2011-05-02 22:58:35.355 TheCalAggie[2443:207] Parsing started for article!
2011-05-02 22:58:35.356 TheCalAggie[2443:207] Adding story title: Students say, 'No time for books'
2011-05-02 22:58:35.356 TheCalAggie[2443:207] From the link: http://theaggie.org/article/2011/05/03/students-say-no-time-for-books
2011-05-02 22:58:35.357 TheCalAggie[2443:207] Summary: The last book managerial economics major Kiyan Parsa read for fun was The Lord of the Rings. That was in high school.
2011-05-02 22:58:35.358 TheCalAggie[2443:207] Published on: Tue, 03 May 2011 00:00:00 -0700
2011-05-02 22:58:35.359 TheCalAggie[2443:207] Parsing started for article!
2011-05-02 22:58:35.360 TheCalAggie[2443:207] Adding story title: UC Davis craft center one of largest college crafting centers
2011-05-02 22:58:35.360 TheCalAggie[2443:207] From the link: http://theaggie.org/article/2011/05/02/uc-davis-craft-center-one-of-largest-college-crafting-centers
2011-05-02 22:58:35.361 TheCalAggie[2443:207] Summary: Hidden away in the South Silo, the UC Davis Craft Center offers 10 craft studios and more than a hundred classes for students looking to learn or perfect their crafting skills.
2011-05-02 22:58:35.362 TheCalAggie[2443:207] Published on: Mon, 02 May 2011 00:00:00 -0700
2011-05-02 22:58:35.362 TheCalAggie[2443:207] We recieved the article!
2011-05-02 22:58:35.363 TheCalAggie[2443:207] Article: *nil description*
2011-05-02 22:58:35.364 TheCalAggie[2443:207] What is in sections: (
    (null)
)
2011-05-02 22:58:35.374 TheCalAggie[2443:207] articleItems: *nil description*
2011-05-02 22:58:35.375 TheCalAggie[2443:207] articleItems at index 0: {
    link = "http://theaggie.org/article/2011/05/03/peaceful-rally-held-on-campus-after-killing-of-bin-laden\n";
    pubDate = "Tue, 03 May 2011 00:00:00 -0700";
    summary = "The announcement of Osama bin Laden's death sent a wave of patriotism across the nation and UC Davis. Bin Laden was the leader of al-Qaeda - the organization allegedly behind the Sept. 11, 2001 attacks that killed over 3,000 Americans.\n";
    title = "Peaceful rally held on campus after killing of bin Laden \n";
}
2011-05-02 22:59:35.376 TheCalAggie[2443:207] Unable to parse.
2011-05-02 22:59:35.379 TheCalAggie[2443:207] *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[NSMutableArray objectAtIndex:]: index 0 beyond bounds for empty array'
*** Call stack at first throw:

Any help will be greatly appreciated. Thanks again.


2011-05-02 22:59:35.376 TheCalAggie[2443:207] Unable to parse.

The parser is struggling to parse the HTML. That parser is not perfect at parsing HTML. It's a complicated thing for a parser to run XPath over a potentially broken or invalid HTML document.

Running the link you are trying to parse through the W3C validator reports some errors, so it's not entirely valid HTML. Whether it's too broken for that parser is something you'll have to debug and find out. To really get to the bottom of this, you will need to set breakpoints in the TFHpple parser you are using.


Damien is right. First you have to fix the HTML to get your code working. The data it parses is different every time. That proves the HTML is buggy. So the code might work on some occasions. Try running it a couple of times. You will see it working occasionally.
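
While the HTML itself is being investigated, the crash in the console output (index 0 beyond bounds for empty array) can at least be avoided by never indexing into an empty result. Below is a minimal sketch using Foundation only. The helper names AGURLFromFeedLink and AGFirstObjectOrNil are hypothetical; they are not part of Hpple or the original project. The first helper also trims the trailing "\n" that is visible in the logged link value. Whether that newline contributes to the "Unable to parse." message is an assumption worth checking, not something the log confirms.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: strips surrounding whitespace and newlines from a feed
// link (the logged link ends in "\n") and returns nil when nothing usable is left.
static NSURL *AGURLFromFeedLink(NSString *rawLink) {
    NSString *trimmed = [rawLink stringByTrimmingCharactersInSet:
                         [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if ([trimmed length] == 0) {
        return nil;
    }
    return [NSURL URLWithString:trimmed];
}

// Hypothetical helper: returns the first element, or nil for a nil or empty
// array, instead of raising NSRangeException.
static id AGFirstObjectOrNil(NSArray *array) {
    return ([array count] > 0) ? [array objectAtIndex:0] : nil;
}
```

In loadDataWithLink:, the URL could be built with AGURLFromFeedLink(urlString), htmlData could be checked for nil before the TFHpple parser is created, and the first match could be read with AGFirstObjectOrNil(htmlElements), logging and returning early when any of these is nil.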
