
How to parse HTML with TouchXML or some other alternative

I'm trying to parse the HTML below with TouchXML, but it keeps crashing when I try to extract certain attributes. I'm completely new to parsers, so apologies if this is a basic question. What I want to do is read each attribute and value and copy them into strings. I've been looking for a good HTML parser, and TouchXML seems like the best I've seen because of its Tidy support. Speaking of Tidy, how can I run this HTML through Tidy first and then parse it? I'm not sure how to do that. Below is the code I have so far; it doesn't work because it isn't pulling everything I need from the HTML. Any help or advice would be much appreciated. Thanks.

My current code:

NSMutableArray *res = [[NSMutableArray alloc] init];

//  using local resource file
NSString *XMLPath   = [[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:@"example.html"];
NSData *XMLData     = [NSData dataWithContentsOfFile:XMLPath];
CXMLDocument *doc   = [[[CXMLDocument alloc] initWithData:XMLData options:0 error:nil] autorelease];

NSArray *nodes = nil;

nodes = [doc nodesForXPath:@"//div" error:nil];

for (CXMLElement *node in nodes) {
    NSMutableDictionary *item = [[NSMutableDictionary alloc] init];



    [item setObject:[[node attributeForName:@"id"] stringValue] forKey:@"id"];

    [res addObject:item];
    [item release];
}


NSLog(@"%@", res);
[res release];

HTML file that needs to be parsed:

<html> 
<head> 
<base target="_blank" /> 
</head> 
<body style="margin:2;"> 
<div id="group"> 
<div id="groupURL"><a href="http://www.example.com/groups">Group URL</a></div> 
<img id="grouplogo" src="http://images.example.com/groups/image.png" /> 
<div id="groupcomputer"><a href="http://www.example.com/groups/page" title="Group Title">Group title this would be here</a></div> 
<div id="groupinfos"> 
    <div id="groupinfo-l">Person</div><div id="groupinfo-r">Ralph</div> 
    <div id="groupinfo-l">Years</div><div id="groupinfo-r">4 years</div> 
    <div id="groupinfo-l">Salary</div><div id="groupinfo-r">100K</div> 
    <div id="groupinfo-l">Other</div><div id="groupoth" style="width:15px">other info</div> 
</body> 
</html>

EDIT: I could use Element Parser but I need to know how to extract the Person's Name from the following example which would be Ralph in this case.

<div id="groupinfo-l">Person</div><div id="groupinfo-r">Ralph</div>


I don't know whether you are doing something wrong, but I recommend using Element Parser, the best parser for XML and HTML I've found. Hope this helps.
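If you would rather stay with TouchXML, here is a rough sketch of how the "Ralph" lookup might be written as an XPath query. It rests on several assumptions: `GroupInfoValue` and `LogPersonName` are hypothetical helper names, not part of TouchXML; the `CXMLDocumentTidyHTML` option only has an effect if your copy of TouchXML was built with Tidy support; and the code uses manual reference counting to match the question. The query uses `local-name()` so that it should still match if Tidy puts the elements into the XHTML namespace. Note also that the sample HTML is missing two closing `</div>` tags, so parsing it as strict XML with `options:0` may fail and leave `doc` as `nil`, and that `setObject:forKey:` throws when `attributeForName:` returns `nil`, which is one plausible cause of the crash.

```objective-c
#import <Foundation/Foundation.h>
#import "TouchXML.h"

// Hypothetical helper (not part of TouchXML): returns the text of the <div>
// that immediately follows the label <div> whose text equals `label`,
// or nil if nothing matches. `label` must not contain a single quote,
// because it is pasted into the XPath string literal as-is.
static NSString *GroupInfoValue(CXMLDocument *doc, NSString *label) {
    // local-name() ignores namespaces, so the query does not depend on
    // whether the parsed elements carry the XHTML namespace.
    NSString *xpath = [NSString stringWithFormat:
        @"//*[local-name()='div'][@id='groupinfo-l'][normalize-space(.)='%@']"
        @"/following-sibling::*[local-name()='div'][1]", label];

    NSError *error = nil;
    NSArray *matches = [doc nodesForXPath:xpath error:&error];
    if ([matches count] == 0) {
        if (error != nil) {
            NSLog(@"XPath query failed: %@", error);
        }
        return nil;
    }
    return [[matches objectAtIndex:0] stringValue];
}

// Hypothetical usage: parse the HTML data and log the value next to "Person".
static void LogPersonName(NSData *htmlData) {
    NSError *error = nil;
    // Assumption: TouchXML was compiled with Tidy enabled, so this option
    // cleans up the HTML before it is handed to the XML parser.
    CXMLDocument *doc = [[[CXMLDocument alloc] initWithData:htmlData
                                                    options:CXMLDocumentTidyHTML
                                                      error:&error] autorelease];
    if (doc == nil) {
        NSLog(@"Could not parse document: %@", error);
        return;
    }
    NSLog(@"Person: %@", GroupInfoValue(doc, @"Person"));
}
```

Passing `&error` instead of `nil` and checking for a `nil` document or attribute before storing it in a dictionary should at least turn a crash into a log message you can act on.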
