NSXMLParser chokes on ampersand &

I'm parsing some HTML with NSXMLParser and it hits a parser error anytime it encounters an ampersand. I could filter out ampersands before I parse it, but I'd rather parse everything that's there.

It's giving me error 68, NSXMLParserNAMERequiredError: Name is required.

My best guess is that it's a character set issue. I'm a little fuzzy on the world of character sets, so I'm thinking my ignorance is biting me in the ass. The source HTML uses charset iso-8859-1, so I'm using this code to initialize the Parser:

NSString *dataString = [[[NSString alloc] initWithData:data encoding:NSISOLatin1StringEncoding] autorelease];
NSData *dataEncoded = [[dataString dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES] autorelease];
NSXMLParser *theParser = [[NSXMLParser alloc] initWithData:dataEncoded];

Any ideas?


To the other posters: of course the XML is invalid... it's HTML!

You probably shouldn't be using NSXMLParser for HTML at all; use libxml2 instead, which ships an HTML parser that tolerates malformed markup.

For a closer look at why, check out this article.


Are you sure you have valid XML? Special characters like & are required to be escaped, so in the raw XML file you should see &amp; rather than a bare &.
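
If you do end up pre-processing the markup, you don't have to throw the ampersands away; you can escape the bare ones and leave existing references alone. Here is a rough sketch in plain C (which compiles as-is inside an Objective-C file). The function names escape_bare_ampersands and looks_like_reference are made up for this example, and the "looks like a reference" check is only a heuristic: it assumes anything of the form &name; or &#123; or &#x1F; is intentional.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s (which points at '&') starts something
   shaped like an entity or character reference, e.g. "&amp;", "&#38;" or
   "&#x26;". It does not check that a named entity actually exists. */
static int looks_like_reference(const char *s)
{
    const char *p = s + 1;
    if (*p == '#') {
        p++;
        if (*p == 'x' || *p == 'X') {
            p++;
            if (!isxdigit((unsigned char)*p)) return 0;
            while (isxdigit((unsigned char)*p)) p++;
        } else {
            if (!isdigit((unsigned char)*p)) return 0;
            while (isdigit((unsigned char)*p)) p++;
        }
    } else {
        if (!isalpha((unsigned char)*p)) return 0;
        while (isalnum((unsigned char)*p)) p++;
    }
    return *p == ';';
}

/* Hypothetical helper: returns a newly malloc'd copy of `in` with every bare
   '&' replaced by "&amp;". The caller frees the result. Returns NULL if the
   allocation fails. */
char *escape_bare_ampersands(const char *in)
{
    size_t len = strlen(in);
    /* Worst case: every input byte is '&' and grows to five bytes. */
    char *out = malloc(len * 5 + 1);
    char *w = out;
    if (out == NULL) return NULL;

    for (const char *r = in; *r != '\0'; r++) {
        if (*r == '&' && !looks_like_reference(r)) {
            memcpy(w, "&amp;", 5);
            w += 5;
        } else {
            *w++ = *r;
        }
    }
    *w = '\0';
    return out;
}
```

With this, "Tom & Jerry" becomes "Tom &amp; Jerry", while "&amp;" and "&#38;" pass through untouched. Keep in mind it only addresses bare ampersands: XML predefines just amp, lt, gt, apos and quot, so HTML-only entities such as &nbsp; would still need separate handling, as would unclosed tags and the other ways HTML differs from XML.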


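Encoding the data through an NSString worked for me. However, you are autoreleasing an object you did not allocate yourself: the NSData returned by dataUsingEncoding:allowLossyConversion: is already autoreleased, so the extra autorelease over-releases it and the app crashes. The solution is: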

NSString *dataString = [[NSString alloc] initWithData:data
                             encoding:NSISOLatin1StringEncoding];

NSData *dataEncoded = [dataString dataUsingEncoding:NSUTF8StringEncoding 
                                     allowLossyConversion:YES];

[dataString release];

NSXMLParser *theParser = [[NSXMLParser alloc] initWithData:dataEncoded];