NSXMLParser chokes on ampersand &
I'm parsing some HTML with NSXMLParser and it hits a parser error anytime it encounters an ampersand. I could filter out ampersands before I parse it, but I'd rather parse everything that's there.
It's giving me error 68, NSXMLParserNAMERequiredError: Name is required.
My best guess is that it's a character set issue. I'm a little fuzzy on the world of character sets, so I'm thinking my ignorance is biting me in the ass. The source HTML uses charset iso-8859-1, so I'm using this code to initialize the Parser:
NSString *dataString = [[[NSString alloc] initWithData:data encoding:NSISOLatin1StringEncoding] autorelease];
NSData *dataEncoded = [[dataString dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES] autorelease];
NSXMLParser *theParser = [[NSXMLParser alloc] initWithData:dataEncoded];
Any ideas?
To the other posters: of course the XML is invalid... it's HTML!
You probably shouldn't be using NSXMLParser for HTML at all. Use libxml2 instead; it includes an HTML parser that tolerates markup that isn't well-formed XML.
For a closer look at why, check out this article.
Are you sure you have valid XML? Special characters like & are required to be escaped, so in the raw XML file you should see &amp; rather than a bare &.
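For what it's worth, error 68 ("Name is required") fits this: after a `&` an XML parser expects an entity name followed by `;`, and a bare ampersand has none. If you do end up pre-filtering, here is a minimal sketch in plain C (it compiles as-is inside an Objective-C file) that escapes only ampersands that don't already look like an entity reference. The names `looks_like_entity` and `escape_bare_ampersands` are made up for illustration and are not part of any Apple or libxml2 API.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if s (pointing at '&') begins something
   shaped like an entity reference: &name;  &#123;  or  &#x1F; */
static int looks_like_entity(const char *s) {
    const char *p = s + 1;
    if (*p == '#') {
        p++;
        if (*p == 'x' || *p == 'X') {
            p++;
            if (!isxdigit((unsigned char)*p)) return 0;
            while (isxdigit((unsigned char)*p)) p++;
        } else {
            if (!isdigit((unsigned char)*p)) return 0;
            while (isdigit((unsigned char)*p)) p++;
        }
    } else {
        if (!isalpha((unsigned char)*p)) return 0;
        while (isalnum((unsigned char)*p)) p++;
    }
    return *p == ';';
}

/* Hypothetical helper: returns a newly malloc'd copy of `in` with every
   bare '&' replaced by "&amp;". The caller must free() the result.
   Returns NULL if allocation fails. */
char *escape_bare_ampersands(const char *in) {
    size_t len = strlen(in);
    /* Worst case: every byte is '&' and grows to the 5 bytes of "&amp;". */
    char *out = malloc(len * 5 + 1);
    if (out == NULL) return NULL;
    char *w = out;
    for (const char *r = in; *r != '\0'; r++) {
        if (*r == '&' && !looks_like_entity(r)) {
            memcpy(w, "&amp;", 5);
            w += 5;
        } else {
            *w++ = *r;
        }
    }
    *w = '\0';
    return out;
}
```

With this, "Tom & Jerry" becomes "Tom &amp; Jerry" while an existing "&amp;" is left alone. Keep in mind it only fixes bare ampersands: HTML-only named entities such as &nbsp; are not predefined in XML and may still be rejected, and the sketch makes no attempt to skip script blocks or CDATA sections.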
Encoding the data through an NSString worked for me. However, you are autoreleasing an object that you did not allocate yourself (the NSData returned by dataUsingEncoding:allowLossyConversion:), so it crashes. The solution is:
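// alloc/init returns an object we own, so we release it ourselves below.
NSString *dataString = [[NSString alloc] initWithData:data
                                             encoding:NSISOLatin1StringEncoding];
// dataUsingEncoding: returns an object we do not own; do not release or autorelease it.
NSData *dataEncoded = [dataString dataUsingEncoding:NSUTF8StringEncoding
                               allowLossyConversion:YES];
[dataString release];
NSXMLParser *theParser = [[NSXMLParser alloc] initWithData:dataEncoded];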