Why are entities in libxml2 SAX-parsed attribute values encoded?
I'm fetching the value of an XML attribute in a libxml2 SAX parser, similarly to how the answer to this question suggests. Specifically, my code looks like so (attributes[i].value is an xmlChar *):
int valueLength = (int) (attributes[i].end - attributes[i].value);
value = [[[NSString alloc] initWithBytes:attributes[i].value
                                  length:valueLength
                                encoding:NSUTF8StringEncoding] autorelease];
However, for some reason, when the attribute value (a URL in this case) has the entity &amp; in the original XML, the value I get has &#38; instead.
Say what?
How do I get libxml2 to decode attribute entities (it seems to do it fine for text node entities), so that I just get &?
libxml2 does not replace entities by default; you have to turn that on when you create the xmlReader.
This code has an example:
http://xmlsoft.org/examples/reader2.c
The docs for XML_PARSE_NOENT are here:
http://xmlsoft.org/html/libxml-parser.html
Although it has been a while since I used the entity bits from libxml2, I recall having to do something to get the default entity resolver in place. The docs on that are here:
http://xmlsoft.org/xmlio.html
If this does not wrap it up, please ping me back and I'll look in the source for Foto Brisko; I had to handle it there.
Although the blog post is long-winded, I think the sample from here
http://bill.dudney.net/roller/objc/entry/libxml2_push_parsing
might have the entity stuff turned on as well, but it's been so long that I've forgotten, and I don't have time right now to go back through it.
Good luck!