Losing whitespace around escaped symbols in CDATA using Expat XML parser in C++
I'm using XML to send project information between applications. One of the pieces of information is the project description. So I have:
<ProjectDescription>Test &amp; spaces around&amp;some &amp; amps!</ProjectDescription>
Or: "Test & spaces around&some & amps!" <-- GOOD!
When I then use Expat to parse it, my data handler receives only part of the string at a time: "Test", then "&", then "spaces around", then the next "&", and so on. When I try to reconstruct the original string, all the spacing around the &'s is dropped because the data handler never gets to see it. When I then rewrite the XML I get:
<ProjectDescription>Test&amp;spaces around&amp;some&amp;amps!</ProjectDescription>
Or: "Test&spaces around&some&amps!" <-- BAD!
Is this a known problem with existing workarounds? Is there some setting I can give Expat to control its behavior around escaped symbols?
My attempts at Googling an answer have met with dismal failure.
EDIT: In response to a question in the comments: I have my own handler, which I register with the parser:
parser=XML_ParserCreate(NULL);
XML_SetUserData(parser,&depth);
XML_SetElementHandler(parser,startElement,endElement);
XML_SetCharacterDataHandler(parser,dataHandler);
The handler is declared as follows:
static void dataHandler(void *userData,const XML_Char *s,int l)
And then "s" contains the data in the element. When the text has no escaped symbols, it is the entire string between the open and close tags, as in the case of "a string with spaces".
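One detail of this handler signature is worth illustrating: the buffer Expat passes is not NUL-terminated, so only the first l characters of "s" belong to the current call. The sketch below does not link against Expat; it uses plain char as a stand-in for XML_Char (its default definition), and chunkToString and firstChunkDemo are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <string>

// Stand-in for Expat's XML_Char, which is plain char in the default build.
typedef char XmlChar;

// Hypothetical helper: copy exactly len characters out of a handler buffer.
// The buffer is not NUL-terminated, so the length must always be honoured.
std::string chunkToString(const XmlChar *s, int len) {
    assert(s != nullptr && len >= 0);
    return std::string(s, static_cast<std::string::size_type>(len));
}

// Simulates a single callback: the pointer refers into a larger buffer,
// and only the first 4 characters belong to this chunk.
std::string firstChunkDemo() {
    const XmlChar buffer[] = "XXX &amp; YYY";
    return chunkToString(buffer, 4);  // keeps the trailing space of "XXX "
}
```

Treating "s" as a C string instead (for example with strlen or strcpy) would read past the end of the chunk, which is one way text around an escaped symbol can end up mangled.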
I have just run a test with my own library that uses expat. My handler looks like this, with debug statements to display what is going on:
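void CharDataHandler( void * parser,
                      const XML_Char *s,
                      int len ) {
    // Note: s is not NUL-terminated, so this prints past the end of the chunk;
    // only the first len characters belong to this event.
    std::cerr << "[" << s << "]\n";
    std::cerr << len << "\n";
    // my own processing here - not important
}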
I don't see the behaviour you are talking about. For the input data:
XXX &amp; YYY
I get three events with the char * and length data set as follows:
char * = "XXX &amp; YYY"
length = 4
char * = "&"
length = 1
char * = " YYY"
length = 4
So the spaces are retained. As far as I know I am not using any special settings. What version & platform of Expat are you using?
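If the whitespace turns out to be lost while reassembling the pieces rather than inside Expat, one common pattern is to append every chunk, using its length, to a buffer kept in the user data. The following is only a sketch under that assumption: it simulates the callbacks instead of linking Expat, uses char in place of XML_Char, and ParseState, appendCharData and rebuildDemo are hypothetical names that appear in neither post.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-parse state; with Expat, a pointer to it would be
// registered through XML_SetUserData.
struct ParseState {
    std::string text;  // character data collected for the current element
};

// Same shape as a character data handler, with char standing in for XML_Char.
// Appends exactly len characters, so leading and trailing spaces are kept.
void appendCharData(void *userData, const char *s, int len) {
    assert(userData != nullptr && s != nullptr && len >= 0);
    ParseState *state = static_cast<ParseState *>(userData);
    state->text.append(s, static_cast<std::string::size_type>(len));
}

// Simulates the sequence of callbacks a parser might make for the
// description in the question, split at each escaped ampersand.
std::string rebuildDemo() {
    const std::string chunks[] = {"Test ", "&", " spaces around", "&",
                                  "some ", "&", " amps!"};
    ParseState state;
    for (const std::string &chunk : chunks) {
        appendCharData(&state, chunk.data(), static_cast<int>(chunk.size()));
    }
    return state.text;
}
```

In a real parser the buffer would typically be cleared in the start element handler and consumed in the end element handler, so no trimming or tokenising happens on the individual chunks.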