RapidXML reading from file - what is wrong here?
What's the difference between these two methods of reading an input file?
1) Using 'ifstream.get()'
and
2) Using a vector<char>
with ifstreambuf_iterator<char>
(less understood by me!)
(other than the obvious answer of having nifty vector methods to work with)
The input file is XML, and as you see below, immediately parsed into a rapidxml document. (initialized elsewhere, see example main function.)
First, let me show you two ways to write the 'load_config' function, one using ifstream.get()
and one using vector<char>
Method 1 ifstream.get()
provides working code, and a safe rapidXML document object:
rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
ifstream myfile("inputfile");
//read in config file
char ch;
char buffer[65536];
size_t chars_read = 0;
while(myfile.get(ch) && (chars_read < 65535)){
buffer[chars_read++] = ch;
}
buffer[chars_read++] = '\0';
cout<<"clearing old doc"<<endl;
doc->clear();
doc->parse<0>(buffer);
//debug returns as expected here
cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";
return doc;
}
Method 2 results in a cloberred rapidXML document by another library - specifically, a call to curl_global_init(CURL_GLOBAL_SSL) [see main code below] - but I'm not开发者_运维知识库 blaming it on curl_global_init just yet.
rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
ifstream myfile("inputfile");
vector<char> buffer((istreambuf_iterator<char>(inputfile)),
istreambuf_iterator<char>( ));
buffer.push_back('\0');
cout<<"file looks like:"<<endl; //looks fine
cout<<&buffer[0]<<endl;
cout<<"clearing old doc"<<endl;
doc->clear();
doc->parse<0>(&buffer[0]);
//debug prints as expected
cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";
return doc;
}
main code:
int main(void){
rapidxml::xml_document *doc;
doc = new rapidxml::xml_document;
load_config(doc);
// this works fine:
cout << "Name of my first node is: " << doc->first_node()->name() << "\n";
curl_global_init(CURL_GLOBAL_SSL); //Docs say do this first.
// debug broken object instance:
// note a trashed 'doc' here if using vector<char> method
// - seems to be because of above line... name is NULL
// and other nodes are now NULL
// causing segfaults down stream.
cout << "Name of my first node is: " << doc->first_node()->name() << "\n";
I am pretty darn sure this is all executed in a single thread, but maybe there is something going on beyond my understanding.
I'm also worried that I only fixed a symptom, not a cause... by simply changing my file load function. Looking to the community for help here!
Question: Why would moving away from the vector to a character array fix this?
Hint: I'm aware that rapidXML uses some clever memory management that actually accesses the input string directly.
Hint: The main function above creates a dynamic (new) xml_document. This was not in the original code, and is an artifact of debugging changes. The original (failing) code declared it and did not dynamically allocate it, but identical problems occurred.
Another Hint for full disclosure (although I don't see why it matters) - there is another instance of a vector in this mess of code that is populated by the data in the rapidxml::xml_document object.
The only difference between the two is that the vector
version works correctly and the char
array version causes undefined behavior when the file is longer than 65535 characters (it writes the \0
to the 65535th or 65536th position, which are out-of-bounds).
Another problem that is common to both versions, is that you read the file into a memory that has shorter life-time than the xml_document
. Read the documentation:
The string must persist for the lifetime of the document.
When load_config
exits the vector
is destroyed and the memory is freed. Attempt to access the document cause reading invalid memory (undefined behavior).
In the char
array version the memory is allocated on the stack. It is still 'freed' when load_config
exists (accessing it causes undefined behavior). But you don't see the crash because it has not yet been overwritten.
精彩评论