Segmentation fault in std::less<char>
I have the following code (C++0x):
const set<char> s_special_characters = { '(', ')', '{', '}', ':' };

void nectar_loader::tokenize( string &line, const set<char> &special_characters )
{
    auto it = line.begin();
    const auto not_found = special_characters.end();
    // first character special case
    if( it != line.end() && special_characters.find( *it ) != not_found )
        it = line.insert( it+1, ' ' ) + 1;
    while( it != line.end() )
    {
        // check if we're dealing with a special character
        if( special_characters.find(*it) != not_found ) // <----------
        {
            // ensure a space before
            if( *(it-1) != ' ' )
                it = line.insert( it, ' ' ) + 1;
            // ensure a space after
            if( (it+1) != line.end() && *(it+1) != ' ' )
                it = line.insert( it+1, ' ');
            else
                line.append(" ");
        }
        ++it;
    }
}
The crash occurs at the indicated line, with this gdb backtrace:
#0 0x000000000040f043 in std::less<char>::operator() (this=0x622a40, __x=@0x623610, __y=@0x644000)
at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_function.h:230
#1 0x000000000040efa6 in std::_Rb_tree<char, char, std::_Identity<char>, std::less<char>, std::allocator<char> >::_M_lower_bound (this=0x622a40, __x=0x6235f0, __y=0x622a48, __k=@0x644000)
at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_tree.h:1020
#2 0x000000000040e840 in std::_Rb_tree<char, char, std::_Identity<char>, std::less<char>, std::allocator<char> >::find (this=0x622a40, __k=@0x644000)
at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_tree.h:1532
#3 0x000000000040e4fd in std::set<char, std::less<char>, std::allocator<char> >::find (this=0x622a40, __x=@0x644000)
at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.5.2/../../../../include/c++/4.5.2/bits/stl_set.h:589
#4 0x000000000040de51 in ambrosia::nectar_loader::tokenize (this=0x7fffffffe3b0, line=..., special_characters=...)
at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:146
#5 0x000000000040dbf5 in ambrosia::nectar_loader::fetch_line (this=0x7fffffffe3b0)
at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:112
#6 0x000000000040dd11 in ambrosia::nectar_loader::fetch_token (this=0x7fffffffe3b0, token=...)
at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:121
#7 0x000000000040d9c4 in ambrosia::nectar_loader::next_token (this=0x7fffffffe3b0)
at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:72
#8 0x000000000040e472 in ambrosia::nectar_loader::extract_nectar<std::back_insert_iterator<std::vector<ambrosia::target> > > (this=0x7fffffffe3b0, it=...)
at ../../ambrosia/Library/Source/Ambrosia/nectar_loader.cpp:43
#9 0x000000000040d46d in ambrosia::drink_nectar<std::back_insert_iterator<std::vector<ambrosia::target> > > (filename=..., it=...)
at ../../ambrosia/Library/Source/Ambrosia/nectar.cpp:75
#10 0x00000000004072ae in ambrosia::reader::event (this=0x623770)
I'm at a loss and have no idea what I'm doing wrong. Any help is much appreciated.
EDIT: the string at the moment of the crash is:
sub Ambrosia : lib libAmbrosia
UPDATE:
I replaced the above function following suggestions in comments/answers. Below is the result.
const string tokenize( const string &line, const set<char> &special_characters )
{
    const auto not_found = special_characters.end();
    const auto end = line.end();
    string result;
    if( !line.empty() )
    {
        // copy first character
        result += line[0];
        char previous = line[0];
        for( auto it = line.begin()+1; it != end; ++it )
        {
            const char current = *it;
            if( special_characters.find(previous) != not_found )
                result += ' ';
            result += current;
            previous = current;
        }
    }
    return result;
}
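The updated version only ever adds a space after a special character, never before one, so "a:b" comes out as "a: b", and it adds a space even when one is already there ("a : b" becomes "a :  b"). If the goal is still the original one of a single space on each side of every special character, the same copy-into-a-new-string approach can be sketched as follows. This is a sketch under that assumption; pad_special is a hypothetical name, not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Sketch: build a new string instead of editing the input in place, so no
// iterator into the string being read can be invalidated. Each special
// character gets one space before it (unless it starts the output or a space
// is already there) and one after it (unless it ends the line or the next
// input character is already a space).
std::string pad_special(const std::string &line, const std::set<char> &special)
{
    std::string result;
    result.reserve(line.size()); // output is at least as long as the input
    for (std::size_t i = 0; i < line.size(); ++i)
    {
        const char c = line[i];
        if (special.count(c) != 0)
        {
            // space before, unless at the start or one is already present
            if (!result.empty() && result.back() != ' ')
                result += ' ';
            result += c;
            // space after, unless at the end or the input already has one
            if (i + 1 < line.size() && line[i + 1] != ' ')
                result += ' ';
        }
        else
        {
            result += c;
        }
    }
    return result;
}
```

With the set from the question, "f(a){b}" becomes "f ( a ) { b }", while the crashing input "sub Ambrosia : lib libAmbrosia" is returned unchanged because its ':' is already surrounded by spaces.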
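Another guess is that line.append(" ") can invalidate it whenever the append has to reallocate, which depends on the line's current capacity. Note that the else branch runs not only when the special character is the last one, but also whenever it is already followed by a space, as the ':' is in "sub Ambrosia : lib libAmbrosia".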
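A minimal sketch of an index-based alternative, assuming the intent is one space on each side of every special character (tokenize_in_place is a hypothetical name, not from the question). An index into the string stays meaningful when insert() or append() reallocates the buffer, whereas an iterator does not.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Sketch: pad each special character with spaces, tracking the position as
// an index. Reallocation inside insert() invalidates iterators, but the
// index still refers to the same logical position afterwards.
void tokenize_in_place(std::string &line, const std::set<char> &special)
{
    for (std::size_t i = 0; i < line.size(); ++i)
    {
        if (special.count(line[i]) == 0)
            continue;
        // ensure a space before (nothing to do at position 0)
        if (i > 0 && line[i - 1] != ' ')
        {
            line.insert(i, 1, ' ');
            ++i; // i points at the special character again
        }
        // ensure a space after (a trailing space if it is the last character)
        if (i + 1 == line.size() || line[i + 1] != ' ')
            line.insert(i + 1, 1, ' ');
        ++i; // step onto the space that now follows the special character
    }
}
```

Unlike the loop in the question, this only adds a space when one is missing, so a special character that is already followed by a space never causes extra characters to be appended to the line.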
You don't check that it != line.end() before the first time you dereference it.
I could not spot the error. Since you have already identified the crashing line, I would suggest stepping slowly through the loop in the debugger.
I'll just say that, in general, modifying a container while you are iterating over it is extremely error-prone.
I'd recommend using Boost Tokenizer, more precisely boost::token_iterator combined with boost::char_separator (its documentation includes a code example).
You could then simply build a new string from the original one and return the new string from the function. The speed-up in computation should more than make up for the extra memory allocation.
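For reference, with Boost this would mean constructing something like boost::char_separator<char> sep(" ", "(){}:"), where the first argument lists delimiters to drop and the second lists delimiters to keep as tokens, and passing it to a boost::tokenizer. If adding a Boost dependency is not an option, the same drop-spaces, keep-special-characters behaviour can be sketched with the standard library alone. This is an illustrative stand-in, not Boost's implementation, and split_tokens is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Sketch (standard library only, not Boost): split a line into tokens,
// dropping spaces and returning each special character as its own token.
// Empty tokens are never produced.
std::vector<std::string> split_tokens(const std::string &line,
                                      const std::set<char> &special)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char c : line)
    {
        if (c == ' ' || special.count(c) != 0)
        {
            // any delimiter ends the token being built
            if (!current.empty())
            {
                tokens.push_back(current);
                current.clear();
            }
            // kept delimiters become tokens of their own
            if (c != ' ')
                tokens.push_back(std::string(1, c));
        }
        else
        {
            current += c;
        }
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

Returning the tokens directly also removes the need to re-insert spaces into the line just so a later pass can split on them.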