C++ removing punctuation on strings, erase()/iterator issue

I know I'm not the first person to bring up the issue with reverse iterators trying to call the erase() method on strings. However, I wasn't able to find any good ways around this.

I'm reading the contents of a file, which contains a bunch of words. When I read in a word, I want to pass it to a function I have called stripPunct. However, I ONLY want to strip punctuation at the beginning and end of a string, not in the middle.

So for instance:

(word) should strip '(' and ')' resulting in just word

don't! should strip '!' resulting in just don't

So my logic (which I'm sure could be improved) was to have two while loops, one starting at the end and one at the beginning, traversing and erasing until it hits a non-punctuation char.

void stripPunct(string & str) {
    string::iterator itr1 = str.begin();
    string::reverse_iterator itr2 = str.rbegin();

    while ( ispunct(*itr1) ) {
        str.erase(itr1);
        itr1++;
    }

    while ( ispunct(*itr2) ) {
        str.erase(itr2);
        itr2--;
    }
}

However, obviously it's not working because erase() requires a regular iterator and not a reverse_iterator. But anyways, I feel like that logic is pretty inefficient.

Also, I tried instead of a reverse_iterator using just a regular iterator, starting it at str.end(), then decremented it, but it says I cannot dereference the iterator if I start it at str.end().

Can anyone help me with a good way to do this? Or maybe point out a workaround for what I already have?

Thank you so much in advance!

------------------ [ EDIT ] ----------------------------

found a solution, although it may not be the best solution:

// Call the stripPunct method:

stripPunct(str);
if ( !str.empty() ) { // make sure string is still valid
  // perform other code
}

And here is the stripPunct method:

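void stripPunct(string & str) {
   // Strip leading punctuation; erase() returns an iterator to the next character.
   string::iterator itr1 = str.begin();
   while ( itr1 != str.end() && ispunct((unsigned char) *itr1) )
       itr1 = str.erase(itr1);

   // Strip trailing punctuation. end() is re-read on every pass because
   // erase() invalidates iterators obtained earlier.
   while ( !str.empty() && ispunct((unsigned char) *(str.end() - 1)) )
       str.erase(str.end() - 1);
}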


First, note a couple problems with your code:

  • after you call erase() using itr1, you've invalidated itr2.
  • when using a reverse_iterator to go backwards through a sequence, you want to use ++, not -- (that's kind of the reason reverse iterators exist).
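
As a minimal sketch of the second point, the hypothetical helper below (countTrailingPunct is not part of the question's code) walks backwards over a string with a reverse_iterator. Note that ++ is what moves it toward the front; nothing is erased, so no iterator is invalidated.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: counts the punctuation characters
// at the end of s by walking backwards with a reverse_iterator.
std::size_t countTrailingPunct(const std::string& s) {
    std::size_t n = 0;
    // ++ advances a reverse_iterator toward the front of the string.
    for (std::string::const_reverse_iterator it = s.rbegin();
         it != s.rend() && std::ispunct(static_cast<unsigned char>(*it));
         ++it) {
        ++n;
    }
    return n;
}
```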

Now, to improve the logic, you can avoid erasing each character individually by finding the first character you don't want to erase and erasing everything up to that point. find_if() can be used to help with that:

int not_punct(char c) {
    return !ispunct((unsigned char) c);
}

void stripPunct(string & str) {
    string::iterator itr = find_if( str.begin(), str.end(), not_punct);

    str.erase( str.begin(), itr);

    string::reverse_iterator ritr = find_if( str.rbegin(), str.rend(), not_punct);

    str.erase( ritr.base(), str.end());
}

Note that I've used base() to get the 'regular' iterator corresponding to the reverse_iterator. I find the logic for whether base() needs to be adjusted confusing (reverse iterators in general confuse me). In this case it doesn't need adjusting: base() refers to the element one position after the one the reverse_iterator refers to, and we happen to want to start the erase just after the character that's found.
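
To make the base() offset concrete, here is a small sketch. The names isNotPunct and corePastEnd are made up for this illustration; the predicate is the same idea as not_punct above. It returns the index that base() lands on, which is one past the last non-punctuation character.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Same idea as not_punct above, renamed so this sketch stands alone.
bool isNotPunct(char c) {
    return !std::ispunct(static_cast<unsigned char>(c));
}

// Hypothetical helper: index one past the last non-punctuation character.
std::size_t corePastEnd(const std::string& s) {
    std::string::const_reverse_iterator rit =
        std::find_if(s.rbegin(), s.rend(), isNotPunct);
    // rit.base() refers to the element after the one rit refers to, so it
    // already marks where the trailing punctuation begins.
    return static_cast<std::size_t>(rit.base() - s.begin());
}
```

For "(word)" the search stops on 'd' at index 4, and base() sits at index 5, the ')'. If nothing is found, the result is rend(), whose base() is begin(), so the index is 0.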

This article by Scott Meyers, http://drdobbs.com/cpp/184401406, has a good treatment of reverse_iterator::base() in the section "Guideline 3: Understand How to Use a reverse_iterator's Base iterator". The information in that article has also been incorporated into Meyers's "Effective STL" book.


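You can't dereference str.end() because it is a past-the-end iterator: it marks the position just after the last character rather than a character itself, so dereferencing it is undefined behavior. Decrement it first, and only do that when the string is not empty.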
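
A minimal sketch of that pattern, using a hypothetical helper endsWithPunct (not from the question):

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: reports whether the last character is punctuation.
bool endsWithPunct(const std::string& s) {
    if (s.empty()) {
        return false;  // end() == begin(), so there is nothing to step back to
    }
    std::string::const_iterator it = s.end();
    --it;  // now refers to the last real character and is safe to dereference
    return std::ispunct(static_cast<unsigned char>(*it)) != 0;
}
```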

And one final note: if the word consists only of punctuation, your program will fail, so be sure to handle that case.
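
One way to handle that case, sketched with indices instead of iterators so that nothing can be invalidated. The name stripPunctSafe is made up here, and this is only one possible approach:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sketch of a trim that tolerates empty and all-punctuation input.
void stripPunctSafe(std::string& str) {
    std::size_t first = 0;
    std::size_t last = str.size();  // one past the last character to keep

    // Skip leading punctuation, never running past the end.
    while (first < last &&
           std::ispunct(static_cast<unsigned char>(str[first]))) {
        ++first;
    }
    // Skip trailing punctuation, never crossing below first.
    while (last > first &&
           std::ispunct(static_cast<unsigned char>(str[last - 1]))) {
        --last;
    }
    str = str.substr(first, last - first);
}
```

A word made up entirely of punctuation simply becomes an empty string, so the caller's !str.empty() check still applies.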


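If you don't mind negative logic, you can copy the non-punctuation characters into a temporary string. Note that this removes all punctuation, including characters in the middle of a word such as the apostrophe in don't: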

string tmp_str="";
tmp_str.reserve(str.length());
for (string::iterator itr1 = str.begin(); itr1 != str.end(); itr1++)
{
   if (!ispunct(*itr1))
   {
      tmp_str.push_back(*itr1);
   }
}
str = tmp_str;