开发者

C++: Removing all asterisks from a string where the asterisks are NOT multiplication symbols

So basically, I might have some string that looks like: "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool".

However, this string might be huge. I'm trying to remove all the asterisks from the string, unless that asterisk appears to represent multiplication. Efficiency is somewhat important here, and I'm having trouble coming up with a good algorithm to remove all the non-multiplication asterisks from this.

In order to determine whether an asterisk is for multiplication, I can obviously just check whether it'开发者_开发知识库s sandwiched in between two numbers.

Thus, I was thinking I could do something like (pseudocode):

wasNumber = false
Loop through string
   if number 
      set wasNumber = true
   else
      set wasNumber = false
   if asterisk
      if wasNumber
         if the next word is a number
            do nothing
         else
            remove asterisk
      else
         remove asterisk

However, that^ is ugly and inefficient on a huge string. Can you think of a better way to accomplish this in C++?

Also, how could I actually check whether a word is a number? It's allowed to be a decimal. I know there's a function to check if a character is a number...


Fully functioning code:

#include <iostream>
#include <string>
using namespace std;

string RemoveAllAstericks(string);
void RemoveSingleAsterick(string&, int);
bool IsDigit(char);

int main()
{
    string myString = "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool";
    string newString = RemoveAllAstericks(myString);

    cout << "Original: " << myString << "\n";
    cout << "Modified: " << newString << endl;

    system("pause");
    return 0;
}

string RemoveAllAstericks(string s)
{
    int len = s.size();
    int pos;

    for(int i = 0; i < len; i++)
    {
       if(s[i] != '*') 
          continue;

       pos = i - 1;
       char cBefore = s[pos];
       while(cBefore == ' ')
       {
          pos--;
          cBefore = s[pos];
       }

       pos = i + 1;
       char cAfter  = s[pos];
       while(cAfter == ' ')
       {
          pos++;
          cAfter = s[pos];
       }

       if( IsDigit(cBefore) && IsDigit(cAfter) )
          RemoveSingleAsterick(s, i);
    }

    return s;
}

void RemoveSingleAsterick(string& s, int i)
{
    s[i] = ' '; // Replaces * with a space, but you can do whatever you want
}

bool IsDigit(char c)
{
   return (c <= 57 && c >= 48);
}

Top level overview:

Code searches the string until it encounters an *. Then, it looks at the first non-whitespace character before AND after the *. If both characters are numeric, the code decides that this is a multiplication operation, and removes the asterick. Otherwise, it is ignored.

See the revision history of this post if you'd like other details.

Important Notes:

  • You should seriously consider adding boundary checks on the string (i.e. don't try to access an index that is less than 0 or greater than len
  • If you are worried about parentheses, then change the condition that checks for whitespaces to also check for parentheses.
  • Checking whether every single character is a number is a bad idea. At the very least, it will require two logical checks (see my IsDigit() function). (My code checks for '*', which is one logical operation.) However, some of the suggestions posted were very poorly thought out. Do not use regular expressions to check if a character is numeric.

Since you mentioned efficiency in your question, and I don't have sufficient rep points to comment on other answers:

A switch statement that checks for '0' '1' '2' ..., means that every character that is NOT a digit, must go through 10 logical operations. With all due respect, please, since chars map to ints, just check the boundaries (char <= '9' && char >= '0')


You can start by implementing the slow version, it could be much faster than you think. But let's say it's too slow. It then is an optimization problem. Where does the inefficiency lies?

  • "if number" is easy, you can use a regex or anything that stops when it finds something that is not a digit
  • "if the next word is a number" is just as easy to implement efficiently.

Now, it's the "remove asterisk" part that is an issue to you. The key point to notice here is that you don't need to duplicate the string: you can actually modify it in place since you are only removing elements.

Try to run through this visually before trying to implement it.

Keep two integers or iterators, the first one saying where you are currently reading your string, and the second one saying where you are currently writing your string. Since you only erase stuff, the read one will always be ahead of the writing one.

If you decide to keep the current string, you just need to advance each of your integers/iterators one by one, and copying accordingly. If you don't want to keep it, just advance the reading string! Then you only have to cut the string by the amount of asterisks you removed. The complexity is simply O(n), without any additional buffer used.

Also note that your algorithm would be simpler (but equivalent) if written like this:

wasNumber = false
Loop through string
   if number 
      set wasNumber = true
   else
      set wasNumber = false
      if asterisk and wasNumber and next word is a number
          do nothing // using my algorithm, "do nothing" actually copies what you intend to keep
      else
          remove asterisk


I found your little problem interesting and I wrote (and tested) a small and simple function that would do just that on a std::string. Here u go:

// TestStringsCpp.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <string>
#include <iostream>

using namespace std;

string& ClearAsterisk(string& iString)
{
    bool bLastCharNumeric = false;
    string lString = "0123456789";

    for (string::iterator it = iString.begin(); it != iString.end() ; ++it) {
        switch (*it) {
        case ' ':   break;//ignore whitespace characters
        case '*':
            if (bLastCharNumeric) {
                //asterisk is preceded by numeric character. we have to check if
                //the following non space character is numeric also
                for (string::iterator it2 = it + 1; it2 != iString.end() ; ++it2) {
                    if (*it2 != ' ') {
                        if (*it2 <= '9' && *it2 >= '0') break;
                        else iString.erase(it);
                        break;  //exit current for
                    }
                }
            }
            else iString.erase(it);;
            break;

        default:
            if (*it <= '9' && *it >= '0') bLastCharNumeric= true;
            else bLastCharNumeric = false;  //reset flag
        }
    }
    return iString;
}

int _tmain(int argc, _TCHAR* argv[])
{
    string testString = "hey this is a string * this string is awesome 97 * 3 = 27 * this string is cool";

    cout<<ClearAsterisk(testString).c_str();
    cin >> testString;  //this is just for the app to pause a bit :)

    return 0;
}

It will work perfectly with your sample string but it will fail if you have a text like this: "this is a happy 5 * 3day menu" because it checks only for the first nonspace character after the '*'. But frankly I can't immagine a lot of cases you would have this kind of construct in a sentence.

HTH,
JP.


A regular expression wouldn't necessarily be any more efficient, but it would let you rely on somebody else to do your string parsing and manipulation.

Personally, if I were worried about efficiency, I would implement your pseudocode version while limiting needless memory allocations. I might even mmap the input file. I highly doubt that you'll get much faster than that.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜