Separating alphabetic characters in C++ STL

I've been practicing C++ for a competition next week. The sample problem I've been working on requires splitting paragraphs into words. That part is easy, but the problem has an odd rule: a word like isn't must be separated as well, into isn and t. I know it's weird, but I have to follow it.

I have a function split() that takes a constant char delimiter as one of its parameters. It's what I use to separate words on spaces. But I can't figure this one out. Even words containing digits, like phil67bs, should be separated into phil and bs.

And no, I'm not asking for full code. Pseudocode will do, or anything that helps me understand what to do. Thanks!

PS: Please no recommendations for external libs. Just the STL. :)


Filter out numbers, spaces and anything else that isn't a letter by using a proper locale. See this SO thread about treating everything but numbers as whitespace. So use a mask and do something similar to what Jerry Coffin suggests, but only for letters:

struct alphabet_only: std::ctype<char> 
{
    alphabet_only(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask> 
            rc(std::ctype<char>::table_size,std::ctype_base::space);

        std::fill(&rc['A'], &rc['['], std::ctype_base::upper);
        std::fill(&rc['a'], &rc['{'], std::ctype_base::lower);
        return &rc[0];
    }
};

And, boom! You're golden.
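For what it's worth, here is a rough sketch of how that facet might be plugged into a stream. The helper name split_with_locale is made up for illustration, and the facet is repeated so the snippet stands alone. The idea is that operator>> asks the stream's locale what counts as whitespace, so once the stream is imbued with alphabet_only, every non-letter acts as a separator.

```cpp
#include <algorithm>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Facet from the answer: every character that is not A-Z or a-z
// is classified as whitespace.
struct alphabet_only : std::ctype<char>
{
    alphabet_only() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);

        std::fill(&rc['A'], &rc['['], std::ctype_base::upper);
        std::fill(&rc['a'], &rc['{'], std::ctype_base::lower);
        return &rc[0];
    }
};

// Hypothetical helper (not from the original answer): reads "words"
// from the text using a stream whose locale carries the facet above.
std::vector<std::string> split_with_locale(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it later.
    in.imbue(std::locale(std::locale(), new alphabet_only));
    return std::vector<std::string>(std::istream_iterator<std::string>(in),
                                    std::istream_iterator<std::string>());
}
```

Calling split_with_locale("isn't phil67bs") should give isn, t, phil and bs. The same imbue call works on std::cin if you would rather read straight from standard input.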

Or... you could just do a transform:

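// Cast to unsigned char first: passing a negative char to isalpha is undefined behaviour.
char changeToLetters(char input){ return isalpha(static_cast<unsigned char>(input)) ? input : ' '; }

vector<char> output;
output.reserve( myVector.size() );
// back_inserter appends to output (insert_iterator would also need a position argument).
transform( myVector.begin(), myVector.end(), back_inserter(output), changeToLetters );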

Which, um, is much easier to grok, just not as efficient as Jerry's idea.
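To get from the transform to actual words, one option (just a sketch, assuming the text is held in a std::string rather than a vector<char>) is to blank out the non-letters in place and then let an istringstream split on the spaces. The names to_letter_or_space and split_via_transform are invented for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Keep letters, turn everything else into a space.
char to_letter_or_space(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) ? c : ' ';
}

// Hypothetical helper: takes the text by value so it can be edited in place.
std::vector<std::string> split_via_transform(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(), to_letter_or_space);

    // operator>> skips runs of spaces, so no empty words are produced.
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}
```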

Edit:

Changed 'Z' to '[' so that the value 'Z' is filled (std::fill stops just before its end pointer). Likewise with 'z' to '{'.


This sounds like a perfect job for the find_first_of function which finds the first occurrence of a set of characters. You can use this to look for arbitrary stop characters and generate words from the spaces between such stop characters.

Roughly:

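size_t previous = 0;
for (;;) {
    size_t next = str.find_first_of(" '1234567890", previous);
    // The word is str[previous, next), or the rest of str when next is npos.
    // It is empty when two stop characters sit next to each other.
    if (next == string::npos)
        break;
    previous = next + 1;
}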
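Filling in the "Do processing" part, the loop might look something like this. It is only a sketch: split_on_stops and its stops parameter are names made up here, and the default stop set is the one from the snippet above, so any other punctuation would have to be added to it by hand.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: cuts str at every character found in stops.
std::vector<std::string> split_on_stops(const std::string& str,
                                        const std::string& stops = " '1234567890")
{
    std::vector<std::string> words;
    std::string::size_type previous = 0;
    for (;;) {
        std::string::size_type next = str.find_first_of(stops, previous);
        // The candidate word runs from previous up to next,
        // or to the end of the string if no stop character is left.
        std::string::size_type end =
            (next == std::string::npos) ? str.size() : next;
        if (end > previous)  // skip empty pieces between adjacent stops
            words.push_back(str.substr(previous, end - previous));
        if (next == std::string::npos)
            break;
        previous = next + 1;
    }
    return words;
}
```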


Just change your function to delimit on anything that isn't an alphabetic character. Is there anything in particular that you are having trouble with?

Break down the problem: First, write a function that gets the first "word" from the sentence. This is easy; just look for the first non-alphabetic character. The next step is to remove all leading non-alphabetic characters from the remaining string. From there, just repeat.
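Those steps could be sketched with std::find_if roughly as follows. The function and predicate names are invented for the example, and it assumes "alphabetic" means whatever std::isalpha accepts in the default "C" locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

bool is_letter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

bool is_not_letter(char c)
{
    return !is_letter(c);
}

// Hypothetical helper: skip leading non-letters, take one word, repeat.
std::vector<std::string> split_words(const std::string& text)
{
    std::vector<std::string> words;
    std::string::const_iterator it = text.begin();
    while (it != text.end()) {
        // Drop leading non-alphabetic characters.
        it = std::find_if(it, text.end(), is_letter);
        // The word ends at the first non-alphabetic character.
        std::string::const_iterator word_end =
            std::find_if(it, text.end(), is_not_letter);
        if (it != word_end)
            words.push_back(std::string(it, word_end));
        it = word_end;
    }
    return words;
}
```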


You can do something like this:

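// Needs <cctype>, <string> and <vector>.
vector<string> split(const string& str)
{
    vector<string> splits;

    string cur;
    for(string::size_type i = 0; i < str.size(); ++i)
    {
        // Any non-letter (digit, space, apostrophe, ...) ends the current word.
        if(!isalpha(static_cast<unsigned char>(str[i])))
        {
            if(!cur.empty())
            {
                splits.push_back(cur);
            }
            cur.clear();
        }
        else
        {
            cur += str[i];
        }
    }
    // Flush the last word if the string ended on a letter.
    if(!cur.empty())
    {
        splits.push_back(cur);
    }

    return splits;
}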


Let's assume that the input is in a std::string (use std::getline(std::cin, line), for example, to read a full line from cin).

std::vector<std::string> split(std::string const& input)
{
  std::string::const_iterator it(input.begin()), end(input.end());
  std::string current;
  std::vector<std::string> words;
  for(; it != end; ++it)
  {
    if (isalpha(static_cast<unsigned char>(*it)))
    {
      current.push_back(*it); // add this char to the current word
    }
    else if (!current.empty())
    {
      // push the current word into the result list (empty words are skipped)
      words.push_back(current);
      current.clear(); // next word
    }
  }
  if (!current.empty())
    words.push_back(current); // the input ended in the middle of a word
  return words;
}

I've not tested it, but I guess it ought to work...
