开发者

C++ function to count all the words in a string

I was asked this during an interview and apparently it's an easy question but it wasn't and still isn't obvious to me.

Given a string, count all the words in it. Doesn't matter if they are repeated. Just the total coun开发者_如何学运维t like in a text files word count. Words are anything separated by a space and punctuation doesn't matter, as long as it's part of a word.

For example: A very, very, very, very, very big dog ate my homework!!!! ==> 11 words

My "algorithm" just goes through looking for spaces and incrementing a counter until I hit a null. Since i didn't get the job and was asked to leave after that I guess My solution wasn't good? Anyone have a more clever solution? Am I missing something?


Assuming words are white space separated:

unsigned int countWordsInString(std::string const& str)
{
    std::stringstream stream(str);
    return std::distance(std::istream_iterator<std::string>(stream), std::istream_iterator<std::string>());
}

Note: There may be more than one space between words. Also this does not catch other white space characters like tab new line or carriage return. So counting spaces is not enough.

The stream input operator >> when used to read a string from a stream. Reads one white space separated word. So they were probably looking for you to use this to identify words.

std::stringstream  stream(str);
std::string        oneWord;

stream >> oneWord; // Reads one space separated word.

When can use this to count words in a string.

std::stringstream  stream(str);
std::string        oneWord;
unsigned int       count = 0;

while(stream >> oneWord) { ++count;}
// count now has the number of words in the string.

Getting complicated:
Streams can be treated just like any other container and there are iterators to loop through them std::istream_iterator. When you use the ++ operator on an istream_iterator it just read the next value from the stream using the operator >>. In this case we are reading std::string so it reads a space separated word.

std::stringstream  stream(str);
std::string        oneWord;
unsigned int       count = 0;

std::istream_iterator loop = std::istream_iterator<std::string>(stream);
std::istream_iterator end  = std::istream_iterator<std::string>();

for(;loop != end; ++count, ++loop) { *loop; }

Using std::distance just wraps all the above in a tidy package as it find the distance between two iterators by doing ++ on the first until we reach the second.

To avoid copying the string we can be sneaky:

unsigned int countWordsInString(std::string const& str)
{
    std::stringstream stream;

    // sneaky way to use the string as the buffer to avoid copy.
    stream.rdbuf()->pubsetbuf (str.c_str(), str.length() );
    return std::distance(std::istream_iterator<std::string>(stream), std::istream_iterator<std::string>());
}

Note: we still copy each word out of the original into a temporary. But the cost of that is minimal.


A less clever, more obvious-to-all-of-the-programmers-on-your-team method of doing it.

#include <cctype>

int CountWords(const char* str)
{
   if (str == NULL)
      return error_condition;  // let the requirements define this...

   bool inSpaces = true;
   int numWords = 0;

   while (*str != '\0')
   {
      if (std::isspace(*str))
      {
         inSpaces = true;
      }
      else if (inSpaces)
      {
         numWords++;
         inSpaces = false;
      }

      ++str;
   }

   return numWords;
}


You can use the std::count or std::count_if to do that. Below a simple example with std::count:

//Count the number of words on string
#include <iostream>
#include <string>
#include <algorithm> //count and count_if is declared here

int main () {
    std::string sTEST("Text to verify how many words it has.");

    std::cout << std::count(sTEST.cbegin(), sTEST.cend(), ' ')+1;

    return 0;
}

UPDATE: Due the observation made by Aydin Özcan (Nov 16) I made a change to this solution. Now the words may have more than one space between them. :)

//Count the number of words on string
#include <string>
#include <iostream>

int main () {
    std::string T("Text to   verify :  How many words does it   have?");

    size_t NWords = T.empty() || T.back() == ' ' ? 0 : 1;
    for (size_t s = T.size(); s > 0; --s)
        if (T[s] == ' ' && T[s-1] != ' ') ++NWords;

    std::cout << NWords;

    return 0;
}


Another boost based solution that may work (untested):

vector<string> result;
split(result, "aaaa bbbb cccc", is_any_of(" \t\n\v\f\r"), token_compress_on);

More information can be found in the Boost String Algorithms Library


This can be done without manually looking at every character or copying the string.

#include <boost/iterator/transform_iterator.hpp>
#include <cctype>

boost::transform_iterator
    < int (*)(int), std::string::const_iterator, bool const& >
    pen( str.begin(), std::isalnum ), end( str.end(), std::isalnum );

size_t word_cnt = 0;

while ( pen != end ) {
    word_cnt += * pen;
    pen = std::mismatch( pen+1, end, pen ).first;
}

return word_cnt;

I took the liberty of using isalnum instead of isspace.

This is not something I would do at a job interview. (It's not like it compiled the first time.)

Or, for all the Boost haters ;v)

if ( str.empty() ) return 0;

size_t word_cnt = std::isalnum( * str.begin() );

for ( std::string::const_iterator pen = str.begin(); ++ pen != str.end(); ) {
    word_cnt += std::isalnum( pen[ 0 ] ) && ! std::isalnum( pen[ -1 ] );
}

return word_cnt;


An O(N) solution that is also very simple to understand and implement:

(I haven't checked for an empty string input. But I am sure you can do that easily.)

#include <iostream>
#include <string>
using namespace std;

int countNumberOfWords(string sentence){
    int numberOfWords = 0;
    size_t i;

    if (isalpha(sentence[0])) {
        numberOfWords++;
    }

    for (i = 1; i < sentence.length(); i++) {
        if ((isalpha(sentence[i])) && (!isalpha(sentence[i-1]))) {
            numberOfWords++;
        }
    }

    return numberOfWords;
}

int main()
{
    string sentence;
    cout<<"Enter the sentence : ";
    getline(cin, sentence);

    int numberOfWords = countNumberOfWords(sentence);
    cout<<"The number of words in the sentence is : "<<numberOfWords<<endl;

    return 0;
}


Here is a single pass, branchless (almost), locale-aware algorithm which handles cases with more than one space between words:

  1. If the string is empty return 0
  2. let transitions = number of adjacent char pairs (c1, c2) where c1 == ' ' and c2 != ' '
  3. if the sentence starts with a space, return transitions else return transitions + 1

Here is an example with string = "A very, very, very, very, very big dog ate my homework!!!!"

 i | 0123456789
c1 | A very, very, very, very, very big dog ate my homework!!!!
c2 |  A very, very, very, very, very big dog ate my homework!!!!
   |  x     x     x     x     x    x   x   x   x  x

Explanation

Let `i` be the loop counter.

When i=0: c1='A' and c2=' ', the condition `c1 == ' '` and `c2 != ' '` is not met
When i=1: c1=' ' and c2='A', the condition is met
... and so on for the remaining characters

Here are 2 solutions I came up with

Naive solution

size_t count_words_naive(const std::string_view& s)
{
    if (s.size() == 0) return 0;
    size_t count = 0;
    bool isspace1, isspace2 = true;
    for (auto c : s) {
        isspace1 = std::exchange(isspace2, isspace(c));
        count += (isspace1 && !isspace2);
    }
    return count;
}

If you think carefully, you will be able to reduce this set of operations into an inner product (just for fun, I don't recommend this as this is arguably much less readable).

Inner product solution

size_t count_words_using_inner_prod(const std::string_view& s)
{
    if (s.size() == 0) return 0;
    auto starts_with_space = isspace(s.front());
    auto num_transitions = std::inner_product(
            s.begin()+1, s.end(), s.begin(), 0, std::plus<>(),
            [](char c2, char c1) { return isspace(c1) && !isspace(c2); });
    return num_transitions + !starts_with_space;
}


I think that will help

the complexty O(n)

#include <iostream>
#include <string>
#include <ctype.h>
using namespace std;


int main()
{
    int count = 0, size;
    string sent;

    getline(cin, sent);

    size = sent.size();

check if the char is in alpha and the next char not in alpha

     for (int i = 0; i < size - 1; ++i) { 
       if (isalpha(sent[i]) && !isalpha(sent[i+1])) { 
            ++count;
        }
     }

if the word in the last of sentence didn't count above so it count here

     if (isalpha(sent[size - 1]))++count;

    cout << count << endl;
     return 0;
}


A very concise O(N) approach:

bool is_letter(char c) { return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z'; }

int count_words(const string& s) {
    int i = 0, N = s.size(), count = 0;
    while(i < N) {
        while(i < N && !is_letter(s[i])) i++;
        if(i == N) break;
        while(i < N && is_letter(s[i])) i++;
        count++;
    }
    return count;
}

A divide-and-conquer approach, complexity is also O(N):

int DC(const string& A, int low, int high) {
    if(low > high) return 0;
    int mid = low + (high - low) / 2;

    int count_left = DC(A, low, mid-1);
    int count_right = DC(A, mid+1, high);

    if(!is_letter(A[mid])) 
        return count_left + count_right;
    else {
        if(mid == low && mid == high) return 1;
        if(mid-1 < low) {
            if(is_letter(A[mid+1])) return count_right;
            else return count_right+1;
        } else if(mid+1 > high) {
            if(is_letter(A[mid-1])) return count_left;
            else return count_left+1;
        }
        else {
            if(!is_letter(A[mid-1]) && !is_letter(A[mid+1])) 
                return count_left + count_right + 1;
            else if(is_letter(A[mid-1]) && is_letter(A[mid+1]))
                return count_left + count_right - 1;
            else
                return count_left + count_right;
        }
    }
}

int count_words_divide_n_conquer(const string& s) {
    return DC(s, 0, s.size()-1);
}


Efficient version based on map-reduce approach

#include <iostream>
#include <string_view>
#include <numeric>

std::size_t CountWords(std::string_view s) {
    if (s.empty())
        return 0;

    std::size_t wc = (!std::isspace(s.front()) ? 1 : 0);
    wc += std::transform_reduce(
        s.begin(),
        s.end() - 1,
        s.begin() + 1,
        std::size_t(0),
        std::plus<std::size_t>(),
                [](char left, char right) {
        return std::isspace(left) && !std::isspace(right);
    });

    return wc;
}

int main() {
    std::cout << CountWords("  pretty  little octopus "sv) << std::endl;

    return 0;
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜