Selective iterator

2023-01-03 08:05

FYI: no Boost. Yes, Boost has this, but I want to reinvent the wheel ;)

Is there some form of a selective iterator (possible) in C++? What I want is to separate strings like this:

some:word{or other

to a form like this:

some : word { or other

I can do that with two loops and find_first_of(":") and ("{") but this seems (very) inefficient to me. I thought that maybe there would be a way to create/define/write an iterator that would iterate over all these values with for_each. I fear this will have me writing a full-fledged custom way-too-complex iterator class for a std::string.

So I thought maybe this would do:

std::vector<size_t> list;
size_t index = mystring.find(":");
while( index != std::string::npos )
{
    list.push_back(index);
    index = mystring.find(":", list.back() + 1); // start after the last hit, or the same ':' is found forever
}
std::for_each(list.begin(), list.end(), addSpaces(mystring));

This looks messy to me, and I'm quite sure a more elegant way of doing this exists. But I can't think of it. Anyone have a bright idea? Thanks

PS: I did not test the code posted, just a quick write-up of what I would try

UPDATE: After taking all your answers into account, I came up with this, and it works to my liking :). It does assume the last char is a newline or something similar; otherwise a trailing {, }, or : won't get processed.

void tokenize( string &line )
{
    char oneBack = ' ';
    char twoBack = ' ';
    char current = ' ';
    size_t length = line.size();

    for( size_t index = 0; index<length; ++index )
    {
        twoBack = oneBack;
        oneBack = current;
        current = line.at( index );
        if( isSpecial(oneBack) )
        {
            if( !isspace(twoBack) ) // insert before
            {
                line.insert(index-1, " ");
                ++index;
                ++length;
            }
            if( !isspace(current) ) // insert after
            {
                line.insert(index, " ");
                ++index;
                ++length;
            }
        }
    }
}

Comments are welcome as always :)

That's relatively easy using the std::istream_iterator.

What you need to do is define your own class (say Term). Then define how to read a single "word" (term) from the stream using the operator >>.

I don't know what your exact definition of a word is, so I am using the following definition:

Any consecutive sequence of alphanumeric characters is a term.
Any single non-whitespace character that is also not alphanumeric is a term.

Try this:

#include <string>
#include <sstream>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <cctype>

class Term
{
    public:

        // This cast operator is not required but makes it easy to use
        // a Term anywhere that a string can normally be used.
        operator std::string const&() const {return value;}

    private:
        // A term is just a string
        // And we friend the operator >> to make sure we can read it.
        friend std::istream& operator>>(std::istream& inStr,Term& dst);
        std::string     value;
};

Now all we have to do is define an operator >> that reads a word according to the rules:

// This function could be a lot neater using some boost regular expressions.
// I just do it manually to show it can be done without boost (as requested)
std::istream& operator>>(std::istream& inStr,Term& dst)
{
   // Note the >> operator drops all preceding white space.
   // So we get the first non white space character.
   char first;
   inStr >> first;

   // If the stream is in any bad state then stop processing.
   if (inStr)
   {
       if(std::isalnum(static_cast<unsigned char>(first)))
       {
           // Alphanumeric, so read a sequence of characters.
           dst.value = first;

           // get(char&) returns the stream, so the loop stops at EOF, on a
           // stream error, or at the first non-alphanumeric character.
           while(inStr.get(first) && std::isalnum(static_cast<unsigned char>(first)))
           {
               dst.value += first;
           }

           if (inStr)
           {
               // The last character read was not part of the word.
               // So put it back for use by the next call to read from the stream.
               inStr.putback(first);
           }
           else
           {
               // EOF (or a stream error) ended the word.
               // We know that we have a word so clear any errors to make sure it
               // is used. Let the next attempt to read a word (term) fail at the outer if.
               inStr.clear();
           }
       }
       else
       {
           // It was not alpha numeric so it is a one character word.
           dst.value   = first;
       }
  }
  return inStr;
}

So now we can use it in standard algorithms by just employing the istream_iterator:

int main()
{
    std::string         data    = "some:word{or other";
    std::stringstream   dataStream(data);


    std::copy(  // Read the stream one Term at a time.
                std::istream_iterator<Term>(dataStream),
                std::istream_iterator<Term>(),

                // Note the ostream_iterator is using a std::string
                // This works because a Term can be converted into a string.
                std::ostream_iterator<std::string>(std::cout, "\n")
             );

}

The output:

> ./a.exe
some
:
word
{
or
other

std::string const str = "some:word{or other";

std::string result;
result.reserve(str.size());
for (std::string::const_iterator it = str.begin(), end = str.end();
     it != end; ++it)
{
  if (isalnum(*it))
  {
    result.push_back(*it);
  }
  else
  {
    result.push_back(' '); result.push_back(*it); result.push_back(' ');
  }
}

Insert version for speed-up

std::string str = "some:word{or other";

for (std::string::iterator it = str.begin(), end = str.end(); it != end; ++it)
{
  if (!isalnum(*it))
  {
    it = str.insert(it, ' ') + 2;
    it = str.insert(it, ' ');
    end = str.end();
  }
}

Note that std::string::insert inserts BEFORE the iterator passed and returns an iterator to the newly inserted character. Assigning the result is important since the buffer may have been reallocated at another memory location (the iterators are invalidated by the insertion). Also note that you can't keep end for the whole loop; each time you insert, you need to recompute it.
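
For reference, here is a minimal self-contained sketch of the copy approach above, adjusted so that it gives the spacing the question asks for. It is only an illustration: the names padSpecials and isSpecial are made up for this example, and the set of special characters (':', '{' and '}') is an assumption based on the question. Unlike the snippets above, it leaves existing whitespace alone and only adds a space where one is missing.

```cpp
#include <string>

// Hypothetical helper: which characters get spaces around them.
// The set ":{}" is an assumption taken from the question.
inline bool isSpecial(char c)
{
    return c == ':' || c == '{' || c == '}';
}

// Returns a copy of str with one space on each side of every special
// character, without doubling spaces that are already present.
inline std::string padSpecials(const std::string& str)
{
    std::string result;
    result.reserve(str.size());   // a hint only; the result may grow past it

    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        const char c = str[i];
        if (!isSpecial(c))
        {
            result.push_back(c);
            continue;
        }
        // Space before, unless the output is empty or already ends in one.
        if (!result.empty() && result[result.size() - 1] != ' ')
            result.push_back(' ');
        result.push_back(c);
        // Space after, unless the next input character is a space
        // or the input ends here.
        if (i + 1 < str.size() && str[i + 1] != ' ')
            result.push_back(' ');
    }
    return result;
}
```

Calling padSpecials("some:word{or other") should give "some : word { or other". Building a new string costs one pass over the input, whereas each insert into the middle of a string has to shift everything behind it.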

"a more elegant way of doing this exists."

I do not know how Boost implements that, but the traditional way is to feed the input string character by character into an FSM (finite state machine) that detects where tokens (words, symbols) start and end.

"I can do that with two loops and find_first_of(":") and ("{")"

One loop with std::find_first_of() should suffice.
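
A rough sketch of what that single loop could look like, using the std::string::find_first_of member rather than the free algorithm. The function name spaceOut and the default set ":{" are assumptions made for this example only.

```cpp
#include <string>

// Inserts a space on each side of every character from `specials`
// that occurs in `text`. The name and the default set ":{" are
// assumptions made for this sketch.
inline void spaceOut(std::string& text, const std::string& specials = ":{")
{
    std::string::size_type pos = text.find_first_of(specials);
    while (pos != std::string::npos)
    {
        text.insert(pos, " ");       // space before; the special moves to pos + 1
        text.insert(pos + 2, " ");   // space after the special
        // Resume the search past the three characters just handled.
        pos = text.find_first_of(specials, pos + 3);
    }
}
```

Note that this version pads unconditionally, so a special character that already has spaces around it ends up with two on each side, and a special character at the very end gets a trailing space.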

Though I'm still a huge fan of FSMs for such parsing tasks.
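
To make the FSM idea a bit more concrete, here is a small two-state sketch. It is not how Boost does it; the function name fsmTokenize and the token rules (a run of alphanumeric characters is a word, any other non-whitespace character is a one-character symbol) are assumptions borrowed from the other answers in this thread.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits text into tokens with a two-state machine. The token rules
// are an assumption: alphanumeric runs are words, any other
// non-whitespace character is a one-character symbol.
inline std::vector<std::string> fsmTokenize(const std::string& text)
{
    enum State { Between, InWord };   // Between: not inside a word
    State state = Between;
    std::vector<std::string> tokens;
    std::string word;

    for (std::string::size_type i = 0; i < text.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalnum(c))
        {
            // Start or extend a word.
            word.push_back(text[i]);
            state = InWord;
            continue;
        }
        // Any non-alphanumeric character ends the current word.
        if (state == InWord)
        {
            tokens.push_back(word);
            word.clear();
            state = Between;
        }
        // Whitespace is dropped; everything else is a symbol token.
        if (!std::isspace(c))
            tokens.push_back(std::string(1, text[i]));
    }
    if (state == InWord)              // flush a word that runs to the end
        tokens.push_back(word);
    return tokens;
}
```

For "some:word{or other" this should yield the six tokens some, :, word, {, or, other. Joining them with single spaces then gives the form asked for in the question.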

P.S. Similar question

How about something like:

std::string::const_iterator it, end = mystring.end();
for(it = mystring.begin(); it != end; ++it) {
  if ( !isalnum( *it ))
    list.push_back(it - mystring.begin()); // store the index, since list in the question holds size_t
}

This way, you'll only iterate once through the string, and isalnum from ctype.h seems to do what you want. Of course, the code above is very simplistic and incomplete and only suggests a solution.
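
A slightly fuller sketch of the same idea, assuming the goal is still to put spaces around the special characters. The function names findSpecials and addSpacesAt are made up for this example, and skipping whitespace when collecting positions is an added assumption so that existing spaces are not padded.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Collects the index of every character that is neither alphanumeric
// nor whitespace. Skipping whitespace is an assumption made here so
// that existing spaces are not padded again.
inline std::vector<std::string::size_type> findSpecials(const std::string& s)
{
    std::vector<std::string::size_type> positions;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (!std::isalnum(c) && !std::isspace(c))
            positions.push_back(i);
    }
    return positions;
}

// Pads each recorded position with a space on both sides. Walking the
// list from the back keeps the earlier indices valid after each insert.
inline void addSpacesAt(std::string& s,
                        const std::vector<std::string::size_type>& positions)
{
    for (std::vector<std::string::size_type>::const_reverse_iterator
             it = positions.rbegin(); it != positions.rend(); ++it)
    {
        s.insert(*it + 1, " ");   // after the special character
        s.insert(*it, " ");       // before it
    }
}
```

Storing indices rather than iterators matters here: every insert may reallocate the string and invalidate iterators, while indices collected up front stay usable as long as the inserts run from the back to the front.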

Are you looking to tokenize the input string, à la strtok?

If so, here is a tokenizing function that you can use. It takes an input string and a string of delimiters (each char in the string is a possible delimiter), and it returns a vector of tokens. Each token is a pair holding the delimited string and the delimiter used in that case:

#include <cstdlib>
#include <vector>
#include <string>
#include <functional>
#include <iostream>
#include <algorithm>
#include <utility>
using namespace std;

//  FUNCTION :      tokenize(const string& str, const string& delims, bool bCaseSensitive)
//  PARAMETERS :    str             String to be tokenized.
//                  delims          String of delimiter characters -- each character in the string is a delimiter.
//                  bCaseSensitive  Currently unused.
//  DESCRIPTION :   Tokenizes a string, much in the same way as strtok does.  The input string is not modified.  The
//                  function is called once to tokenize a string, and all the tokens are returned at once.
//  RETURNS :       Returns a vector of tokens.  Each token is a pair: first is the delimiter that preceded the data
//                  (0 for the first token), second is the data.  The delimiter is not included in the data.  The
//                  number of elements in the vector is N+1, where N is the number of times a delimiter is found in
//                  the string.  If one token is empty (as with the string "string1##string3", where the delimiter
//                  is '#'), then that element's data is an empty string.
//  NOTES :         If str begins with a delimiter, that delimiter is not split off; it stays in the first
//                  token's data.
//
typedef pair<char,string> token;    // first = delimiter, second = data
inline vector<token> tokenize(const string& str, const string& delims, bool bCaseSensitive=false)   // tokenizes a string, returns a vector of tokens
{
    (void)bCaseSensitive;   // unused; the cast silences the compiler warning

    // prologue
    vector<token> vRet;
    // tokenize input string
    for( string::const_iterator itA = str.begin(), it=itA; it != str.end(); it = find_first_of(++it,str.end(),delims.begin(),delims.end()) )
    {
        // prologue
        // find end of token
        string::const_iterator itEnd = find_first_of(it+1,str.end(),delims.begin(),delims.end());
        // add string to output
        if( it == itA ) vRet.push_back(make_pair(0,string(it,itEnd)));
        else            vRet.push_back(make_pair(*it,string(it+1,itEnd)));
        // epilogue
    }
    // epilogue
    return vRet;
}

using namespace std;

int main()
{
    string input = "some:word{or other";
    typedef vector<token> tokens;
    tokens toks = tokenize(input, " :{");
    cout << "Input: '" << input << "' # Tokens: " << toks.size() << "\n";
    for( tokens::iterator it = toks.begin(); it != toks.end(); ++it )
    {
        cout << "  Token : '" << it->second << "', Delimiter: '" << it->first << "'\n";
    }
    return 0;

}
