Tokenize text into type,string pairs
I am looking for a way to tokenize a string and produce a list of tokens and token types. Before I spend effort writing this myself, I'd like to know whether Boost can already do what I want.
I want a function with a signature essentially like this:
typedef pair<size_t,string> token;
void tokenize( string input, vector<regex> match, vector<token> & output );
input is the text to be tokenized. match is a list of all the regular expressions that denote tokens. output will become a list of all the matched tokens, each paired with the index of the regular expression in the match vector that matched it.
I know how to use sregex_token_iterator, but I'd like to somehow avoid doing duplicate matching on all the tokens. That is, I can produce a list of tokens, but they lack the type information, and I'd like to get that type information without rematching each token.
For tool chain and integration simplicity I'd prefer to stick with the boost regex library and not use a separate tool (like ANTLR).
The scenario you're describing is exactly the domain of Boost.Spirit.Qi.
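If staying with a plain regex library is the priority, one possible approach is to try each pattern anchored at the current position, so the pattern that matches also tells you the token type and nothing has to be rematched. The following is only a minimal sketch under several assumptions: it uses std::regex as a stand-in (Boost.Regex provides a match_continuous flag as well, so the same loop should port with little change), it takes the first pattern that matches rather than the longest match, and it silently skips any character that no pattern accepts. The parameter types are adjusted to const references for illustration; the rest follows the signature from the question.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::size_t, std::string> token;

// Walk the input once. At each position, try every pattern anchored at that
// position (match_continuous); the index of the first pattern that matches
// becomes the token type.
void tokenize(const std::string& input,
              const std::vector<std::regex>& match,
              std::vector<token>& output)
{
    std::string::const_iterator pos = input.begin();
    const std::string::const_iterator end = input.end();

    while (pos != end) {
        bool matched = false;
        for (std::size_t i = 0; i < match.size(); ++i) {
            std::smatch m;
            // match_continuous: the match must start exactly at pos.
            if (std::regex_search(pos, end, m, match[i],
                                  std::regex_constants::match_continuous)
                && m.length(0) > 0) {           // ignore empty matches
                output.push_back(token(i, m.str(0)));
                pos += m.length(0);             // continue after this token
                matched = true;
                break;                          // first matching pattern wins
            }
        }
        if (!matched) {
            ++pos;  // assumption: skip characters that no pattern accepts
        }
    }
}
```

For example, with the patterns [0-9]+, [A-Za-z_][A-Za-z_0-9]* and = (in that order) and the input "x1 = 42", this should yield the tokens (1, "x1"), (2, "=") and (0, "42"), with the spaces skipped. The cost is up to one anchored match attempt per pattern at each token start, which may be acceptable for a small pattern list; a dedicated lexer generator avoids that overhead.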