Tokenize text into type,string pairs

I am looking for a way to tokenize a string and produce a list of tokens and token types. Before I waste my effort I'd like to know if boost can already do what I want.

I want a function with a signature essentially like this:

typedef pair<size_t,string> token;
void tokenize( string input, vector<regex> match, vector<token> & output );

The input is the textual input to be tokenized. The match is a list of all the regular expressions that denote tokens. output will become a list of all the matched tokens along with the index of the matching token from the match vector.

I know how to use sregex_token_iterator, but I'd like to avoid doing duplicate matching on all the tokens. That is, I can produce a list of tokens, but they lack the type information, and I'd like to get that type information without rematching each token.

For tool chain and integration simplicity I'd prefer to stick with the boost regex library and not use a separate tool (like ANTLR).


The scenario you're describing is exactly the domain of Boost.Spirit.Qi.
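
If you would rather stay with a regex library, one possible approach is to join all the token patterns into a single alternation, wrap each one in its own capture group, and check which group participated in each match. Below is a minimal sketch of that idea, not a definitive implementation. Assumptions: it uses std::regex so that it compiles without Boost (boost::regex, boost::smatch and boost::sregex_iterator should be usable the same way here); it takes pattern strings instead of the vector<regex> from the question, because compiled expressions cannot be concatenated; and no pattern contains capturing groups of its own (use (?:...) inside a pattern if you need grouping).

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::size_t, std::string> token;

// Hypothetical variant of the tokenize() from the question.
// Assumption: patterns are given as strings and contain no capturing groups.
void tokenize(const std::string& input,
              const std::vector<std::string>& patterns,
              std::vector<token>& output)
{
    if (patterns.empty()) return;

    // Build "(p0)|(p1)|...": capture group i + 1 corresponds to patterns[i].
    std::string combined;
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        if (i != 0) combined += '|';
        combined += '(' + patterns[i] + ')';
    }
    const std::regex re(combined);

    // Single pass over the input; text that matches no pattern is skipped.
    for (std::sregex_iterator it(input.begin(), input.end(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;

        // Find the group that participated in this match; its index is the type.
        std::size_t type = 0;
        while (type < patterns.size() && !m[type + 1].matched) ++type;
        assert(type < patterns.size());  // exactly one alternative matched

        output.push_back(token(type, m.str()));
    }
}
```

With the patterns "[0-9]+", "[A-Za-z_]+" and "[+*/=]" and the input "x = 42 + y", this yields (1,"x"), (2,"="), (0,"42"), (2,"+"), (1,"y"). Note that alternatives are tried in order, so when two patterns could match at the same position the earlier one in the list wins, not the longest one.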
