Boost Spirit is too greedy
I'm torn between deep admiration for boost::spirit and eternal frustration at not understanding it ;)
I have a problem with a string rule that is too greedy, so the overall parse fails. Below is a minimal example that doesn't parse because the txt rule eats up "end".
More information about what I'd like to do: the goal is to parse some pseudo-SQL while skipping whitespace. In a statement like
select foo.id, bar.id from foo, baz
I need to treat "from" as a special keyword. The rule is something like
"select" >> txt % ',' >> "from" >> txt % ','
but it obviously doesn't work, as it sees "bar.id from foo" as one item.
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
int main(int, char**) {
    auto txt = +(qi::char_("a-zA-Z_"));
    auto rule = qi::lit("Hello") >> txt % ',' >> "end";
    std::string str = "HelloFoo,Moo,Bazend";
    std::string::iterator begin = str.begin();
    if (qi::parse(begin, str.end(), rule))
        std::cout << "Match !" << std::endl;
    else
        std::cout << "No match :'(" << std::endl;
}
Here's my version, with changes marked:
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
namespace qi = boost::spirit::qi;
int main(int, char**) {
    auto txt = qi::lexeme[+(qi::char_("a-zA-Z_"))]; // CHANGE: avoid eating spaces
    auto rule = qi::lit("Hello") >> txt % ',' >> "end";
    std::string str = "Hello Foo, Moo, Baz end"; // CHANGE: re-introduce spaces
    std::string::iterator begin = str.begin();
    if (qi::phrase_parse(begin, str.end(), rule, qi::ascii::space)) { // CHANGE: use phrase_parse with a skipper
        std::cout << "Match !" << std::endl << "Remainder (should be empty): '"; // CHANGE: show whether we parsed the whole string and not just a prefix
        std::copy(begin, str.end(), std::ostream_iterator<char>(std::cout));
        std::cout << "'" << std::endl;
    }
    else {
        std::cout << "No match :'(" << std::endl;
    }
}
This compiles and runs with GCC 4.4.3 and Boost 1.4something; output:
Match !
Remainder (should be empty): ''
By using lexeme, you can avoid eating spaces conditionally, so that txt matches up to a word boundary only. This yields the desired result: because "Baz" is not followed by a comma, and txt doesn't eat spaces, we never accidentally consume "end".
Anyway, I'm not 100% sure this is what you're looking for -- in particular, is str missing spaces as an illustrative example, or are you somehow forced to use this (spaceless) format?
Side note: if you want to make sure that you've parsed the entire string, add a check that begin == str.end() after parsing. As written, your code will report a match even if only a non-empty prefix of str was parsed.
Update: Added suffix printing.