boost spirit alternative operator '|' fails when two rules start the same way
I am working on an HTTP parser, and I found a problem when I parse using the alternative operator. It is not about the attribute values, which I can fix using hold[]. The problem occurs when two rules are similar at the beginning. Here are some simple rules that demonstrate my problem:
qi::rule<string_iterator> some_rule(
(char_('/') >> *char_("0-9")) /*first rule accept /123..*/
| (char_('/') >> *char_("a-z")) /*second rule accept /abc..*/
);
Then I parse with this rule using qi::parse.
It fails (does not consume the whole input) if the input string is:
"/abcd"
However, when I put the second rule before the first one, the parser returns true. I think the problem is that once the parser has consumed input with the first rule and then finds that the first rule fails, it does not go back and try the second rule, which is the alternative to the first.
I tried putting hold[]
around the first rule, but it only helps with generating the attribute; it
does not fix this problem. I have no idea how to fix this, since HTTP has a lot of
rules that begin the same way as other rules.
===========more info about my code============================
Here is my function for parsing a string:
typedef std::string::const_iterator string_iterator;
typedef qi::rule<string_iterator, std::string()> rules_t;
void parse_to_string(const std::string& s, rules_t& r, std::string& result)
{
using qi::parse;
std::string::const_iterator iter = s.begin();
std::string::const_iterator end = s.end();
bool ok = parse(iter, end, r, result);
if ( ok && (iter==end) )
{
std::cout << "[correct]" << result << std::endl;
}
else
{
std::cout << "[incorrect]" << s << std::endl;
std::cout << "[dead with]" << result << std::endl;
}
}
In main I have this code:
std::string str, result;
result = "";
str = "/htmlquery?";
qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
qi::rule<string_iterator, std::string()> whatever_rule( rule_wo_question
| rule_w_question
);
parse_to_string(str, whatever_rule, result);
I get this result:
[incorrect]/htmlquery?
[dead with]/htmlquery
As you can see, it cannot consume the '?'.
However, when I swap the rules like this (putting "rule_w_question" before "rule_wo_question"):
std::string str, result;
result = "";
str = "/htmlquery?";
qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
qi::rule<string_iterator, std::string()> whatever_rule( rule_w_question
| rule_wo_question
);
parse_to_string(str, whatever_rule, result);
The output is:
[correct]/htmlquery?
In the first (wrong) version, it seems the parser consumes '/htmlquery' with "rule_wo_question" and then cannot consume the '?', which makes the parse fail. It never goes on to the alternative rule ("rule_w_question"), so the program prints "[incorrect]".
In the second version I put "rule_w_question" before "rule_wo_question", which is why the parser returns "[correct]".
==============================================================
Here is my whole program, built with Boost 1.47 and linked with pthread and boost_filesystem. This is my main .c file:
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_uint.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/bind.hpp>
#include <iostream>
#include <string>
using namespace boost::spirit::qi;
namespace qi = boost::spirit::qi;
typedef std::string::const_iterator string_iterator;
typedef qi::rule<string_iterator, std::string()> rules_t;
void parse_to_string(const std::string& s, rules_t& r, std::string& result)
{
using qi::parse;
std::string::const_iterator iter = s.begin();
std::string::const_iterator end = s.end();
bool ok = parse(iter, end, r, result);
if ( ok && (iter==end) )
{
std::cout << "[correct]" << result << std::endl;
}
else
{
std::cout << "[incorrect]" << s << std::endl;
std::cout << "[dead with]" << result << std::endl;
}
}
int main()
{
std::string str, result;
result = "";
str = "/htmlquery?";
qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
qi::rule<string_iterator, std::string()> whatever_rule( rule_wo_question
| rule_w_question
);
parse_to_string(str, whatever_rule, result);
return 0;
}
The result is:
[incorrect]/htmlquery?
[dead with]/htmlquery
Spirit tries the alternatives in the order they are specified and stops at the first one that matches; no exhaustive or longest-match search is performed. In other words, the order of alternatives matters: you should always list the 'longest' alternatives first.
Any reason why you don't do this instead?
qi::rule<string_iterator> some_rule(
char_('/')
>> (
*char_("0-9") /*first rule accepts /123..*/
| *char_("a-z") /*second rule accepts /abc..*/
)
);
Edit: Actually, that would match '/' followed by an empty digit sequence ("0-9" zero times) and would never try "a-z". Change * to +.
qi::rule<string_iterator> some_rule(
(char_('/') >> *char_("0-9")) >> qi::eol /*first rule accept /123..*/
| (char_('/') >> *char_("a-z")) >> qi::eol /*second rule accept /abc..*/
);
Instead of eol
you could use ',' or some other terminator. The problem is that (char_('/') >> *char_("0-9"))
matches '/' followed by zero or more digits, so on "/abcd" it matches the "/" and then stops parsing. K-ballo's solution is the way I would handle this case, but this solution is provided as an alternative in case (for some reason) his is not acceptable.
It's because there is a match for your first rule, and Spirit is greedy.
(char_('/') >> *char_("0-9"))
Feeding "/abcd" into this rule will result in the following logic:
- "/abcd" -> Is '/' the next character? Yes. Subrule matches. -> "abcd" remains.
- "abcd" -> Are there 0 or more digits? Yes. There are 0 digits. Subrule matches. -> "abcd" remains.
- First clause of alternative ('|') statement matches; skip remaining alternative clauses. -> "abcd" remains.
- Rule matches with "abcd" remaining, which is presumably not consumed by anything else and causes your failure.
You might consider changing the '*', which means "0 or more", to a '+', which means "1 or more".