Boost.Spirit alternative operator '|' fails when two candidate rules start the same way

2023-04-05 10:10

I am working on an HTTP parser and ran into a problem when I try to parse using the alternative operator. It is not about the values in the attribute, which I can fix using hold[]. The problem occurs when two rules begin the same way. Here are some simple rules that demonstrate my problem:

qi::rule<string_iterator> some_rule(
        (char_('/') >> *char_("0-9")) /*first rule accept  /123..*/
      | (char_('/') >> *char_("a-z")) /*second rule accept /abc..*/
    );

When I parse with this rule using qi::parse, it fails if the input string is something like "/abcd".

However, when I put the second rule before the first rule, the parser returns true. I think the problem is that the parser consumes input with the first rule, then finds that the first rule fails, and does not go back to the second rule, which is the alternative to the first.

I tried wrapping the first rule in hold[], but that only helps with generating the attribute. It doesn't fix this problem. I have no idea how to fix it, since HTTP has a lot of rules that begin the same way as other rules.

===========more info about my code============================

here is my function for parsing a string

typedef std::string::const_iterator string_iterator;
typedef qi::rule<string_iterator, std::string()> rules_t;
void parse_to_string(const std::string& s, rules_t& r, std::string& result)
{
    using namespace rule;
    using qi::parse;

    std::string::const_iterator iter = s.begin();
    std::string::const_iterator end = s.end();

    bool err = parse(iter, end, r, result);

    if ( err && (iter==end) )
    {
           std::cout << "[correct]" << result << std::endl;
    }
    else
    {
          std::cout << "[incorrect]" << s << std::endl;
          std::cout << "[dead with]" << result << std::endl;
    }
}

In main I have this code;

std::string str, result;
result = "";
str = "/htmlquery?";
qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
qi::rule<string_iterator, std::string()> whatever_rule( rule_wo_question
                                                        | rule_w_question
                                                       );
parse_to_string(str, whatever_rule, result);

I get this result;

[incorrect]/htmlquery?
[dead with]/htmlquery   <= you can see it cannot consume '?'

However, when I switch the rules like this (I put "rule_w_question" before "rule_wo_question"):

std::string str, result;
result = "";
str = "/htmlquery?";
qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
qi::rule<string_iterator, std::string()> whatever_rule( rule_w_question
                                                        | rule_wo_question
                                                       );
parse_to_string(str, whatever_rule, result);

The output will be; [correct]/htmlquery?

In the first version (the wrong one), it seems the parser consumes '/htmlquery' with "rule_wo_question" and then finds that it cannot consume '?', which makes the parse fail. It does not go on to the alternative rule ("rule_w_question"), so the program finally prints "[incorrect]".

In the second version I put "rule_w_question" before "rule_wo_question", which is why the parser prints "[correct]".

==============================================================

My whole code, built with Boost 1.47 and linked with pthread and boost_filesystem. Here is my main .c file:

#include <iostream>
#include <string>

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/network/protocol.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/bind.hpp>
#include <boost/spirit/include/qi_uint.hpp>

using namespace boost::spirit::qi;
namespace qi = boost::spirit::qi;

typedef std::string::const_iterator string_iterator;
typedef qi::rule<string_iterator, std::string()> rules_t;
void parse_to_string(const std::string& s, rules_t& r, std::string& result)
{
    using qi::parse;

    std::string::const_iterator iter = s.begin();
    std::string::const_iterator end = s.end();

    bool err = parse(iter, end, r, result);

    if ( err && (iter==end) )
    {
           std::cout << "[correct]" << result << std::endl;
    }
    else
    {
          std::cout << "[incorrect]" << s << std::endl;
          std::cout << "[dead with]" << result << std::endl;
    }
}





int main()
{
    std::string str, result;
    result = "";
    str = "/htmlquery?";
    qi::rule<string_iterator, std::string()> rule_wo_question( char_('/') >> *char_("a-z"));
    qi::rule<string_iterator, std::string()> rule_w_question( char_('/') >> *char_("a-z") >> char_('?'));
    qi::rule<string_iterator, std::string()> whatever_rule( rule_wo_question
                                                           | rule_w_question
                                                           );
    parse_to_string(str, whatever_rule, result);
    return 0;
}

the result is

[incorrect]/htmlquery?

[dead with]/htmlquery

Spirit tries the given alternatives in the order they are specified and stops as soon as one of them matches. No exhaustive matching is performed: once one alternative matches, it stops looking. In other words, the order of the alternatives is important. You should always list the 'longest' alternatives first.

Any reason why you don't do this instead?

qi::rule<string_iterator> some_rule(
     char_('/')
     >> (
         *char_("0-9")  /*first rule accept /123..*/
       | *char_("a-z")  /*second rule accept /abc..*/
     )
);

Edit: Actually, that would match '/' followed by nothing ("0-9" zero times) and would never bother looking for "a-z". Change * to +.

qi::rule<string_iterator> some_rule(
    (char_('/') >> *char_("0-9")) >> qi::eol /*first rule accept  /123..*/
  | (char_('/') >> *char_("a-z")) >> qi::eol /*second rule accept /abc..*/
);

Instead of eol you could use ',' or some other terminator. The problem is that (char_('/') >> *char_("0-9")) matches '/' followed by zero or more digits, so "/abcd" matches the "/" and then stops parsing. K-ballo's solution is the way I would handle this case, but this solution is offered as an alternative in case (for some reason) his is not acceptable.

It's because there is a match for your first rule, and Spirit is greedy.

(char_('/') >> *char_("0-9"))

Feeding "/abcd" into this rule will result in the following logic:

"/abcd" -> Is '/' the next character? Yes. Subrule matches. -> "abcd" remains.
"abcd" -> Are there 0 or more digits? Yes. There are 0 digits. Subrule matches. -> "abcd" remains.
First clause of alternative ('|') statement matches; skip remaining alternative clauses. -> "abcd" remains.
Rule matches with "abcd" remaining, which probably then doesn't parse and causes your failure.

You might consider changing the '*', which means "0 or more", to a '+', which means "1 or more".
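The trace above can be reproduced without Spirit at all. The following is a hand-rolled model of ordered choice, not Spirit's actual implementation. All function names are invented for illustration, and min_count stands in for the difference between '*' (0) and '+' (1).

```cpp
#include <cstddef>
#include <string>

// Matches a single expected character and advances pos on success.
bool match_char(const std::string& s, std::size_t& pos, char c)
{
    if (pos < s.size() && s[pos] == c) { ++pos; return true; }
    return false;
}

// Greedily matches characters in [lo, hi]. min_count == 0 models '*',
// min_count == 1 models '+'. Restores pos on failure.
bool match_range(const std::string& s, std::size_t& pos,
                 char lo, char hi, std::size_t min_count)
{
    std::size_t start = pos;
    while (pos < s.size() && s[pos] >= lo && s[pos] <= hi) ++pos;
    if (pos - start >= min_count) return true;
    pos = start;
    return false;
}

// Models char_('/') >> <repeat>char_("lo-hi"); restores pos on failure.
bool match_sequence(const std::string& s, std::size_t& pos,
                    char lo, char hi, std::size_t min_count)
{
    std::size_t start = pos;
    if (match_char(s, pos, '/') && match_range(s, pos, lo, hi, min_count))
        return true;
    pos = start;
    return false;
}

// Ordered choice: the first branch that matches wins and later branches
// are never tried. Returns the number of characters consumed, or
// std::string::npos if no branch matches.
std::size_t consumed_by_alternative(const std::string& s, std::size_t min_count)
{
    std::size_t pos = 0;
    if (match_sequence(s, pos, '0', '9', min_count)) return pos;  // digits branch
    if (match_sequence(s, pos, 'a', 'z', min_count)) return pos;  // letters branch
    return std::string::npos;
}
```

In this model consumed_by_alternative("/abcd", 0) returns 1: the digits branch matches '/' plus zero digits and the letters branch is skipped, leaving "abcd" unparsed. With min_count set to 1 the digits branch fails, the letters branch runs, and all 5 characters are consumed.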
