开发者

C++::Boost::Regex Iterate over the submatches

I am using Named Capture Groups with Boost Regex / Xpressive.

I would like to iterate over all submatches, and get both the value and KEY of each submatch (i.e. what["开发者_高级运维type"]).

sregex pattern = sregex::compile(  "(?P<type>href|src)=\"(?P<url>[^\"]+)\""    );

sregex_iterator cur( web_buffer.begin(), web_buffer.end(), pattern );
sregex_iterator end;

for( ; cur != end; ++cur ){
    smatch const &what = *cur;

    //I know how to access using a string key: what["type"]
    std::cout << what[0] << " [" << what["type"] << "] [" << what["url"] <<"]"<< std::endl;

    /*I know how to iterate, using an integer key, but I would
      like to also get the original KEY into a variable, i.e.
      in case of what[1], get both the value AND "type"
    */
    for(i=0; i<what.size(); i++){
        std::cout << "{} = [" << what[i] << "]" << std::endl;
    }

    std::cout << std::endl;
}


With Boost 1.54.0 this is even more difficult because the capture names are not even stored in the results. Instead, Boost just hashes the capture names and stores the hash (an int) and the associated pointers to the original string.

I've written a small class derived from boost::smatch that saves capture names and provides an iterator for them.

class namesaving_smatch : public smatch
{
public:
    namesaving_smatch(const regex& pattern)
    {
        std::string pattern_str = pattern.str();
        regex capture_pattern("\\?P?<(\\w+)>");
        auto words_begin = sregex_iterator(pattern_str.begin(), pattern_str.end(), capture_pattern);
        auto words_end = sregex_iterator();

        for (sregex_iterator i = words_begin; i != words_end; i++)
        {
            std::string name = (*i)[1].str();
            m_names.push_back(name);
        }
    }

    ~namesaving_smatch() { }

    std::vector<std::string>::const_iterator names_begin() const
    {
        return m_names.begin();
    }

    std::vector<std::string>::const_iterator names_end() const
    {
        return m_names.end();
    }

private:
    std::vector<std::string> m_names;
};

The class accepts the regular expression containing the named capture groups in its constructor. Use the class like so:

namesaving_smatch results(re);
if (regex_search(input, results, re))
    for (auto it = results.names_begin(); it != results.names_end(); ++it)
        cout << *it << ": " << results[*it].str();


After looking at this for more than an hour, I feel fairly safe saying, "it can't be done captain". Even in the boost code, they iterate over the private named_marks_ vector when doing the lookup. It is just not setup to allow that. I'd say the best bet would be to iterate over the ones you think should be there and catch the exception for those that aren't found.

const_reference at_(char_type const *name) const
{
    for(std::size_t i = 0; i < this->named_marks_.size(); ++i)
    {
        if(this->named_marks_[i].name_ == name)
        {
            return this->sub_matches_[ this->named_marks_[i].mark_nbr_ ];
        }
    }
    BOOST_THROW_EXCEPTION(
        regex_error(regex_constants::error_badmark, "invalid named back-reference")
    );
    // Should never execute, but if it does, this returns
    // a "null" sub_match.
    return this->sub_matches_[this->sub_matches_.size()];
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜