Assign RegEx submatches to variables or map (C++/C)
I need to extract the same kind of information (e.g. first name, last name, telephone, ...) from many different text sources, each with its own format and its own ordering of the fields of interest.
I want a function that does the extraction based on a regular expression and returns the result under descriptive names. In other words, instead of returning each match result as submatch[0], submatch[1], submatch[2], ..., have it do either of the following:
1. Return a std::map, so that the submatches can be accessed via: submatch["first_name"], submatch["last_name"], submatch["telephone"]
2. Return variables holding the submatches, so that they can be accessed via: submatch_first_name, submatch_last_name, submatch_telephone
I can write a wrapper class around boost::regex to do the first one, but I was hoping there would be a built-in or more elegant way to do this in C++/Boost/STL/C.
You can always use enumerations or integral constants to get named indices, e.g.:
enum NamedIndices {
    // Index 0 is the whole match; capture groups start at 1.
    FirstName = 1,
    LastName  = 2,
    // ...
};
// ...
std::string first = submatch[FirstName];
std::string last  = submatch[LastName];
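If a map keyed by name is preferred over an enum, the same idea can be wrapped in a small helper. The following is only a sketch, assuming std::regex from the standard library (which has no named groups of its own); extract_named and its parameters are hypothetical names made up for illustration, not part of any library. It pairs each capture group, in order, with a caller-supplied name, remembering that index 0 of a match is the entire match:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of any library): labels capture group i + 1
// with names[i] and returns the captured text keyed by those names.
std::map<std::string, std::string> extract_named(
    const std::string& text,
    const std::regex& pattern,
    const std::vector<std::string>& names)
{
    // At most one name per capture group in the pattern.
    assert(names.size() <= pattern.mark_count());

    std::map<std::string, std::string> fields;
    std::smatch match;
    if (std::regex_search(text, match, pattern)) {
        for (std::size_t i = 0; i < names.size(); ++i) {
            // match[0] is the whole match, so groups start at index 1.
            fields[names[i]] = match[i + 1].str();
        }
    }
    return fields;  // empty when the pattern does not match
}
```

Each text source would then get its own pattern and its own name list in the matching order, e.g. extract_named(line, pattern, {"first_name", "last_name", "telephone"}), and the caller reads fields["first_name"] regardless of where that group sits in the pattern.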
Boost.Regex also has built-in support for named capture groups, using the following syntax:
(?<name>.*)
where the sub-expression to capture follows the name, e.g.:
const boost::wregex dateTimeRegex(L"(?<year>[[:d:]]{4})\\.(?<month>[[:d:]]{2})\\.(?<day>[[:d:]]{2}) (?<hr>[[:d:]]{2}):(?<min>[[:d:]]{2}):(?<sec>[[:d:]]{2})");
boost::wsmatch result;
if (boost::regex_match(currentLine, result, dateTimeRegex))
{
    if (result[0].matched)
    {
        const int year = boost::lexical_cast<int>(result.str(L"year"));
        const int month = boost::lexical_cast<int>(result.str(L"month"));
        //...
    }
}
Can you use "named capture groups"? It seems like returning a map is exactly what you want.
For example, in RE2
Check Wikipedia to see whether your favorite regex library supports named captures.