Regexp in C - match group
I've been struggling with regular expressions in C (just /usr/include/regex.h).
I have (let's say) hundreds of regexps, and one of them may match the input string. Currently I handle this (the code is generated, actually) with hundreds of do-while blocks, each with a match inside that breaks out on failure and moves on to the next block. One by one:
do {
    if ( regex_match(str, my_regex1) != MY_REGEX_SUCCESS ) DO_FAIL; //break
    ...
    if ( sscanf(str, " %d.%d.%d.%d / %d ", &___ip1, &___ip2, &___ip3, &___ip4, &___pref) != 5 ) DO_FAIL; //break
    ...
} while (0);
do {
    if ( regex_match(str, my_regex2) != MY_REGEX_SUCCESS ) DO_FAIL; //break
    ...
    ...
} while (0);
do {
    if ( regex_match(str, my_regex3) != MY_REGEX_SUCCESS ) DO_FAIL; //break
    ...
    ...
} while (0);
What I'd like to have is something like:
const char * match1 = "^([[:space:]]*)([$]([._a-zA-Z0-9-]{0,118})?[._a-zA-Z0-9])([[:space:]]*)$";
const char * match2 = "^([[:space:]]*)(target|origin)([[:space:]]*):([[:space:]]*)([$]([._a-zA-Z0-9-]{0,118})?[._a-zA-Z0-9])([[:space:]]*):([[:space:]]*)\\*([[:space:]]*)$";
const char * match3 = "^([[:space:]]*)(target|origin)([[:space:]]*):([[:space:]]*)([$]([._a-zA-Z0-9-]{0,118})?[._a-zA-Z0-9])([[:space:]]*)/([[:space:]]*)(([0-2]?[0-9])|(3[0-2]))([[:space:]]*):([[:space:]]*)(([1-9][0-9]{0,3})|([1-5][0-9]{4})|(6[0-4][0-9]{3})|(65[0-4][0-9]{2})|(655[0-2][0-9])|(6553[0-5]))([[:space:]]*)$";
char * my_match;
asprintf(&my_match, "(%s)|(%s)|(%s)", match1, match2, match3);
int num_gr = give_me_number_of_regex_group(str, my_match);
switch (num_gr) {
...
}
and I have no idea how to do that...
Any suggestions?
Thanks!
I assume your regex_match is some combination of regcomp and regexec. To enable grouping, you need to call regcomp with the REG_EXTENDED flag, but without the REG_NOSUB flag (in the third argument).
regex_t compiled;
regcomp(&compiled, "(match1)|(match2)|(match3)", REG_EXTENDED);
Then allocate space for the groups. The number of parenthesized groups is stored in compiled.re_nsub. Allocate one element more than that, because element 0 describes the whole match, and pass that count to regexec:
size_t ngroups = compiled.re_nsub + 1;
regmatch_t *groups = malloc(ngroups * sizeof(regmatch_t));
regexec(&compiled, str, ngroups, groups, 0);
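Now, a group that was not used has a -1 value in both its rm_so and rm_eo fields. Find the first such group:
size_t nmatched;
for (nmatched = 0; nmatched < ngroups; nmatched++)
    if (groups[nmatched].rm_so == -1) /* rm_so is a signed regoff_t */
        break;
nmatched is the index of the first group that did not take part in the match. With alternation, groups after an unused one can still be set, so check the specific group you care about rather than relying on this count alone. Also check the return values of regcomp, malloc and regexec, and call regfree and free when you are done.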
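For the combined "(m1)|(m2)|(m3)" pattern, one way to find out which alternative matched is to look at the wrapper group of each alternative instead of counting set groups. The sketch below assumes POSIX <regex.h> with REG_EXTENDED. The name which_alternative is a hypothetical helper, not part of any library. It compiles each pattern once on its own only to read re_nsub, which gives the group number of each wrapper in the joined pattern. It also assumes the patterns contain no backreferences, since joining them renumbers the groups.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the 0-based index of the alternative that
 * matched str, -1 if nothing matched, -2 on a compile or allocation error. */
int which_alternative(const char *str, const char *const *pats, size_t npats)
{
    int result = -2;
    size_t i, next = 1, len = 0;
    char *joined = NULL, *p;
    regmatch_t *groups = NULL;
    regex_t all;
    size_t *top;

    if (npats == 0)
        return -1;
    top = malloc(npats * sizeof *top);   /* group number of each wrapper */
    if (top == NULL)
        return -2;

    /* Compile each pattern alone to learn how many groups it contains. */
    for (i = 0; i < npats; i++) {
        regex_t one;
        if (regcomp(&one, pats[i], REG_EXTENDED) != 0)
            goto done;
        top[i] = next;
        next += one.re_nsub + 1;         /* +1 for the wrapper parentheses */
        regfree(&one);
        len += strlen(pats[i]) + 3;      /* room for '(' ')' and '|' */
    }

    joined = malloc(len + 1);
    groups = malloc(next * sizeof *groups);  /* next == re_nsub + 1 of the joined regex */
    if (joined == NULL || groups == NULL)
        goto done;

    /* Build "(p0)|(p1)|...|(pN)". */
    p = joined;
    for (i = 0; i < npats; i++) {
        size_t n = strlen(pats[i]);
        if (i > 0)
            *p++ = '|';
        *p++ = '(';
        memcpy(p, pats[i], n);
        p += n;
        *p++ = ')';
    }
    *p = '\0';

    if (regcomp(&all, joined, REG_EXTENDED) != 0)
        goto done;
    result = -1;
    if (regexec(&all, str, next, groups, 0) == 0) {
        /* Only the wrapper of the alternative that matched is set. */
        for (i = 0; i < npats; i++) {
            if (groups[top[i]].rm_so != -1) {
                result = (int)i;
                break;
            }
        }
    }
    regfree(&all);
done:
    free(groups);
    free(joined);
    free(top);
    return result;
}
```

With the patterns { "^[0-9]+$", "^(target|origin):([a-z]+)$", "^[a-z]+$" }, the call which_alternative("origin:abc", pats, 3) should return 1, and that value can feed the switch from the question. In real code you would compile the joined pattern once and reuse it, not rebuild it on every call.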
You could take an array of strings that contain your regexps and test each one of them in turn.
//count is the number of regexps provided
int give_me_number_of_regex_group(const char *needle, const char **regexps, int count) {
    for (int i = 0; i < count; ++i) {
        if (regex_match(needle, regexps[i]) == MY_REGEX_SUCCESS) {
            return i;
        }
    }
    return -1; //didn't match any
}
Or am I overlooking something?
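If regex_match compiles its pattern on every call, the loop pays the compile cost hundreds of times per input string. A minimal sketch of the same idea with the patterns compiled once up front, assuming plain POSIX regcomp/regexec; compile_all, first_match and free_all are hypothetical names chosen for illustration:

```c
#include <regex.h>
#include <stddef.h>

/* Free the first n compiled regexes in re. */
void free_all(regex_t *re, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        regfree(&re[i]);
}

/* Compile n patterns into out. Returns 0 on success, -1 on failure
 * (anything compiled so far is released again). */
int compile_all(regex_t *out, const char *const *pats, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        /* REG_NOSUB: only a yes/no answer is needed here, no group offsets. */
        if (regcomp(&out[i], pats[i], REG_EXTENDED | REG_NOSUB) != 0) {
            free_all(out, i);
            return -1;
        }
    }
    return 0;
}

/* Returns the index of the first regex that matches str, or -1 if none does. */
int first_match(const char *str, const regex_t *re, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (regexec(&re[i], str, 0, NULL, 0) == 0)
            return (int)i;
    return -1;
}
```

Call compile_all once at startup, first_match for every input line, and free_all at exit. If you need the captured groups of the winning pattern, drop REG_NOSUB and pass a regmatch_t array to regexec.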
"I have (let's say) hundreds of regexps ..."
It looks like you are trying to compare the four octets of IP addresses. In general, when using regular expressions, it's usually a red flag to run that many regexes against a single target and stop after the first match.
Example: which group will correctly match first?
target ~'American' , pattern ~ /(Ame)|(Ameri)|(American)/
And this does not even include quantifiers in the subgroups.
If the regexes are all built from a constant form (for instance, a fixed data layout), it might be better to use C's string functions to split the data out of the form into an array, then compare the array items with the target. Plain C string handling is typically much faster for this than regexes.
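As a rough illustration of the string-function route, here is a sketch that parses the "a.b.c.d / prefix" form from the question with sscanf alone. The name parse_cidr and the range checks are my own assumptions, not something from the question. The %n conversion records how much input was consumed, so trailing garbage can be rejected:

```c
#include <stdio.h>

/* Hypothetical helper: parses "a.b.c.d / prefix" with optional whitespace
 * around the slash. Returns 1 on success, 0 on any mismatch. */
int parse_cidr(const char *str, int ip[4], int *prefix)
{
    int end = 0, i;

    if (sscanf(str, " %d.%d.%d.%d / %d %n",
               &ip[0], &ip[1], &ip[2], &ip[3], prefix, &end) != 5)
        return 0;
    if (str[end] != '\0')          /* something left over after the prefix */
        return 0;
    for (i = 0; i < 4; i++)
        if (ip[i] < 0 || ip[i] > 255)
            return 0;
    if (*prefix < 0 || *prefix > 32)
        return 0;
    return 1;
}
```

Note that %d skips leading whitespace and accepts a sign, so this is more lenient than the regexps in the question. Tighten it with strtol or manual checks if the exact form matters.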