Why doesn't Python "grouping" work for regular expressions in C?
Here is my Python program:
import re
print re.findall( "([se]{2,30})ting", "testingtested" )
Its output is:
['es']
Which is what I expect. I expect to get back "es" because I searched for 2-30 characters of "e" or "s" which are followed by "ting".
Here is my C program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>
int main(void) {
    regex_t preg;
    regmatch_t pmatch;
    char string[] = "testingtested";
    //Compile the regular expression
    if ( regcomp( &preg, "([se]{2,30})ting", REG_EXTENDED ) ) {
        printf( "ERROR!\n" );
        return -1;
    } else {
        printf( "Compiled\n" );
    }
    //Do the search
    if ( regexec( &preg, string, 1, &pmatch, REG_NOTEOL ) ) {
        printf( "No Match\n" );
    } else {
        //Allocate memory on the stack for this
        char substring[pmatch.rm_eo - pmatch.rm_so + 1];
        //Copy the substring over
        printf( "%d %d\n", pmatch.rm_so, pmatch.rm_eo );
        strncpy( substring, &string[pmatch.rm_so], pmatch.rm_eo - pmatch.rm_so );
        //Make sure there's a null byte
        substring[pmatch.rm_eo - pmatch.rm_so] = 0;
        //Print it out
        printf( "Match\n" );
        printf( "\"%s\"\n", substring );
    }
    //Release the regular expression
    regfree( &preg );
    return EXIT_SUCCESS;
}
Its output is:
Compiled
1 7
Match
"esting"
Why is the C program including the "ting" in the result? And is there a way for me to exclude the "ting" portion?
pmatch is the whole match, not the first parenthesized subexpression. Try changing pmatch to an array of 2 elements, then passing 2 in place of 1 to regexec and using the [1] element to get the subexpression match.
To others who have cited differences between C and Python and different types of regular expressions, that's all unrelated. This expression is very simple and that's not coming into play.
While regular expressions are "more or less the same everywhere", the exact supported features differ from implementation to implementation.
Unfortunately, you need to consult each regex library's documentation separately when designing your regular expressions.