Why doesn't Python "grouping" work for regular expressions in C?

Here is my Python program:

import re

print(re.findall(r"([se]{2,30})ting", "testingtested"))

Its output is:

['es']

Which is what I expect. I expect to get back "es" because I searched for 2-30 characters of "e" or "s" which are followed by "ting".

Here is my C program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

int main(void) {

    regex_t preg;
    regmatch_t pmatch;

    char string[] = "testingtested";

    //Compile the regular expression
    if ( regcomp( &preg, "([se]{2,30})ting", REG_EXTENDED ) ) {
        printf( "ERROR!\n" );
        return -1;
    } else {
        printf( "Compiled\n" );
    }

    //Do the search
    if ( regexec( &preg, string, 1, &pmatch, REG_NOTEOL ) ) {
        printf( "No Match\n" );
    } else {

        //Allocate memory on the stack for this
        char substring[pmatch.rm_eo - pmatch.rm_so + 1];

        //Copy the substring over
        printf( "%d %d\n", (int)pmatch.rm_so, (int)pmatch.rm_eo );
        strncpy( substring, &string[pmatch.rm_so], pmatch.rm_eo - pmatch.rm_so );

        //Make sure there's a null byte
        substring[pmatch.rm_eo - pmatch.rm_so] = 0;

        //Print it out
        printf( "Match\n" );
        printf( "\"%s\"\n", substring );
    }

    //Release the regular expression
    regfree( &preg );

    return EXIT_SUCCESS;
}

Its output is:

Compiled
1 7
Match
"esting"

Why is the C program including the "ting" in the result? And is there a way for me to exclude the "ting" portion?


With nmatch set to 1, pmatch receives only the whole match (element 0), not the first parenthesized subexpression.

Try changing pmatch to an array of 2 elements, then passing 2 in place of 1 to regexec and using the [1] element to get the subexpression match.
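
As a minimal sketch of that change (first_group is a hypothetical helper name, not part of regex.h, and the buffer handling is just one way to do it), the two-element array and the [1] lookup could look like this:

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the text matched by the first parenthesized
 * group of `pattern` into `out`. Returns 0 on success, -1 if the pattern
 * does not compile, nothing matches, or `out` is too small. */
static int first_group(const char *pattern, const char *text,
                       char *out, size_t outsz)
{
    regex_t preg;
    regmatch_t pmatch[2];   /* [0] = whole match, [1] = first group */
    int rc = -1;

    assert(out != NULL && outsz > 0);

    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Pass 2 as nmatch so regexec also fills in pmatch[1]. */
    if (regexec(&preg, text, 2, pmatch, 0) == 0 && pmatch[1].rm_so != -1) {
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        if (len < outsz) {
            memcpy(out, text + pmatch[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&preg);
    return rc;
}
```

Calling first_group("([se]{2,30})ting", "testingtested", buf, sizeof buf) with a large enough buf should leave "es" in buf, the same result Python's findall reports.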

To others who have cited differences between C and Python and different types of regular expressions, that's all unrelated. This expression is very simple and that's not coming into play.


While regular expressions are "more or less the same everywhere", the exact supported features differ from implementation to implementation.

Unfortunately, you need to consult each regex library's documentation separately when designing your regular expressions.
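
One concrete illustration, staying inside POSIX regex.h: the same pattern string means different things depending on whether REG_EXTENDED is passed to regcomp. The sketch below assumes a POSIX-conforming regex.h; matches is a hypothetical helper name introduced only for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `pattern` matches somewhere in `text`,
 * 0 if it does not, and -1 if the pattern fails to compile.
 * `cflags` is passed through to regcomp (0 selects basic syntax). */
static int matches(const char *pattern, const char *text, int cflags)
{
    regex_t preg;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&preg, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = (regexec(&preg, text, 0, NULL, 0) == 0) ? 1 : 0;
    regfree(&preg);
    return rc;
}
```

With REG_EXTENDED, "([se]{2,30})ting" matches "testingtested". With cflags set to 0 the parentheses and braces are ordinary characters under basic syntax, so the same string does not match; the basic-syntax spelling of the pattern is "\\([se]\\{2,30\\}\\)ting" (written as a C string literal).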
