
Finding results from and between groups of parentheses with regexp

Text format:

(Superships)    
Eirik Raude - olajkutató fúrósziget
(Eirik Raude - Oil Patch Explorer)
  1. I need a regex to match the text inside the first set of parentheses. Result: text1.
  2. I need a regex to match the text between the first and second sets of parentheses. Result: text2.
  3. I need a regex to match the text inside the second set of parentheses. Result: text3.

    • text1: Superships, the English title,
    • text2: Eirik Raude - olajkutató fúrósziget, the Hungarian subtitle,
    • text3: Eirik Raude - Oil Patch Explorer, the English subtitle.

I need a regex for a Perl script that matches this title and these subtitles. Example script:

($anchor) = $tree->look_down(_tag=>"h1", class=>"blackbigtitle"); 
if ($anchor) { 
    $elem = $anchor;  
    my ($engtitle, $engsubtitle, $hunsubtitle, @tmp); 
    while (($elem = $elem->right()) && 
            ((ref $elem) && ($elem->tag() ne "table"))) { 
        @tmp = get_all_text($elem); 
        push @lines, @tmp; 
        $line = join(' ', @tmp); 
        if (($engtitle) = $line =~ m/**regex need that return text1**/) { 
            push @{$prog->{q(title)}}, [$engtitle, 'en']; 
            t "english-title added: $engtitle"; 
        } 
        elsif (($engsubtitle) = $line =~ m/**regex need that return text3**/) { 
            push @{$prog->{q(sub-title)}}, [$engsubtitle, 'en']; 
            t "english_subtitle added: $engsubtitle"; 
        } 
        elsif (($hunsubtitle) = $line =~ m/**regex need that return text2**/) { 
            push @{$prog->{q(hun-subtitle)}}, [$hunsubtitle, 'hu']; 
            t "hungarian_subtitle added: $hunsubtitle"; 
        } 
    } 
}


Considering your comment, you can do something like this:

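if (!$found_english_title && (($english_title) = $line =~ m/^\(([^)]+)\)$/)) {
    $found_english_title = 1;
    # first parenthesized line: English title (text1)
} elsif (($hungarian_subtitle) = $line =~ m/^([^()]+)$/) {
    # line without parentheses: Hungarian subtitle (text2)
} elsif ($found_english_title && (($english_subtitle) = $line =~ m/^\(([^)]+)\)$/)) {
    # second parenthesized line: English subtitle (text3)
}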
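To try the line-by-line approach outside the scraper, here is a minimal, self-contained sketch run against the three sample lines from the question. The `%found` hash and its keys are hypothetical names chosen for illustration, and the trimming step is an assumption added because the first sample line carries trailing spaces that would otherwise defeat the `$` anchor.

```perl
use strict;
use warnings;
use utf8;

# Sample lines from the question; the trailing spaces on the first line are kept on purpose.
my @lines = (
    '(Superships)    ',
    'Eirik Raude - olajkutató fúrósziget',
    '(Eirik Raude - Oil Patch Explorer)',
);

my %found;    # hypothetical container for the three results
for my $line (@lines) {
    $line =~ s/^\s+|\s+$//g;    # trim so that ^ and $ anchor on the text itself
    if ($line =~ m/^\(([^)]+)\)$/) {
        # The first parenthesized line is the title, the next one the English subtitle.
        my $key = exists $found{title} ? 'en_subtitle' : 'title';
        $found{$key} = $1;
    }
    elsif ($line =~ m/^([^()]+)$/) {
        # A line without any parentheses is taken to be the Hungarian subtitle.
        $found{hu_subtitle} = $1;
    }
}

binmode STDOUT, ':encoding(UTF-8)';
print "$_: $found{$_}\n" for sort keys %found;
```

Deciding by "has a title already been seen" keeps one regex for both parenthesized lines, at the cost of relying on the lines arriving in the order shown in the question.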


If you need to match them all in one expression:

\(([^)]+)\)([^(]+)\(([^)]+)\)

This matches (, then anything that's not ), then ), then anything that's not (, then (, then anything that's not ), and finally ).

The first group will be text1, the second group will be text2 (including any whitespace or line breaks around it), and the third group will be text3.
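As a quick check of the single-expression approach, the sketch below applies the pattern to the three sample lines joined into one string. The variable names and the joined `$text` value are assumptions made for the example; in the real script the string would come from the parsed page.

```perl
use strict;
use warnings;
use utf8;

# The sample lines from the question as one block of text (an assumption for this sketch).
my $text = "(Superships)    \n"
         . "Eirik Raude - olajkutató fúrósziget\n"
         . "(Eirik Raude - Oil Patch Explorer)";

# In list context the match returns the three capture groups.
my ($text1, $text2, $text3) = $text =~ m/\(([^)]+)\)([^(]+)\(([^)]+)\)/;
die "no match\n" unless defined $text1;

# The middle group also picks up the whitespace and newlines around it, so trim it.
$text2 =~ s/^\s+|\s+$//g;

binmode STDOUT, ':encoding(UTF-8)';
print "text1: $text1\ntext2: $text2\ntext3: $text3\n";
```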

You can also use a more generic regex that picks the pieces out of strings like "(text1)", "(text1)text2(text3)" or "text1(text2)" when applied several times:

(?:^|[()])([^()]+)(?=[()]|$)

This matches the beginning of the string or ( or ), then one or more characters that are not ( or ), and then requires ( or ) or the end of the string to follow. ?: marks a non-capturing group, so the first capture group holds the text. The closing delimiter is checked with a lookahead, (?=...), rather than consumed, so it can also open the next match. The pattern does not check that every ( is paired with a ), so it also matches "(text1("; something more complex is needed to enforce that.
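A minimal sketch of applying such a delimiter-based pattern repeatedly with the `/g` flag follows. It assumes a `+` quantifier on the character class and a lookahead for the closing delimiter, so that the same parenthesis can also start the next match; the `@parts` array is a hypothetical name used only to collect the results.

```perl
use strict;
use warnings;

my @parts;    # one joined result per input, for inspection
for my $input ('(text1)', '(text1)text2(text3)', 'text1(text2)') {
    # With /g in list context, the match returns every capture from every match.
    my @found = $input =~ m/(?:^|[()])([^()]+)(?=[()]|$)/g;
    push @parts, join('|', @found);
    print "$input => @found\n";
}
```

Because nothing here pairs an opening parenthesis with a closing one, an input such as "(text1(" would still yield text1.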
