Finding results from and between groups of parentheses with regexp
Text format:
(Superships)
Eirik Raude - olajkutató fúrósziget
(Eirik Raude - Oil Patch Explorer)
- I need regex to match text beetween first set of parentheses. Results: text1.
- I need regex to match text beetween first set of parentheses and second set of parentheses. Results: text2.
I need regex to match text beetween second set of parentheses. Results: text3.
- text1: Superships, represent english title,
- text2: Eirik Raude - olajkutató fúrósziget, represent hungarian subtitle,
- text3: Eirik Raude - Oil Patch Explorer, represent english subtitle.
I need regex for perl script to match this title and subtitle. Example script:
($anchor) = $tree->look_down(_tag=>"h1", class=>"blackbigtitle");
if ($anchor) {
$elem = $anchor;
my ($engtitle, $engsubtitle, $hunsubtitle @tmp);
while (($elem = $elem->right()) &&
((ref $elem) && ($elem->tag() ne "table"))) {
@tmp = get_all_text($elem);
push @lines, @tmp;
$line = join(' ', @tmp);
if (($engtitle) = $line =~ m/**regex need that return text1**/) {
push @{$prog->{q(title)}}, [$engtitle, 'en'];
t "english-title added: $engtitle";
}
elsif (($engsubtitle) = $line =~ m/**rege开发者_C百科x need that return text3**/) {
push @{$prog->{q(sub-title)}}, [$subtitle, 'en'];
t "english_subtitle added: $engsubtitle";
}
elsif (($hunsubtitle) = $line =~ m/**regex need that return text2**/) {
push @{$prog->{q(hun-subtitle)}}, [$hunsubtitle, 'hu'];
t "hungarinan_subtitle added: $hunsubtitle";
}
}
}
Considering your comment, you can do something like :
if (($english_title) = $line =~ m/^\(([^)]+)\)$/) {
$found_english_title = 1;
# do stuff
} elsif (($english-subtitle) = $line =~ m/^([^()]+)$/) {
# do stuff
} elsif ($found_english_title && ($hungarian-title) = $line =~ m/^\(([^)]+)\)$/) {
# do stuff
}
If you need to match them all in one expression:
\(([^)]+)\)([^(]+)\(([^)]+)\)
This matches (, then anything that's not ), then ), then anything that's not (, then, (, ... I think you get the picture.
First group will be text1, second group will be text2, third group will be text3.
You can also just make a more generix regex that matches something like "(text1)", "(text1)text2(text3)" or "text1(text2)" when applied several times:
(?:^|[()])([^()])(?:[()]|$)
This matches the beginning of the string or ( or ), then characters that are not ( or ), then ( or ) or the end of the string. :? is for non-capturing group, so the first group will have the string. Something more complex is necessary to match ( with ) every time, i.e., it can match "(text1(".
精彩评论