What does this Perl regex mean: m/(.?):(.?)$/g?

2023-01-17 00:36 问答作者：

I am editing a Perl file, but I don't understand this regexp comparison. Can someone please explain it to me?

if ($lines =~ m/(.*?):(.*?)$/g) { } ..

What happens here? $lines is a line from a text fil开发者_如何学Goe.

Break it up into parts:

$lines =~ m/ (.*?)      # Match any character (except newlines)
                        # zero or more times, not greedily, and
                        # stick the results in $1.
             :          # Match a colon.
             (.*?)      # Match any character (except newlines)
                        # zero or more times, not greedily, and
                        # stick the results in $2.
             $          # Match the end of the line.
           /gx;

So, this will match strings like ":" (it matches zero characters, then a colon, then zero characters before the end of the line, $1 and $2 are empty strings), or "abc:" ($1 = "abc", $2 is an empty string), or "abc:def:ghi" ($1 = "abc" and $2 = "def:ghi").

And if you pass in a line that doesn't match (it looks like this would be if the string does not contain a colon), then it won't process the code that's within the brackets. But if it does match, then the code within the brackets can use and process the special $1 and $2 variables (at least, until the next regular expression shows up, if there is one within the brackets).

There is a tool to help understand regexes: YAPE::Regex::Explain.

Ignoring the g modifier, which is not needed here:

use strict;
use warnings;
use YAPE::Regex::Explain;

my $re = qr/(.*?):(.*?)$/;
print YAPE::Regex::Explain->new($re)->explain();

__END__

The regular expression:

(?-imsx:(.*?):(.*?)$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

See also perldoc perlre.

It was written by someone who either knows too much about regular expressions or not enough about the $' and $` variables.

THis could have been written as

if ($lines =~ /:/) {
    ... # use $` ($PREMATCH)  instead of $1
    ... # use $' ($POSTMATCH) instead of $2
}

if ( ($var1,$var2) = split /:/, $lines, 2 and defined($var2) ) {
    ... # use $var1, $var2 instead of $1,$2
}

(.*?) captures any characters, but as few of them as possible.

So it looks for patterns like <something>:<somethingelse><end of line>, and if there are multiple : in the string, the first one will be used as the divider between <something> and <somethingelse>.

That line says to perform a regular expression match on $lines with the regex m/(.*?):(.*?)$/g. It will effectively return true if a match can be found in $lines and false if one cannot be found.

An explanation of the =~ operator:

Binary "=~" binds a scalar expression to a pattern match. Certain operations search or modify the string $_ by default. This operator makes that kind of operation work on some other string. The right argument is a search pattern, substitution, or transliteration. The left argument is what is supposed to be searched, substituted, or transliterated instead of the default $_. When used in scalar context, the return value generally indicates the success of the operation.

The regex itself is:

m/    #Perform a "match" operation
(.*?) #Match zero or more repetitions of any characters, but match as few as possible (ungreedy)
:     #Match a literal colon character
(.*?) #Match zero or more repetitions of any characters, but match as few as possible (ungreedy)
$     #Match the end of string
/g    #Perform the regex globally (find all occurrences in $line)

So if $lines matches against that regex, it will go into the conditional portion, otherwise it will be false and will skip it.

继续阅读：perl regex

What does this Perl regex mean: m/(.?):(.?)$/g?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？