
Perl regex style, using m!$regex! versus

Source string:

  1. Mandarin Chinese (1.1 billion)
  2. Hindi/Urdu (350 million)
  3. Spanish (330 million)
  4. English (300 million)
  5. Arabic (200 million)

I'm trying to extract just the language name.

I currently have this code, which works:

  if ($line =~ m!\s(.*)\(!)
  {
    print $1 . "\n";
  }
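For reference, here is a self-contained version of that working match run over the sample data. The `@lines` array and the `extract_language` helper are illustrative additions, and the lines are assumed to start at "1." with no leading indentation. The brackets in the output make it visible that the capture keeps the space before "(".

```perl
use strict;
use warnings;

# Sample lines from the question (assumed: no leading indentation).
my @lines = (
    '1. Mandarin Chinese (1.1 billion)',
    '2. Hindi/Urdu (350 million)',
    '3. Spanish (330 million)',
    '4. English (300 million)',
    '5. Arabic (200 million)',
);

# Hypothetical helper wrapping the question's pattern: returns the first
# capture group, or undef when the line does not match.
sub extract_language {
    my ($line) = @_;
    return $line =~ m!\s(.*)\(! ? $1 : undef;
}

for my $line (@lines) {
    my $name = extract_language($line);
    # Brackets show the trailing space, e.g. "[Mandarin Chinese ]".
    print "[$name]\n" if defined $name;
}
```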

But I am trying to use the quotemeta function to do it, which I can't seem to get working.

  my $regex = quotemeta( "\s(.*)\(" );
  # Also tried this, as I suspect the \s is my problem:
  my $regex = quotemeta( "\\s(.*)\(" );


  if($line =~ m/$regex/)
    {
      print $1 . "\n" ;
    }
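A minimal sketch of why this fails: quotemeta backslash-escapes every non-word character, so the regex metacharacters become literal characters and the capture group disappears. The sketch uses a single-quoted string so the backslashes stay literal; in the double-quoted original, `"\s"` loses its backslash (Perl warns about an unrecognized escape under `use warnings`). The variable names are illustrative.

```perl
use strict;
use warnings;

# Single quotes keep the backslashes, so this is the regex source text.
my $pattern_text = '\s(.*)\(';

# quotemeta escapes every non-word character, including \ ( . * )
my $quoted = quotemeta $pattern_text;
print "$quoted\n";    # prints \\s\(\.\*\)\\\(

my $line = '4. English (300 million)';

# Interpolated as-is, the text is a regex and group 1 is captured.
print "raw:    ", ($line =~ /$pattern_text/ ? "matched [$1]" : "no match"), "\n";

# After quotemeta, only the literal text \s(.*)\( would match, and there
# is no capture group left to set $1.
print "quoted: ", ($line =~ /$quoted/ ? "matched" : "no match"), "\n";
```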

Is one style preferred over the other?


I don't understand what you're trying to use quotemeta for.

If what you have is actual regex syntax, then you don't want to quote it.

my $regex = qr/\s(.*)\(/;
if ($line =~ /$regex/) { print "$1\n"; }
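A short sketch of the qr// approach: the pattern is compiled once, can be stored in a variable or passed to a function, and still keeps its capture group. The `language_of` helper is a hypothetical name used only for illustration. A qr// object can also appear directly on the right of `=~`, which is equivalent to interpolating it as `/$regex/`.

```perl
use strict;
use warnings;

# Compile the pattern once; metacharacters keep their regex meaning.
my $regex = qr/\s(.*)\(/;

# Hypothetical helper: first capture group, or undef on no match.
sub language_of {
    my ($line) = @_;
    return $line =~ $regex ? $1 : undef;
}

my $name = language_of('3. Spanish (330 million)');
print "[$name]\n";    # prints "[Spanish ]"
```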

If you do want to quote it (that is, you want to match the literal string \s(.*)\( exactly), you don't need to call quotemeta explicitly, since \Q...\E does the same job; this is effectively what you're doing now:

my $str = '\s(.*)\(';
if ($line =~ /\Q$str\E/) { print "found the literal text\n"; }
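A small sketch confirming that `\Q...\E` is quotemeta applied to the interpolated text, and that the resulting pattern only matches the characters verbatim. The `$doc` string is an illustrative example, not from the question.

```perl
use strict;
use warnings;

my $str = '\s(.*)\(';   # literal text to search for, not a pattern

# \Q...\E quotes the interpolated text, exactly like quotemeta.
my $same = ("\Q$str\E" eq quotemeta($str));
print $same ? "identical\n" : "different\n";                 # prints "identical"

# A line from the question does not contain that literal text...
my $line = '5. Arabic (200 million)';
print $line =~ /\Q$str\E/ ? "match\n" : "no match\n";        # prints "no match"

# ...but a string containing it verbatim does.
my $doc = 'the pattern \s(.*)\( appears here';
print $doc =~ /\Q$str\E/ ? "match\n" : "no match\n";         # prints "match"
```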


I agree with @ephemient that quotemeta isn't needed here.

I would use the /x modifier to make the regexp more readable:

  if($line =~ m/ \s (.*) \( /x )

and, taking it a step further:

if ($line =~ m/
    \s      # whitespace
    (.*)    # capture anything
    \(      # up to, but not including, a (
  /x )
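A quick sketch showing that the /x form is the same pattern as the compact one: under /x, unescaped whitespace and `#` comments inside the pattern are ignored, so both capture identical text. The sample line and variable names are illustrative; note that a /x comment must not contain the pattern's delimiter (here `/`).

```perl
use strict;
use warnings;

my $line = '2. Hindi/Urdu (350 million)';

# Compact form.
my ($compact) = $line =~ m/\s(.*)\(/;

# Same pattern spread out with /x; only \s, (.*) and \( remain.
my ($commented) = $line =~ m/
    \s      # whitespace
    (.*)    # capture anything
    \(      # up to, but not including, a (
/x;

print "[$compact] [$commented]\n";   # prints "[Hindi/Urdu ] [Hindi/Urdu ]"
```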

A refinement: currently you are also capturing the space after "English". I would add a \s+ after the capture:

if ($line =~ m/
    \s      # whitespace
    (.*)    # capture anything
    \s+     # up to, but not including, the whitespace
    \(      # before a (
  /x )
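A runnable sketch of the refined pattern: because `\s+` sits outside the capture group, the greedy `(.*)` backs off until the whitespace before "(" can be matched separately, so `$1` no longer ends with a space. The `language_name` helper is a hypothetical name for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper using the refined /x pattern.
sub language_name {
    my ($line) = @_;
    return $line =~ m/
        \s      # whitespace
        (.*)    # capture anything
        \s+     # up to, but not including, the whitespace
        \(      # before a (
    /x ? $1 : undef;
}

for my $line ('1. Mandarin Chinese (1.1 billion)', '4. English (300 million)') {
    print "[", language_name($line), "]\n";
}
# prints "[Mandarin Chinese]" and then "[English]"
```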

Finally, look at what your program does if you give it:

1. English (GB) (300 million) 
2. Arabic (200 million (2005 value))

One works, the other doesn't. It might be worth understanding why!
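One way to see why, sketched below: `(.*)` is greedy, so it runs to the last whitespace followed by "(". That keeps "English (GB)" whole but leaves "(200 million" glued to "Arabic". A non-greedy `(.*?)` fixes Arabic but cuts "English (GB)" down to "English". The third pattern assumes (this is not stated in the question) that the population figure always starts with a digit, and anchors on "(" followed by `\d`. The names `extract`, `$greedy`, `$lazy` and `$counted` are illustrative.

```perl
use strict;
use warnings;

my $greedy  = qr/\s(.*)\s+\(/;      # stops at the LAST whitespace + "("
my $lazy    = qr/\s(.*?)\s+\(/;     # stops at the FIRST whitespace + "("
my $counted = qr/\s(.*?)\s+\(\d/;   # first "(" that starts a number (assumption)

# Hypothetical helper: first capture group of $re, or undef on no match.
sub extract {
    my ($line, $re) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $line ('1. English (GB) (300 million)', '2. Arabic (200 million (2005 value))') {
    printf "greedy: [%s]  non-greedy: [%s]  digit-anchored: [%s]\n",
        map { extract($line, $_) } $greedy, $lazy, $counted;
}
# greedy: [English (GB)]  non-greedy: [English]  digit-anchored: [English (GB)]
# greedy: [Arabic (200 million]  non-greedy: [Arabic]  digit-anchored: [Arabic]
```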
