
Need help with a Perl regex?

My text file contains lines in the following three forms.

S1,F2  title including several white spaces  (abbr) single,Here<->There,reply
S1,F2  title including several white spaces  (abbr) single,Here<->There
S1,F2  title including several white spaces  (abbr) single,Here<->There,[reply]

How can I change my regex so that it works on all three forms above?

/^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)[,](.*?)$/

I tried replacing (.*?)$/ with [.*?]$/, but it doesn't work. I guess I shouldn't use [] (square brackets) to match the optional word [reply] (including the brackets).

My more general question is: how can I match optional characters better in a Perl regex? I looked through the online perldoc pages, but at my level of Perl knowledge it is hard to find the useful information there, which is why I am asking such basic questions.

I would appreciate your comments and suggestions.


What about using negated character classes:

 /^S(\d),F(\d)\s+([^()]*?)\s+\(([^()]+)\)\s+([^,]*),([^,]*)(?:,(.*?))?$/

When incorporated into this script:

#!/bin/perl
use strict;
use warnings;
while (<>)
{
    chomp;
    my($s,$f,$title,$abbr,$single,$here,$reply) =
        $_ =~ m/^S(\d),F(\d)\s+([^()]*?)\s+\(([^()]+)\)\s+([^,]*),([^,]*)(?:,(.*?))?$/;
    $reply ||= "<no reply>";
    print "S$s F$f <$title> ($abbr) $single : $here : $reply\n";
}

And run on the original data file, it produces:

S1 F2 <title including several white spaces> (abbr) single : Here<->There : reply
S1 F2 <title including several white spaces> (abbr) single : Here<->There : <no reply>
S1 F2 <title including several white spaces> (abbr) single : Here<->There : [reply]

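You should probably also add the 'xms' modifiers to the expression. The /x modifier lets you spread the pattern over several lines with whitespace and comments, which makes it easier to document (/m and /s only change how ^, $ and . treat newlines, which does not matter here because each line is chomped):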

#!/bin/perl
use strict;
use warnings;

while (<>)
{
    chomp;

    my($s,$f,$title,$abbr,$single,$here,$reply) =
        $_ =~ m/^
                S(\d) ,             # S1
                F(\d) \s+           # F2
                ([^()]*?) \s+       # Title
                \(([^()]+)\) \s+    # (abbreviation)
                ([^,]*) ,           # Single
                ([^,]*)             # Here or There
                (?: , (.*?) )?      # Optional reply
                $
               /xms;

    $reply ||= "<no reply>";
    print "S$s F$f <$title> ($abbr) $single : $here : $reply\n";
}

I confess I'm still apt to write one-line monsters - I'm trying to mend my ways.


Square brackets in a regular expression are reserved for declaring a set of characters that you want to match (a character class). So, to match a literal bracket, you need to escape it (\[ or \]) or enclose it in a character class ([[] or []]), which admittedly looks obfuscated.

Try (\[.*?\]|.*?) to allow for the optional brackets.
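
To see why [.*?] failed in the question, here is a minimal sketch. The two helper subs (is_class_char, reply_field) and the sample strings are made up for illustration. Inside square brackets, ., * and ? lose their special meaning, so [.*?] matches exactly one of those three literal characters.

```perl
use strict;
use warnings;

# [.*?] is a character class: it matches ONE character that is '.', '*' or '?'.
sub is_class_char {
    my ($s) = @_;
    return $s =~ /^[.*?]$/ ? 1 : 0;
}

# \[ and \] match literal square brackets; the second alternative
# accepts a bare word without brackets.
sub reply_field {
    my ($s) = @_;
    return $s =~ /^(\[.*?\]|.*?)$/ ? $1 : undef;
}

print is_class_char('?'),     "\n";   # a single '?' is in the class
print is_class_char('reply'), "\n";   # a five-letter word is not
print reply_field('[reply]'), "\n";   # captured with its brackets
print reply_field('reply'),   "\n";   # captured as a bare word
```

Note that when the group is anchored like this, the second alternative .*? would accept the bracketed form on its own, so the alternation mostly documents intent rather than restricting what matches.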


Try

/^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)(,(\[reply\]|reply))?$/

This matches the optional (?) part ,(\[reply\]|reply), which is either ,[reply] or ,reply. In other words, the line may end with any of:

  • (nothing)
  • ,reply
  • ,[reply]

BTW, your [,] means "one character of the following: ,", which is exactly the same as a literal , in the regex. To make your [,](.*?)$ work, use (,(.+))?$ instead, which matches either nothing or a comma followed by any non-empty string.
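
Both points can be checked with a short sketch. It uses the question's pattern and sample lines; the variable names ($original, $fixed) and the helper reply_of are hypothetical, introduced only for this example.

```perl
use strict;
use warnings;

# The question's pattern: [,] is just a one-character class, so the
# final comma is mandatory and a line without a reply cannot match.
my $original = qr/^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)[,](.*?)$/;

# Same pattern with the comma and reply wrapped in an optional group.
my $fixed = qr/^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)(,(.+))?$/;

my @lines = (
    'S1,F2  title including several white spaces  (abbr) single,Here<->There,reply',
    'S1,F2  title including several white spaces  (abbr) single,Here<->There',
    'S1,F2  title including several white spaces  (abbr) single,Here<->There,[reply]',
);

# Hypothetical helper: returns the last captured field (the reply),
# '' if the optional part was absent, or undef if the line did not match.
sub reply_of {
    my ($re, $line) = @_;
    my @fields = $line =~ $re;
    return undef unless @fields;
    return defined $fields[-1] ? $fields[-1] : '';
}

for my $line (@lines) {
    my $before = defined reply_of($original, $line) ? 'match' : 'no match';
    my $after  = reply_of($fixed, $line);
    print "original: $before, fixed reply: '$after'\n";
}
```

Only the second line (the one with no reply) fails under the original pattern; with the optional group all three lines match.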


EDIT

If the following are also valid:

S1,F2  title including several white spaces  (abbr) single,Here<->There,[reply
S1,F2  title including several white spaces  (abbr) single,Here<->There,reply]

Then you could use (,\[?reply\]?)? at the end.
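
A quick way to check which endings that tail accepts is sketched below. The $tail pattern only covers the end of the line, and the helper tail_ok and the ",other" sample are assumptions added for illustration.

```perl
use strict;
use warnings;

# Tail from the EDIT above: each bracket is independently optional,
# and the whole ",reply" part may be missing.
my $tail = qr/^Here<->There(,\[?reply\]?)?$/;

# Hypothetical helper: 1 if the field ends in an accepted form, else 0.
sub tail_ok {
    my ($field) = @_;
    return $field =~ $tail ? 1 : 0;
}

for my $field (
    'Here<->There',
    'Here<->There,reply',
    'Here<->There,[reply]',
    'Here<->There,[reply',
    'Here<->There,reply]',
    'Here<->There,other',
) {
    printf "%-22s %s\n", $field, tail_ok($field) ? 'matches' : 'does not match';
}
```

Because the pattern spells out the literal word reply, any other trailing text (such as ",other") is rejected. Use (.+) or (.*) in its place if the reply text can vary.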


You can make the last part optional by wrapping it in a non-capturing group, (?:...)?, like this:

^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)(?:,(.*))?$
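
The practical difference between a plain group and (?:...) is the capture numbering. The sketch below compares the two on one of the question's sample lines; the array names are made up for this example.

```perl
use strict;
use warnings;

my $line = 'S1,F2  title including several white spaces  (abbr) single,Here<->There,[reply]';

# Capturing wrapper: group 7 holds the comma plus the reply,
# and the reply itself ends up in group 8.
my @capturing =
    $line =~ /^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)(,(.*))?$/;

# Non-capturing wrapper: no extra group for the comma,
# so the reply is group 7.
my @non_capturing =
    $line =~ /^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)(?:,(.*))?$/;

print scalar(@capturing), " groups, last two: '$capturing[6]' and '$capturing[7]'\n";
print scalar(@non_capturing), " groups, last: '$non_capturing[6]'\n";
```

With the non-capturing form, the list assignment to seven variables used in the script earlier on this page keeps working unchanged.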

