Need help with a Perl regex?
Here are the forms of the lines in my text file:
S1,F2 title including several white spaces (abbr) single,Here<->There,reply
S1,F2 title including several white spaces (abbr) single,Here<->There
S1,F2 title including several white spaces (abbr) single,Here<->There,[reply]
How can I change my regex so that it matches all three forms above? This is what I have now:
/^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)[,](.*?)$/
I tried replacing (.*?)$/ with [.*?]$/, but it doesn't work. I guess I shouldn't use [] (square brackets) to match the optional word [reply] (including the brackets).
Actually, my more general question is: how do I match optional characters more reliably in a Perl regex? I looked at the online perldoc pages, but at my level of Perl knowledge it is hard to find the useful information there. That's why I'm also asking what may be basic questions.
I'd appreciate your comments and suggestions.
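A quick way to see where the original regex actually fails is to run it, unchanged, against the three sample lines. The sketch below does that; the names @lines, $original and @matched are my own, and it assumes each line has already had its newline removed (as chomp would do). As far as I can trace it, it reports a match for the first and third forms and no match for the second: (.*?) already accepts [reply] with its brackets, and the real obstacle is that [,] demands a second comma that the second form doesn't have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three line forms from the question, newlines already removed
my @lines = (
    'S1,F2 title including several white spaces (abbr) single,Here<->There,reply',
    'S1,F2 title including several white spaces (abbr) single,Here<->There',
    'S1,F2 title including several white spaces (abbr) single,Here<->There,[reply]',
);

# The regex from the question, unchanged
my $original = qr/^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)[,](.*?)$/;

# Record 1 for a match and 0 for no match
my @matched;
for my $line (@lines) {
    push @matched, ($line =~ $original) ? 1 : 0;
}

# Prints: form 1: matches / form 2: no match / form 3: matches
for my $i (0 .. $#lines) {
    printf "form %d: %s\n", $i + 1, $matched[$i] ? 'matches' : 'no match';
}
```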
What about using negated character classes:
/^S(\d),F(\d)\s+([^()]*?)\s+\(([^()]+)\)\s+([^,]*),([^,]*)(?:,(.*?))?$/
When incorporated into this script:
#!/usr/bin/perl
use strict;
use warnings;

while (<>)
{
    chomp;
    my($s,$f,$title,$abbr,$single,$here,$reply) =
        $_ =~ m/^S(\d),F(\d)\s+([^()]*?)\s+\(([^()]+)\)\s+([^,]*),([^,]*)(?:,(.*?))?$/
        or next;    # skip lines that don't fit the format instead of printing undefined fields
    $reply ||= "<no reply>";
    print "S$s F$f <$title> ($abbr) $single : $here : $reply\n";
}
And run on the original data file, it produces:
S1 F2 <title including several white spaces> (abbr) single : Here<->There : reply
S1 F2 <title including several white spaces> (abbr) single : Here<->There : <no reply>
S1 F2 <title including several white spaces> (abbr) single : Here<->There : [reply]
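To illustrate why the negated classes help, here is a small comparison of my own (not part of the answer) on the tail of the third sample line. A greedy .* runs to the last comma, a lazy .*? stops at the first, and a class like [^,]* cannot cross a comma at all. The last two patterns are made-up two-field examples: anchored at both ends, the lazy version quietly lets the second field absorb an extra comma, while the [^,]* version refuses to match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The tail of the third sample line, after "(abbr) "
my $tail = 'single,Here<->There,[reply]';

# Greedy .* runs on to the last comma it can find
my ($greedy) = $tail =~ /^(.*),/;

# Lazy .*? stops at the first comma, by trying shorter matches first
my ($lazy) = $tail =~ /^(.*?),/;

# A negated class [^,]* can never consume a comma at all
my ($negated) = $tail =~ /^([^,]*),/;

# Hypothetical two-field patterns anchored at both ends
my @lazy_fields    = $tail =~ /^(.*?),(.*?)$/;
my @negated_fields = $tail =~ /^([^,]*),([^,]*)$/;

print "greedy:  $greedy\n";                  # single,Here<->There
print "lazy:    $lazy\n";                    # single
print "negated: $negated\n";                 # single
print "lazy, two fields:    @lazy_fields\n"; # single Here<->There,[reply]
print "negated, two fields: ", (@negated_fields ? "@negated_fields" : 'no match'), "\n";
```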
You should probably also add the /xms modifiers to the expression; /x lets you spread the pattern over several lines and comment each part, which makes it much easier to document:
#!/usr/bin/perl
use strict;
use warnings;

while (<>)
{
    chomp;
    my($s,$f,$title,$abbr,$single,$here,$reply) =
        $_ =~ m/^
            S(\d) ,              # S1
            F(\d) \s+            # F2
            ([^()]*?) \s+        # Title
            \(([^()]+)\) \s+     # (abbreviation)
            ([^,]*) ,            # Single
            ([^,]*)              # Here or There
            (?: , (.*?) )?       # Optional reply
            $
        /xms
        or next;                 # skip lines that don't fit the format
    $reply ||= "<no reply>";
    print "S$s F$f <$title> ($abbr) $single : $here : $reply\n";
}
I confess I'm still apt to write one-line monsters - I'm trying to mend my ways.
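To check that /x only changes the layout and not what is matched, the following sketch (my own check; the helper captures() is hypothetical, not from the answer) compiles both forms with qr// and compares the seven captured fields on the three sample lines. A missing reply comes back as undef, which the helper turns into an <undef> marker so the lists compare as strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    'S1,F2 title including several white spaces (abbr) single,Here<->There,reply',
    'S1,F2 title including several white spaces (abbr) single,Here<->There',
    'S1,F2 title including several white spaces (abbr) single,Here<->There,[reply]',
);

# One-line form of the answer's pattern
my $compact = qr/^S(\d),F(\d)\s+([^()]*?)\s+\(([^()]+)\)\s+([^,]*),([^,]*)(?:,(.*?))?$/;

# The same pattern laid out with /x: whitespace and # comments are not part of it
my $spaced = qr/^
    S(\d) ,               # S1
    F(\d) \s+             # F2
    ([^()]*?) \s+         # Title
    \(([^()]+)\) \s+      # (abbreviation)
    ([^,]*) ,             # Single
    ([^,]*)               # Here or There
    (?: , (.*?) )?        # Optional reply
    $
/xms;

# Hypothetical helper: "count:field|field|..." with undef shown as a marker
sub captures {
    my ($line, $re) = @_;
    my @fields = $line =~ $re;
    return scalar(@fields) . ':' . join('|', map { $_ // '<undef>' } @fields);
}

my $all_same = 1;
for my $line (@lines) {
    $all_same = 0 if captures($line, $compact) ne captures($line, $spaced);
}
print $all_same ? "same captures on all three lines\n" : "captures differ\n";
```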
Did you know that square brackets in a regular expression are reserved for declaring a set of characters you want to match? So to match a real bracket you need to escape it (\[ or \]), or enclose it in brackets ([[] or []]); isn't that obfuscated?!
Try (\[.*?\]|.*?) to indicate that the brackets are optional.
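A minimal sketch of both ways of matching a literal bracket, plus the alternation suggested above, run on the strings [reply] and reply (the variable names are mine). Note that the second alternative, .*?, can match a bracketed word on its own as well, so the alternation mainly documents the intent rather than changing which strings are accepted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $reply = '[reply]';

# Escaped brackets: \[ and \] match literal square brackets
my ($escaped) = $reply =~ /^\[(.*)\]$/;

# One-character classes: [[] matches '[' and []] matches ']'
my ($classed) = $reply =~ /^[[](.*)[]]$/;

# The alternation: a bracketed word, or anything at all
my @captured;
for my $s ('[reply]', 'reply') {
    push @captured, $1 if $s =~ /^(\[.*?\]|.*?)$/;
}

print "escaped:  $escaped\n";     # reply
print "classed:  $classed\n";     # reply
print "captured: @captured\n";    # [reply] reply
```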
Try
/^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)(,(\[reply\]|reply))?$/
This makes the part ,(\[reply\]|reply) optional (that is what the trailing ? does), and that part matches either ,[reply] or ,reply. So the line may end in any of:
- (nothing)
- ,reply
- ,[reply]
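The following sketch (assuming, as in the question's sample data, that the optional word is literally reply) runs this regex on the three sample lines and collects the reply. Because (,(\[reply\]|reply)) contains two capture groups, the bare reply lands in $8 (index 7 of the match list), and it is undef when the optional part is absent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    'S1,F2 title including several white spaces (abbr) single,Here<->There,reply',
    'S1,F2 title including several white spaces (abbr) single,Here<->There',
    'S1,F2 title including several white spaces (abbr) single,Here<->There,[reply]',
);

my $re = qr/^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)(,(\[reply\]|reply))?$/;

my @replies;
for my $line (@lines) {
    if (my @f = $line =~ $re) {
        # $8 is the reply without its leading comma; undef when the tail is missing
        push @replies, $f[7] // '(nothing)';
    } else {
        push @replies, 'no match';
    }
}
print join(' | ', @replies), "\n";    # reply | (nothing) | [reply]
```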
BTW, your [,] means "one character from the following set: ,", which is exactly the same as a literal , within the regex. If you wanted to make your [,](.*?)$ work, you should use (,(.+))?$ instead, to match either nothing or a comma followed by any non-empty string.
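A small sketch of both points, using hypothetical test strings: a one-character class [,] behaves exactly like a literal comma, and a lazy field followed by (,(.+))?$ splits off an optional comma-separated tail. The undef in the list assignment skips the outer group's capture, which still includes the comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A one-character class [,] matches exactly what a literal comma does
my $class_ok   = ('a,b' =~ /^a[,]b$/) ? 1 : 0;
my $literal_ok = ('a,b' =~ /^a,b$/)   ? 1 : 0;

# Lazy field, then either nothing or a comma plus a non-empty string
my @tails;
for my $s ('Here<->There,reply', 'Here<->There', 'Here<->There,[reply]') {
    my ($field, undef, $rest) = $s =~ /^(.*?)(,(.+))?$/;
    push @tails, defined $rest ? "$field + $rest" : "$field + (nothing)";
}

print "[,] and , agree: ", ($class_ok == $literal_ok ? 'yes' : 'no'), "\n";
print "$_\n" for @tails;
```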
EDIT
If the following are also valid:
S1,F2 title including several white spaces (abbr) single,Here<->There,[reply
S1,F2 title including several white spaces (abbr) single,Here<->There,reply]
Then you could use (,\[?reply\]?)? at the end.
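Since \[? and \]? are independent of each other, this tail accepts an opening bracket without a closing one and vice versa, as well as both or neither. The sketch below tests only the tail after Here<->There; the ending ,answer is a made-up example of something that should be rejected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Possible endings after Here<->There, including the half-bracketed ones from the EDIT
my @tails = (',reply', '', ',[reply]', ',[reply', ',reply]', ',answer');

# \[? and \]? each make one literal bracket optional
my @accepted = grep { /^(,\[?reply\]?)?$/ } @tails;

# Prints: accepted: ,reply (nothing) ,[reply] ,[reply ,reply]
print "accepted: ", join(' ', map { $_ eq '' ? '(nothing)' : $_ } @accepted), "\n";
```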
You can make the last part optional by wrapping it in a non-capturing group, (?:...)?, as in:
^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)(?:,(.*))?$
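The difference from the plain (...)? version is that (?:...) groups without capturing, so the pattern still has exactly seven capture groups and the reply is $7. A short check on the three sample lines (the variable names are mine):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    'S1,F2 title including several white spaces (abbr) single,Here<->There,reply',
    'S1,F2 title including several white spaces (abbr) single,Here<->There',
    'S1,F2 title including several white spaces (abbr) single,Here<->There,[reply]',
);

my $re = qr/^S(\d),F(\d)\s+(.*?)\((.*?)\)\s+(.*?),(.*?)(?:,(.*))?$/;

my (@counts, @replies);
for my $line (@lines) {
    my @fields = $line =~ $re;
    push @counts,  scalar @fields;           # (?:...) adds no capture, so 7 groups
    push @replies, $fields[6] // '(none)';   # the reply is $7 (index 6)
}
print "captures per line: @counts\n";        # 7 7 7
print "replies: @replies\n";                 # reply (none) [reply]
```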