Is this the way to go about building Perl subroutines?
So I've had a simple ucwords function for Perl which I've had a while, and wanted to expand it, this is wh开发者_JAVA技巧at I've come up with, is this the way I should be building my functions to handle optional parameters?
Original:
sub ucwords{
$str = @_[0];
$str = lc($str);
$str =~ s/\b(\w)/\u$1/g;
return $str;
}
Extended:
sub ucwords{
if(@_[0] ne undef){#make sure some argument was passed
@overloads = (0,1,2,3);
$str = @_[0];
if(@_[1] eq undef || @_[1] eq 0){ #default is to lowercase all but first
$str = lc($str);
$str =~ s/\b(\w)/\u$1/g;
return $str;
}else{ #second parameters
if(!grep $_ eq @_[1], @overloads){ die("No overload method of ucwords() takes ".@_[1]." as second parameter."); }
if(@_[1] eq 1){ $str =~ s/\b(\w)/\u$1/g;} #first letter to upper, remaining case maintained
if(@_[1] eq 2){ $str = lc($str); $str =~ s/(\w)\b/\u$1/g;} #last letter to upper, remaining to lower
if(@_[1] eq 3){ $str =~ s/(\w)\b/\u$1/g;} #last letter to upper, remaining case maintained
return $str;
}
}else{
die("No overload method of ucwords() takes no arguments");
}
}
Psy
In one word: NO!
Let's look at:
sub ucwords{
$str = @_[0];
$str = lc($str);
$str =~ s/\b(\w)/\u$1/g;
return $str;
}
First and foremost, you are not using strict
. Use it. It is for your own good.
Second, you are not using warnings
. Use it. It is for your own good. For example, the first element of @_
should be referred to using $_[0]
and not @_[0]
.
Third, you should get in the habit of reading the FAQ list occasionally before re-inventing the wheel again: See How do I capitalize all the words on one line?
If you think this is harsh, consider the fact that when called as:
print ucwords("FRED AND BARNEY'S LODGE"), "\n";
your code outputs
Fred And Barney'S Lodge
which is the example given in that question.
Further, having a function that does more than one thing, chooses what it does on the basis of mystery numbers and does none of those things right is not a good design strategy.
You should instead have multiple functions, named in a way that can be understood by a casual reader of your code, each of which does only one thing and does that right.
Finally, the extended version of your function (without saying anything about the wisdom of writing such a function) can be better written as:
# untested code follows
use Carp;
{
my %modes = map {$_ => undef} 0 .. 3;
sub ucwords{
croak 'No arguments passed' unless @_;
my ($str, $mode) = @_;
$mode = 0 unless defined $mode;
croak "Invalid mode: '$mode'" unless exists $modes{$mode};
if ( $mode == 0 ) {
$str = lc($str);
$str =~ s/\b(\w)/\u$1/g;
}
elsif ( $mode == 1 ) {
$str =~ s/\b(\w)/\u$1/g;
}
elsif ( $mode == 2 ) {
$str = lc($str);
$str =~ s/(\w)\b/\u$1/g;
}
else {
$str =~ s/(\w)\b/\u$1/g;
}
return $str;
}
}
See also Why use if-else if in C++?
Don't use the $foo ne undef
construct. Operators in Perl are what's known as "context sensitive". By using certain operators, you introduce certain contexts. ne
, eq
, lt
, gt
, le
, ge
are all "string" operators, treating the scalars on either side as strings, whereas ==
, !=
, <
, >
, <=
, >=
are numeric operators, treating the object on either side as a number.
However, if you're testing for undef, it really doesn't make sense that something undefined is a number or a string, so they have an operator just for that kind of test: defined
You can test if something is defined simply by doing
if (defined $foo) {
# my cool logic on $foo here
}
@_
Unpacking
Generally, you always want to unpack @_ before you do any other processing in your subroutine. This makes it much, much clearer to users, other maintainers and yourself in the future about how to use your sub. By using @_
directly, it's very hard to figure out what needs to be passed, just from the arguments given. They don't have any meaningful names, making it even harder to determine their purpose, and you have magic constants everywhere - normally a bad thing overall!
Your best bet is to get the variables into meaningfully named scalars right away, before you do anything else.
For one argument subroutines, a common solution is to use shift
. This pulls the first element of an array off, and returns it (kind of like the opposite of pop
.) If not given an array, and you're in a subroutine, it pulls it from the @_ array. So, you can do
sub mysub {
my $foo = shift;
}
for any one argument subroutine.
However, what if you have more? List context assignment, to the rescue! It's possible to assign many variables at once, using a list assignment. You can do
sub myothersub {
my ($foo, $bar, $baz) = @_;
}
And $foo
, $bar
, and $baz
will be assigned the value in the 0, 1, and 2 indexes of @_, respectively. Well, what happens if there isn't anything in the 0, 1 or 2 index? They still get assigned - they become undef
! Then you can check for undef as mentioned elsewhere in this question.
This may just be my opinion, and your coding style is totally up to you, but personally I find a lot of value in assigning the arguments to variables right off the bat, and rather than wrapping the "business" portion of your subroutine in an if block, I'd have the function croak before that. For example:
use Carp;
sub ucwords {
my $str = shift;
defined($str)
or croak 'No overload method of ucwords() takes no arguments';
#the rest of your logic
}
Perl's switch statement: given/when
Perl, as of 5.10 and above, has a fantastic switch statement built in, called [given
]. This is roughly equivalent to the switch
statement in C, but much more versatile. To enable this feature, you need to add a line at the top of your script:
use 5.010;
This enables all the perl 5.10 features, including switch (and say
, which works like print
but automatically adds a "\n" at the end.) You can use it like this:
my $foo = get_foo();
my $nothing = 0;
given($foo) {
when (undef) { say "got an undefined value!"; }
when ([1,3,5,6,8]) { say "was 1, 3, 5, 6, or 8"; }
when (/^abc/) { say "was a string starting with abc"; }
when ($_ == 4) { say "It was 4!!!"; }
when ($_ > 100) { say "Greater than 100"; }
default { $nothing = 1; }
}
The variable passed to given automatically gets put into $_
inside the given code, allowing you to compare against it. Then, the when
construct does a smart match against $_.
So, in your case, it would look like this (fixing the @[] to $[] issue):
given ($_[1]) {
when (1) { $str =~ s/\b(\w)/\u$1/g }
when (2) { $str = lc($str); $str =~ s/(\w)\b/\u$1/g }
when (3) { $str =~ s/(\w)\b/\u$1/g; }
default { croak "No overloaded method of ucwords() takes '$_'." }
}
May be you will find Params::Validate usefull. It can be used to validate params by various rules. Here is how it may look in your case:
## somewhere is the start of the module
use Params::Validate qw(:all);
sub ucwords {
## this line helps to undestand which parameter should be passed to function
my ($string, $algorithm_id) = @_;
## make sure that 2 scalar parameters passed
validate_pos(@_, {'type' => SCALAR}, {'type' => SCALAR});
## main code here
}
die
die
, like other perl builtins, does not need, and generally should not have parentheses. However, die
has a big brother that most people use these days, called
croak
Do:
use Carp;
and then
croak "My error here!";
Croak works just like die, but generally adds more useful information to the error message than die
does, such as the line that the error occurred on relative to the caller.
Array Indexing
Array access, like other things in Perl, is context sensitive. Think of the sigil that gets attached to the name as a 'reminder' to you about what you're trying to access or use at the moment. Whenever you see the $
, that means you're trying to get a single scalar value. Whenever you see a @
, that means you're accessing a list, and %
of course means a key/value hash pair. So, when you access your array like this:
@_[1]
You're asking for a list, containing a single element. This feature lets you get multiple values from an array at once, but when only accessing one value, it cause problems in some contexts, such as assignment. So, when accessing a single array element, you want to always use the scalar context:
$_[1]
Something that's been hinted at but not directly addressed in other answers is the use of numeric modes, a convention alien to Perl held over from C. Quick, without looking at the code what does mode #3 do? Hell, looking at the code what does mode #3 do?
Perl has efficient and easy to use strings. Use them. Give your modes names that have something to do with what its doing. Something like... first, last, recase_first, recase_last. They don't have to be totally descriptive, lower_case_then_uc_last_letter would be too long to type, but enough give something for the human brain to hook onto and associate.
But really these are four subroutines. Mode flags are red flags, especially when most of your code winds up inside an if/else statement.
I strongly dislike overly-smart functions. An overly-smart function is one whose behavior is totally changed by its parameters. Look at yours, they almost don't share any code except parameter handling. Anyway if I would do some similar this I would write something like this:
use Carp;
{
my %ucwords = (
0 => sub {
my $str = lc( shift() );
$str =~ s/\b(\w)/\u$1/g;
return $str;
},
1 => sub {
my $str = shift;
$str =~ s/\b(\w)/\u$1/g;
return $str;
},
2 => sub {
$str = lc( shift() );
$str =~ s/(\w)\b/\u$1/g;
return $str;
},
3 => sub {
my $str = shift;
$str =~ s/(\w)\b/\u$1/g;
return $str;
}
);
sub ucwords {
my ( $str, $mode ) = @_;
croak "No overload method of ucwords() takes no arguments"
unless defined $str;
$mode = 0 unless defined $mode;
my $code = $ucwords{$mode};
croak "Invalid mode: '$mode'" unless defined $code;
goto \&$code;
}
}
精彩评论