Trying to understand this Perl script
It seems very simple and I figured most of it out. But since Perl is loose with syntax, it's difficult for a newcomer to jump right in :)
my @unique = ();
my %seen = ();
foreach my $elem ( @array ) {
    next if $seen{ $elem }++;
    push @unique, $elem;
}
This is right from the perldoc website. If I understand correctly, it can also be written as:
my @unique = ();
my %seen = ();
my $elem;
foreach $elem ( @array ) {
    if ( $seen{ $elem }++ ) {
        next;
    }
    push ( @unique, $elem );
}
So my understanding at this point is:
- Declare an array named unique
- Declare a hash named seen
- Declare a variable named elem
- Iterate over @array, each iteration is stored in $elem
- If $elem is a key in the hash %seen (I have no idea what the ++ does), skip to the next iteration
- Append $elem to the end of @unique
I'm missing 2 things:
- When does anything get stored in %seen?
- What does ++ do (in every other language it increments, but I don't see how that works)
I know that the issue lies with this part:
$seen{ $elem }++
which I suspect is doing a bunch of different stuff at once. Is there a simpler more verbose way of writing that line?
Thanks for the help
The ++ operator does essentially the same thing in Perl as it does in most other languages that have it: it increments a variable.
$seen{ $elem }++;
increments a value in the %seen hash, namely the one whose key is $elem.
The "magic" is that if $seen{$elem} hasn't been defined yet, it's automatically created, as if it already existed and had the value 0; the ++ then sets it to 1. So it's equivalent to:
if (! exists $seen{$elem}) {
    $seen{$elem} = 0;
}
$seen{$elem} ++;
This is called "autovivification". (No, really, that's what it's called.) (EDIT2: No, my mistake, it's not; as @ysth points out, "autovivification" actually refers to references springing into existence. See perldoc perlref.)
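To see the "created on first increment" behaviour in isolation, here is a minimal sketch. The key name apple is just a made-up example; nothing else is assumed beyond core Perl.

```perl
use strict;
use warnings;

my %seen;

# Before any increment, the key is not in the hash at all.
my $existed_before = exists $seen{apple} ? 1 : 0;

# Incrementing a missing entry creates it. The missing value (undef)
# counts as 0 here, so the entry ends up holding 1. No warning is
# raised, because ++ is exempt from "uninitialized value" warnings.
$seen{apple}++;

my $exists_after = exists $seen{apple} ? 1 : 0;

print "existed before: $existed_before\n";   # existed before: 0
print "exists after:   $exists_after\n";     # exists after:   1
print "value:          $seen{apple}\n";      # value:          1
```

So the answer to "when does anything get stored in %seen?" is: at the moment the ++ runs, whether or not the key was there before.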
EDIT: Here's a revised version of your description:
- Declare an array variable named @unique
- Declare a hash variable named %seen
- Declare a scalar variable named $elem
- Iterate over @array, each iteration is stored in $elem
- If $elem is a key in the hash %seen, skip to the next iteration
- Append the value of $elem to the end of @unique
@unique, %seen, and $elem are all variables. The punctuation character (known as the "sigil") indicates what kind of variable each of them is, and is best thought of as part of the name.
This is a common pattern in Perl for creating an array consisting of the "unique" elements in the given array.
In Perl, a hash stores a value associated with any given key. If you haven't put anything in the hash for a given key, though, you get undef; but in a numeric context, such as when you're doing an increment operation, undef is treated like 0 and then incremented.
The if statement checks for true or false values, as you know. In Perl, 0, "0", '' (empty string), and undef are treated as false values, as is an empty list; essentially everything else is true.
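A quick way to check which scalars count as false is to test a handful directly. This is only an illustrative sketch; the sample values and their labels are chosen for the example. Note that the strings "0.0" and "00" are true, even though they are numerically zero.

```perl
use strict;
use warnings;

# Labels are only for printing; the values are what gets tested.
my @labels = ('0', '"0"', q{''}, 'undef', '"0.0"', '"00"', '" "', '1');
my @values = (0, "0", '', undef, "0.0", "00", " ", 1);

my @truth;
for my $i (0 .. $#values) {
    # Boolean context: the first four are false, the last four are true.
    my $result = $values[$i] ? 'true' : 'false';
    push @truth, $result;
    print "$labels[$i] is $result\n";
}
```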
Post-increment, like in C/C++/Java, returns the original value to the containing expression. So this code
if ( $seen{ $elem }++ ) {
    next;
}
would return false (0) for an element that hasn't been seen yet, and execution continues with the rest of the loop body (i.e., the next statement will not be run). The element will be put in the array. Before that happens, though, the increment occurs: now 1 is stored in the hash, meaning the value has been seen once. Next time that value is seen, the rest of the loop body will be skipped and the value will not be added to the result array again.
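The difference between what the post-increment expression returns and what ends up stored can be checked directly. In this small sketch the key x and the variable names are made up for illustration; the pre-increment line is there only for contrast.

```perl
use strict;
use warnings;

my %seen;

# Post-increment yields the value from BEFORE the increment.
my $first  = $seen{x}++;   # entry was missing, so the old value counts as 0
my $second = $seen{x}++;   # entry held 1, so the expression yields 1

print "first:  $first\n";     # first:  0  (false, so 'next' would not run)
print "second: $second\n";    # second: 1  (true, so 'next' would run)
print "stored: $seen{x}\n";   # stored: 2

# Pre-increment, by contrast, yields the value AFTER the increment.
my %pre;
my $pre_first = ++$pre{x};
print "pre-increment first: $pre_first\n";   # pre-increment first: 1
```

This is why the loop needs the postfix form: with ++$seen{$elem} the test would be true even on the first sighting, and nothing would ever reach @unique.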
When does anything get stored in %seen?
When it tries to increment it.
What does ++ do (in every other language it increments, but I don't see how that works)
Incrementing an undefined variable makes it 1.
It does the same as this:
my @unique = ();
my %seen = ();
my $elem;
foreach $elem ( @array ) {
    if ( ! $seen{ $elem } ) {
        # First time this element is seen: record it and keep it.
        $seen{ $elem } = 1;
        push ( @unique, $elem );
    } else {
        # Already seen: just count the repeat.
        $seen{ $elem }++;
    }
}
If there is no element in %seen with the key $elem then this line will create a new element (a new entry in the hashtable) with the key $elem. The ++ after it is an increment operator. It adds one to whatever is the value of $seen{$elem}. Since the initial value of $seen{$elem} evaluates to false, or zero in numerical context, this increases the value of $seen{$elem} by one. Since the ++ is on the right side of $seen{$elem}, it's only added after $seen{$elem} is evaluated. Thus, the first time it encounters any particular $elem this test will fail and it will go to the next step, putting $elem in the list (array) of unique elements.
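Putting it all together, here is a runnable version of the loop from the question. The contents of @array are hypothetical sample data (the question never shows them), and the grep line at the end is a commonly seen compact form of the same idea, included only for comparison.

```perl
use strict;
use warnings;

my @array = qw(a b a c b a);   # hypothetical sample input

my @unique;
my %seen;
foreach my $elem (@array) {
    # 0 (false) on the first sighting, 1 or more (true) afterwards.
    next if $seen{$elem}++;
    push @unique, $elem;
}

print "unique: @unique\n";   # unique: a b c
print "counts: a=$seen{a} b=$seen{b} c=$seen{c}\n";   # counts: a=3 b=2 c=1

# Same technique as a one-liner: keep an element only when its
# post-increment yields a false (0) old value.
my %seen_again;
my @unique_grep = grep { !$seen_again{$_}++ } @array;
print "unique via grep: @unique_grep\n";   # unique via grep: a b c
```

A side effect worth noticing is that %seen ends up holding how many times each element occurred, not just whether it occurred.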