Unexpected behavior of each
%h = (a => 1, b => 2);
keys %h;
while (my ($k, $v) = each %h)
{
$h{uc $k} = $h{$k} * 2; # BAD IDEA!
}
The output is:
(a => 1, A => 2, b => 2, B => 8)
instead of
(a => 1, A => 2, b => 2, B => 4)
Why?
From perldoc -f each:
If you add or delete a hash's elements while iterating over it, entries may be skipped or duplicated--so don't do that. Exception: It is always safe to delete the item most recently returned by each().
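As a sketch of the exception in that quote (the hash and the "drop odd values" filter are made-up examples, not from the question), deleting the key that each just returned is allowed inside the loop:

```perl
use strict;
use warnings;

# Hypothetical data: remove every entry whose value is odd.
my %h = (a => 1, b => 2, c => 3, d => 4);

keys %h;    # reset the iterator before starting
while (my ($k, $v) = each %h) {
    # Safe: $k is the key most recently returned by each().
    delete $h{$k} if $v % 2;
}

print join(', ', map { "$_ => $h{$_}" } sort keys %h), "\n";
```

Adding keys inside the same loop is what the documentation warns against; deleting the current key is the one modification it explicitly permits.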
The loop is changing %h on the fly, so it processes the value of b twice (first as b, then as B). each does not remove pairs from the hash; it just advances an internal iterator. Any key you add inside the loop may land in a position the iterator has not reached yet, so it gets processed later. You should get the keys first, and then loop over that list to get the values. For example:
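my @keys = keys %h;    # snapshot the keys before modifying %h
foreach (@keys)
{
    $h{uc $_} = $h{$_} * 2;
    # drop the old key, but not one that was already uppercase
    delete $h{$_} unless $_ eq uc $_;
}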
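Following Chas. Owens's answer above, the original key is deleted as well. each does not remove anything by itself, so leave out the delete if you want to keep both the lowercase and uppercase keys, as in the output the question expected.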
Another cute thing you can do is use map to create a new hash:
my %result = map { uc($_) => $h{$_} * 2 } keys %h;
and then use the hash %result.
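A minimal sketch of combining the map result with the original hash to get exactly the output the question expected; %merged is a hypothetical name, not from the answer:

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2);

# map walks a list of keys computed up front; %h is never modified
# while it is being iterated.
my %result = map { uc($_) => $h{$_} * 2 } keys %h;

# In a hash list assignment later pairs win, so values from %result
# override any uppercase keys that already existed in %h.
my %merged = (%h, %result);

print join(', ', map { "$_ => $merged{$_}" } sort keys %merged), "\n";
```

The uc($_) form is used deliberately: perlfunc notes that Perl has to guess whether the opening brace of map starts a block or an anonymous hash, and a function call as the first token makes it guess "block".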
Because each doesn't work from a fixed list of keys the way a for loop over keys %h does. each just returns the next key and value from the hash, and you are creating new entries in the hash when you say $h{uc $k} = $h{$k} * 2;, so each may return those new entries later. To get the behavior you want, I would probably say
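for my $k (keys %h) {    # keys builds the complete list before the loop starts
    $h{uc $k} = $h{$k};    # rename the key (multiply by 2 here to also double the value)
    delete $h{$k} unless $k eq uc $k;    # remove the old lowercase key
}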
If the hash is huge and you are worried about storing all of the keys in memory (which is the main reason to use each), then you would be better off saying:
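my %new_hash;
keys %h;    # reset the iterator of %h in case each was already used on it
while (my ($k, $v) = each %h) {
    $new_hash{uc $k} = $v;    # new keys go into a different hash, so %h's iterator is unaffected
    delete $h{$k};            # safe: $k is the key each just returned
}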
and then using %new_hash instead of %h.
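Here is the same each/delete pattern run on the question's data, assuming you also want the values doubled as in the question (the loop above only uppercases the keys):

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2);
my %new_hash;

keys %h;    # reset %h's iterator
while (my ($k, $v) = each %h) {
    $new_hash{uc $k} = $v * 2;    # write to a separate hash, never to %h
    delete $h{$k};                # allowed: deleting the key each just returned
}

print join(', ', map { "$_ => $new_hash{$_}" } sort keys %new_hash), "\n";
print scalar(keys %h), " keys left in the original hash\n";
```

Only one key/value pair from %h is held at a time, and the original hash shrinks as the new one grows.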
As to why some keys get processed more than once and others don't, first we must look at the documentation for each:
If you add or delete a hash's elements while iterating over it, entries may be skipped or duplicated--so don't do that.
That is fine; it tells us what to expect, but not why. To see why, we must build a model of what is happening. When you assign a value to a hash, the key is turned into a number by a hash function. This number is then used to index into an array (at the C level, not the Perl level). For our purposes we can get away with a very simplistic model:
#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

# a fake hash function: each key maps to a fixed slot in the table
my %hash_function = (
    a => 2,
    b => 1,
    A => 0,
    B => 3
);

# the "C level" array; each slot holds a one-pair hash
my @hash_table;

{
    # iterator state shared by calls to my_each, like a hash's internal iterator
    my $position = 0;

    sub my_each {
        # return nothing if there is nothing
        return unless @hash_table;

        # get the key and value from the next position in the
        # hash table, skipping empty positions
        until (defined $hash_table[$position]) {
            $position++;
            # return nothing if there is nothing left in the array
            return if $position > $#hash_table;
        }
        my ($k, $v) = %{$hash_table[$position]};

        # set up for the next call
        $position++;

        # if in list context, return both key and value
        # if in scalar context, return the key
        return wantarray ? ($k, $v) : $k;
    }
}

$hash_table[$hash_function{a}] = { a => 1 }; # $h{a} = 1;
$hash_table[$hash_function{b}] = { b => 2 }; # $h{b} = 2;

while (my ($k, $v) = my_each) {
    # $h{uc $k} = $v * 2;
    $hash_table[$hash_function{uc $k}] = { uc $k => $v * 2 };
}

print Dumper \@hash_table;
For this example, we can see that when the key "A" gets added to the hash table, it is put before the other keys, so it doesn't get processed a second time. The key "B", however, is placed after the other keys, so the my_each function sees it on the first pass (as the item following the key "a"). The loop therefore visits b, then a, then B, and B ends up with 8.
This works for me
%h = (a => 1, b => 2);
keys %h;
for my $k (keys %h) {
$h{uc $k} = $h{$k} * 2;
}
while ( ($k,$v) = each %h ) {
print "$k => $v\n";
}
Output (the order of the lines may differ, since Perl does not guarantee hash order):
A => 2
a => 1
b => 2
B => 4
Adding a warn $k; to your loop might make things clearer. I get the same result as you do, and it is because the keys it ends up using are 'a', 'b' and then 'B', so:
#round 1 ($k='a'):
$h{uc 'a'} = 1 * 2;
# $h{A} = 2;
#round 2 ($k='b'):
$h{uc 'b'} = 2 * 2;
# $h{B} = 4;
#round 3: ($k='B'):
$h{uc 'B'} = 4 * 2;
# $h{B} = 8;
Why is it running the loop with the key 'B' but not 'A'? Because each is called every time through the loop, so it works with the current version of the hash, but it remembers its position from the previous call. In this case, when 'A' is added to the hash it is assigned a position before 'a', which the iterator has already passed, so it is never seen.
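A small sketch of that remembered position, using a made-up three-key hash: each keeps an iterator inside the hash between calls, and calling keys on the hash resets it (which is also why the question's code calls keys %h before the loop).

```perl
use strict;
use warnings;

my %h = (a => 1, b => 2, c => 3);

# each() in scalar context returns the next key and advances the iterator.
my $first  = each %h;
my $second = each %h;

# Calling keys, even in void context, resets the iterator ...
keys %h;

# ... so the next each() starts from the beginning again.
my $again = each %h;

print "first=$first second=$second again=$again\n";
```

Because the hash is not modified between calls, $again is the same key as $first, even though the actual order of keys varies from run to run.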