Understanding code: Hash, grep for duplicates (modified to check for multiple elements)
Code:
@all_matches = grep {
    ! ( $seensentence{ $_->[0] .'-'. $_->[1] .'-'. $_->[5] }++ )
} @all_matches;
Purpose: This code removes duplicates from @all_matches, which is an AoA (array of arrays). Two entries count as duplicates when their elements 0, 1 and 5 match.
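For reference, here is a minimal runnable version of the same grep. The sample @all_matches is made up, and each row is assumed to have at least six elements so that index 5 exists:

```perl
use strict;
use warnings;

# Hypothetical sample data: rows 0 and 2 agree on elements 0, 1 and 5,
# so row 2 should be dropped even though element 2 differs.
my @all_matches = (
    [ 'a', 'b', 'x', 'q', 'r', 'f' ],
    [ 'a', 'c', 'x', 'q', 'r', 'f' ],
    [ 'a', 'b', 'y', 'q', 'r', 'f' ],
);

my %seensentence;
@all_matches = grep {
    # Keep a row only the first time its elements 0, 1 and 5 are seen.
    ! ( $seensentence{ $_->[0] .'-'. $_->[1] .'-'. $_->[5] }++ )
} @all_matches;

print scalar(@all_matches), "\n";    # prints 2
```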
My attempt at a full breakdown (with ??..?? around the parts where I'm unsure):
grep returns the elements of @all_matches for which the block returns true.
The key of the hash %seensentence is ??the three elements?? of @all_matches. Since a hash can only have unique keys, the first time through its value is incremented from undef (0) to 1. The next time through, it is a defined value, but the ! means grep returns the element only if the value was undef (i.e. the key has not been seen before).
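A quick check of what the postfix ++ actually hands to the ! on each pass. Per perlop, post-incrementing an undef value returns 0 rather than undef, so the first pass yields 0 (false) and later passes yield the old count. The key 'a-b-f' is just an illustrative value:

```perl
use strict;
use warnings;

my %seen;
# First pass: the element does not exist yet; postfix ++ returns the
# old value (0, since undef is treated as 0) and then stores 1.
my $first  = $seen{'a-b-f'}++;
# Second pass: returns the old value 1 and stores 2.
my $second = $seen{'a-b-f'}++;

printf "first=%d !first=%s\n",   $first,  ( !$first  ? 'true' : 'false' );    # first=0 !first=true
printf "second=%d !second=%s\n", $second, ( !$second ? 'true' : 'false' );    # second=1 !second=false
```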
My Questions:
(1) How can I turn {$_->[0] .'-'. $_->[1] .'-'. $_->[5]}++ into a HoH?
I was told this is another (idiomatic) way to accomplish it. A stab in the dark would be:
( {$_->[0] => 0,
$_->[1] => 0,
$_->[5] => 0} )++
(1b) I ask because I don't understand how the original does what I want it to. I read that -bareword is equivalent to "-bareword", so I tried {"$_->[0]" . "$_->[1]" . "$_->[5]"} and it seemed to work exactly the same. Still, I don't understand: is it (a) treating each element as a separate key (like an array of keys), (b) [correct] using all of them at once as a single key, since . concatenates them into one string, or (c) not doing what I think it is?
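Dumping the hash keys makes option (b) visible: each row contributes exactly one concatenated string. The same sketch shows why the '-' separators in the original are worth keeping, unlike the variant tried above: without a separator, different triples can produce the same key. The rows are hypothetical, and the separator only helps on the assumption that '-' never appears inside the elements themselves.

```perl
use strict;
use warnings;

# Hypothetical rows: the first two differ in elements 0 and 1,
# but 'ab'.'c' and 'a'.'bc' concatenate to the same string.
my @rows = (
    [ 'ab', 'c',  0, 0, 0, 'z' ],
    [ 'a',  'bc', 0, 0, 0, 'z' ],
);

my ( %with_sep, %no_sep );
for my $r (@rows) {
    $with_sep{ $r->[0] . '-' . $r->[1] . '-' . $r->[5] }++;
    $no_sep{ "$r->[0]" . "$r->[1]" . "$r->[5]" }++;
}

# One string key per distinct triple when separators are used...
print join( ', ', sort keys %with_sep ), "\n";    # a-bc-z, ab-c-z
# ...but both rows collapse into a single key without them.
print join( ', ', sort keys %no_sep ), "\n";      # abcz
```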
(2) What does $_->[0] || $_->[1] || $_->[5] mean? It doesn't do the same as the above.
I read that short-circuit logical operators return the last value evaluated, so I thought it would check the value at {$_->[0]} and, if there was one, increment it; if not, it would check the next element until none were true, at which point grep would pass the unique value on.
Thanks for your time. I tried to be as thorough as possible (to a fault?), but let me know if anything is missing.
First, let's turn the grep into a foreach loop so that we can examine it more clearly. I'm going to expand some of the idioms into larger constructs for clarity's sake.
my @all_matches = ( ... );
{
    my %seen;
    my @no_dupes;
    foreach my $match ( @all_matches ) {
        my $first_item  = $match->[0];
        my $second_item = $match->[1];
        my $third_item  = $match->[5];
        my $key = join '-', $first_item, $second_item, $third_item;
        if ( not $seen{ $key }++ ) {
            push @no_dupes, $match;
        }
    }
    @all_matches = @no_dupes;
}
In other words, the original coder builds a hash key from elements 0, 1 and 5 of the array referenced by $match. As hash keys are unique, duplicates are dropped: a row is pushed onto @no_dupes only if its key has not been seen before.
The grep {} mechanism is just a more concise idiom (i.e. quicker to type, with no throwaway variables) for accomplishing the same thing. If it works, why refactor it? What is it not doing that you need to improve upon?
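A small check, on made-up data, that the grep and the expanded foreach loop really produce the same list, keeping the same array references in the same order:

```perl
use strict;
use warnings;

my @all_matches = (
    [ 1, 2, 'x', 'x', 'x', 3 ],
    [ 1, 2, 'y', 'y', 'y', 3 ],    # duplicate of row 0 on elements 0, 1 and 5
    [ 4, 5, 'z', 'z', 'z', 6 ],
);

# foreach version, following the expansion above
my ( %seen_loop, @no_dupes );
foreach my $match (@all_matches) {
    my $key = join '-', $match->[0], $match->[1], $match->[5];
    push @no_dupes, $match if not $seen_loop{$key}++;
}

# grep version with the same key
my %seen_grep;
my @via_grep = grep { not $seen_grep{ join '-', $_->[0], $_->[1], $_->[5] }++ } @all_matches;

# Stringified references compare equal only if both lists hold the same rows in order.
print "same\n" if "@no_dupes" eq "@via_grep";
```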
To do the same with a HoH, you could do this:
my @all_matches = ( ... );
{
    my %seen;
    my @no_dupes;
    foreach my $match ( @all_matches ) {
        my $first_item  = $match->[0];
        my $second_item = $match->[1];
        my $third_item  = $match->[5];
        if ( not $seen{ $first_item }->{ $second_item }->{ $third_item }++ ) {
            push @no_dupes, $match;
        }
    }
    @all_matches = @no_dupes;
}
Which could be translated back into a grep as follows:
my @all_matches = ( ... );
{
    my %seen;
    @all_matches = grep { not $seen{ $_->[0] }{ $_->[1] }{ $_->[5] }++ } @all_matches;
}
However, this is a case where I don't see a clear advantage to building a data structure, unless you intend to use %seen later for something else.
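A runnable sketch of the nested-hash grep on made-up data. The intermediate levels are created on demand by autovivification, and the innermost counter behaves exactly like the string-keyed version. The last lines illustrate the "use %seen later" case with one hypothetical query: which element-1 values were seen together with element 0 equal to 'a'.

```perl
use strict;
use warnings;

my @all_matches = (
    [ 'a', 'b', 'x', 'x', 'x', 'f' ],
    [ 'a', 'b', 'y', 'y', 'y', 'f' ],    # duplicate on elements 0, 1 and 5
    [ 'a', 'c', 'z', 'z', 'z', 'g' ],
);

my %seen;
# Each level of %seen is autovivified as needed; only the innermost
# value is post-incremented and tested.
@all_matches = grep { not $seen{ $_->[0] }{ $_->[1] }{ $_->[5] }++ } @all_matches;

# The nested structure can be queried per level afterwards.
my @seconds = sort keys %{ $seen{'a'} };
print scalar(@all_matches), " rows; second elements under 'a': @seconds\n";
```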
With respect to the || operator, that's a different animal. I can't think of any useful way to employ it in this context. The logical short-circuit operator in, say, $a || $b || $c tests the truth value of $a. If $a is true, it returns its value. If it's false, it checks $b the same way, and if that is false too, it checks $c. But if $a is true, $b never gets checked; if $b is true, $c never gets checked.
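The skipping can be observed with a hypothetical helper, check(), that records each operand it is asked to evaluate before returning it:

```perl
use strict;
use warnings;

my @evaluated;

# Hypothetical helper: remembers that it ran, then returns its value.
sub check {
    my ( $name, $value ) = @_;
    push @evaluated, $name;
    return $value;
}

# 'a' is false, 'b' is true, so the third operand is never evaluated.
my $result = check( 'a', 0 ) || check( 'b', 'hello' ) || check( 'c', 'never' );

print "result=$result, evaluated: @evaluated\n";    # result=hello, evaluated: a b
```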
The key of %seensentence is a simple string. The expression $_->[0] .'-'. $_->[1] .'-'. $_->[5] constructs a string; an equivalent expression is join '-', $_->[0], $_->[1], $_->[5]. The code appears to assume that elements 0, 1, and 5 are enough to identify duplicates in @all_matches.
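A quick check, on a made-up row, that the concatenation and the join build the identical key string. The array-slice form @{$row}[0, 1, 5] is an additional equivalent spelling that does not appear in the original code:

```perl
use strict;
use warnings;

my $row = [ 'foo', 'bar', 'x', 'y', 'z', 'baz' ];

my $concat = $row->[0] . '-' . $row->[1] . '-' . $row->[5];
my $joined = join '-', $row->[0], $row->[1], $row->[5];
my $sliced = join '-', @{$row}[ 0, 1, 5 ];    # array slice of indices 0, 1 and 5

print "$concat\n";    # foo-bar-baz
```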
Edit: Missed your last question.
$_->[0] || $_->[1] || $_->[5] returns $_->[0] if $_->[0] is not false (0, "0", the empty string, or undef); otherwise it returns $_->[1] if $_->[1] is not false; and it returns $_->[5] otherwise.
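This is why using that expression as the hash key does not deduplicate the same way: the key becomes only the first true element, so rows that merely share element 0 collide. A sketch with hypothetical rows:

```perl
use strict;
use warnings;

my @rows = (
    [ 'a', 'b', 0, 0, 0, 'f' ],
    [ 'a', 'X', 0, 0, 0, 'Y' ],    # differs on elements 1 and 5, same element 0
    [ 0,   'c', 0, 0, 0, 'g' ],    # element 0 is false, so || moves on to element 1
);

my @keys = map { $_->[0] || $_->[1] || $_->[5] } @rows;
print "@keys\n";    # a a c

my %seen;
my @kept = grep { not $seen{ $_->[0] || $_->[1] || $_->[5] }++ } @rows;
print scalar(@kept), "\n";    # 2: the second row is treated as a duplicate of the first
```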
The short-circuit operators stop as soon as the result is determined. For ||, that is as soon as an operand is true (non-false); for &&, it is as soon as an operand is false.
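The mirror-image behaviour of &&, sketched with made-up values: it returns the first false operand, or the last operand if every one is true.

```perl
use strict;
use warnings;

my $all_true  = 'a' && 'b' && 'c';    # every operand true: returns the last one
my $has_false = 'a' && 0   && 'c';    # stops at the first false operand, 0

print "$all_true $has_false\n";    # c 0
```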