Array of hashes
In Perl, I have an array of hashes like this:
0 HASH(0x98335e0)
'title' => 1177
'author' => 'ABC'
'quantity' => '-100'
1 HASH(0x832a9f0)
'title' => 1177
'author' => 'ABC'
'quantity' => '100'
2 HASH(0x98335e0)
'title' => 1127
'author' => 'DEF'
'quantity' => '5100'
3 HASH(0x832a9f0)
'title' => 1277
'author' => 'XYZ'
'quantity' => '1030'
Now I need to accumulate the quantity wherever title and author are the same. In the structure above, the two hashes with title = 1177 and author = 'ABC' should be merged into one with their quantities summed, so the resulting structure looks like this:
0 HASH(0x98335e0)
'title' => 1177
'author' => 'ABC'
'quantity' => 0
1 HASH(0x98335e0)
'title' => 1127
'author' => 'DEF'
'quantity' => '5100'
2 HASH(0x832a9f0)
'title' => 1277
'author' => 'XYZ'
'quantity' => '1030'
What is the most efficient way to do this accumulation? The number of array elements can be very large. I don't mind adding an extra key to each hash to help, but I don't want N lookups. Please advise.
my %sum;
for (@a) {
$sum{ $_->{author} }{ $_->{title} } += $_->{quantity};
}
my @accumulated;
foreach my $author (keys %sum) {
foreach my $title (keys %{ $sum{$author} }) {
push @accumulated => { title => $title,
author => $author,
quantity => $sum{$author}{$title},
};
}
}
Not sure whether map makes it look nicer:
my @accumulated =
map {
my $author = $_;
map +{ author => $author, # the leading + marks this as an anonymous hash, not a block
title => $_,
quantity => $sum{$author}{$_},
},
keys %{ $sum{$author} };
}
keys %sum;
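Note that keys returns authors and titles in no particular order, so the loops above will not necessarily reproduce the order shown in the question. If first-seen order matters, a single pass can keep it. The following is only a sketch, assuming the question's data in @a; the names %seen, $merged, and %copy are illustrative and not from the answer above.

```perl
use strict;
use warnings;

# Sample data copied from the question.
my @a = (
    { title => 1177, author => 'ABC', quantity => '-100' },
    { title => 1177, author => 'ABC', quantity => '100'  },
    { title => 1127, author => 'DEF', quantity => '5100' },
    { title => 1277, author => 'XYZ', quantity => '1030' },
);

my %seen;         # author => title => the merged record for that pair
my @accumulated;  # merged records, in first-seen order

for my $rec (@a) {
    my $merged = $seen{ $rec->{author} }{ $rec->{title} };
    if ($merged) {
        # Pair already seen: add to its running total.
        $merged->{quantity} += $rec->{quantity};
    }
    else {
        # First occurrence: copy the record so @a is left untouched.
        my %copy = %$rec;
        $seen{ $rec->{author} }{ $rec->{title} } = \%copy;
        push @accumulated, \%copy;
    }
}

printf "%s %s %d\n", @{$_}{qw(title author quantity)} for @accumulated;
```

Each input record costs one hash lookup, and the output array holds one entry per distinct author/title pair.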
If you don't want N lookups, then you need a hash function, and you need to store the records keyed by that hash as they arrive. By the time you have them in a list (or array), it's too late: unless you get lucky every time, you're going to have N lookups.
Or insert them into the hash below. A hybrid solution is to store a locator as item 0 in the list/array.
my $lot = get_lot_from_whatever();
my $tot = $list[0]{ $lot->{author} }{ $lot->{title} };
if ( $tot ) {
$tot->{quantity} += $lot->{quantity};
}
else {
push @list, $list[0]{ $lot->{author} }{ $lot->{title} } = $lot;
}
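To try the hybrid idea end to end, here is a rough, self-contained sketch. It assumes the records arrive one at a time; add_lot is a hypothetical helper name standing in for whatever produces each $lot, and the sample data is taken from the question.

```perl
use strict;
use warnings;

# Element 0 is the locator (author => title => record); the records follow it.
my @list = ( {} );

# add_lot is a hypothetical helper, not part of the answer above.
sub add_lot {
    my ($lot) = @_;
    my $tot = $list[0]{ $lot->{author} }{ $lot->{title} };
    if ($tot) {
        # Known author/title pair: fold the quantity into the stored record.
        $tot->{quantity} += $lot->{quantity};
    }
    else {
        # New pair: register it in the locator and append it to the list.
        push @list, $list[0]{ $lot->{author} }{ $lot->{title} } = $lot;
    }
    return;
}

add_lot($_) for (
    { title => 1177, author => 'ABC', quantity => '-100' },
    { title => 1177, author => 'ABC', quantity => '100'  },
    { title => 1127, author => 'DEF', quantity => '5100' },
    { title => 1277, author => 'XYZ', quantity => '1030' },
);

# Skip the locator when walking the records.
for my $lot ( @list[ 1 .. $#list ] ) {
    print "$lot->{title} $lot->{author} $lot->{quantity}\n";
}
```

The trade-off is that any code reading @list has to remember that element 0 is not a record.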
Previous answer:
First of all we'll reformat that to make it readable.
[ { title => 1177, author => 'ABC', quantity => '-100' }
, { title => 1177, author => 'ABC', quantity => '100' }
, { title => 1127, author => 'DEF', quantity => '5100' }
, { title => 1277, author => 'XYZ', quantity => '1030' }
]
Next, you need to break down the problem. You want quantities of things grouped by author and title. So you need those things to uniquely identify those lots. To repeat, you want a combination of names to identify entities. Thus, you will need a hash that identifies things by names.
Since we have two things, a double hash is a good way to do it.
my %hash;
foreach my $lot ( @list ) {
$hash{ $lot->{author} }{ $lot->{title} } += $lot->{quantity};
}
# consolidated by hash
To turn this back into a list, we need to unbundle the levels.
my @consol
= sort { $a->{author} cmp $b->{author} || $a->{title} cmp $b->{title} }
map {
my ( $author, $titles ) = @$_; # $_ is [ $author, {...} ]; avoid "my $a", which shadows sort's $a
map { +{ title => $_, author => $author, quantity => $titles->{$_} } } # inner block now closed
keys %$titles;
}
map { [ $_ => $hash{$_} ] } # group and freeze a pair
keys %hash
;
# consolidated in a list.
And there you have it back; I even sorted it for you. Of course, publishers being what they are, you could also sort by descending quantity, using author and title as tie-breakers:
sort { $b->{quantity} <=> $a->{quantity}
|| $a->{author} cmp $b->{author}
|| $a->{title} cmp $b->{title}
}
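The sort block above is only a fragment. As a sketch of how it might be applied, here it is run against the consolidated totals for the question's data, written out by hand; the name @by_quantity is illustrative.

```perl
use strict;
use warnings;

# Consolidated totals for the question's data, written out by hand.
my @consol = (
    { title => 1177, author => 'ABC', quantity => 0    },
    { title => 1127, author => 'DEF', quantity => 5100 },
    { title => 1277, author => 'XYZ', quantity => 1030 },
);

# Largest quantity first; author, then title, break ties.
my @by_quantity =
    sort {
           $b->{quantity} <=> $a->{quantity}
        || $a->{author}   cmp $b->{author}
        || $a->{title}    cmp $b->{title}
    } @consol;

print "$_->{author} $_->{title} $_->{quantity}\n" for @by_quantity;
```

Note that the quantity comparison uses the numeric <=> operator with $b on the left, which is what gives the descending order.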
I think it is important to step back and consider the source of the data. If the data are coming from a database, then you should write the SQL query so that it gives you one row for each author/title combination with the total quantity in the quantity field. If you are reading the data from a file, then you should either read it directly into a hash or use Tie::IxHash if order is important.
Once you have the data in an array of hashrefs like you do, you will have to create an auxiliary data structure and do a whole bunch of lookups. The cost of those lookups may well dominate the running time of your program (which hardly matters if it runs for 15 minutes once a day), and you might run into memory issues.
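For the file case, reading directly into a hash might look like the sketch below. The input format (one comma-separated "title,author,quantity" record per line) is purely an assumption, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Assumed input format: one "title,author,quantity" record per line.
my $data = <<'END';
1177,ABC,-100
1177,ABC,100
1127,DEF,5100
1277,XYZ,1030
END

# An in-memory filehandle stands in for a real file here.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my %total;    # author => title => summed quantity
while ( my $line = <$fh> ) {
    chomp $line;
    my ( $title, $author, $quantity ) = split /,/, $line;
    $total{$author}{$title} += $quantity;
}
close $fh;

for my $author ( sort keys %total ) {
    for my $title ( sort keys %{ $total{$author} } ) {
        print "$title $author $total{$author}{$title}\n";
    }
}
```

This way the intermediate array of hashrefs is never built, so memory use grows with the number of distinct author/title pairs rather than with the number of input lines.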