Perl - How do I count and print occurrences of domains in email address array?
I have been struggling with this for a couple days now and cannot seem to figure it out.
I have an array of email addresses that were created via push(@emails,$email)
in a while loop.
I am attempting to create a list of unique domains with the occurrence count of each in the array, ordered by number of occurrences.
So, if the array @emails
has:
john@yadoo.com ringo@geemail.net george@zoohoo.org paul@yadoo.com
I can print:
yadoo.com 2
geemail.net 1
zoohoo.org 1
I found this example, which works on emails in a file, but it is WAY over my head. Can someone help me with a more verbose code example that can be used with an array of email addresses?
perl -e 'while(<>){chomp;/^[^@]+@([^@]+)$/;$h{$1}++;}
foreach $k (sort { $h{$b} <=> $h{$a} } keys %h) {print $h{$k}." ".$k."\n";}' infile
I also tried: (more to my level of lack of understanding)
foreach my $domain (sort keys %$domains) {
print "$domain"."=";
print $domains->{$domain}."\n";
};
AND
my %countdoms;
$countdoms{$_}++ for @domains;
print "$_ $countdoms{$_}\n" for keys %countdoms;
The best result I got out of many different attempts was a total count (1812, which is the accurate count) with a number 2 next to it. Am I close, possibly?
Instead of giving you another answer, let me explain what your code example is doing:
while(<>){chomp;/^[^@]+@([^@]+)$/;$h{$1}++;}
foreach $k (sort { $h{$b} <=> $h{$a} } keys %h) {print $h{$k}." ".$k."\n";}
The first line counts the domains from emails in files.
while(<>)
iterates over the input files line by line. The input files are the file(s) passed as arguments or stdin if no arguments were passed. Each line is placed in $_
.
chomp;
simply removes the newline from the end of $_
.
/^[^@]+@([^@]+)$/
is the regular expression that parses out the domain and is applied to $_
. It checks for something that has no '@' in the first part, then a '@' and then no '@' in the last part. It remembers the last part, which will be stored in $1
. ^
and $
stand for the beginning and the end of the string, respectively.
$h{$1}++;
uses the domain (in $1
) to increment the count in the hash %h
. This works even if it's not present, because undef
behaves here like 0.
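To see the match and the increment in isolation, here is a minimal sketch on a single address. The variable names $address and $domain are mine, chosen only for illustration; the regular expression is the one from your one-liner.

```perl
use strict;
use warnings;

my %h;                           # starts out empty
my $address = 'john@yadoo.com';  # sample address from the question

if ($address =~ /^[^@]+@([^@]+)$/) {
    my $domain = $1;             # text captured by the parentheses
    $h{$domain}++;               # key did not exist yet: undef counts as 0, so this becomes 1
    print "$domain $h{$domain}\n";
}
```

Testing the match with if means $1 is only used when the pattern actually matched.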
In order to make this work for your list, you can just do
foreach (@emails) { $h{$1}++ if /^[^@]+@([^@]+)$/; }
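Since you asked for something more verbose, the same loop could be spelled out roughly like this. It is only a sketch: %count, $email and $domain are names I made up, and the @emails list is hard-coded here where your script would fill it with push.

```perl
use strict;
use warnings;

# Sample data from the question; the real script fills @emails with push.
my @emails = qw(john@yadoo.com ringo@geemail.net george@zoohoo.org paul@yadoo.com);

my %count;    # domain => number of occurrences

foreach my $email (@emails) {
    # Skip anything that does not look like local@domain.
    next unless $email =~ /^[^@]+@([^@]+)$/;

    my $domain = $1;      # the part after the '@'
    $count{$domain}++;    # a new key starts from undef, which counts as 0
}

# Plain alphabetical listing, just to inspect the counts.
print "$_ $count{$_}\n" for sort keys %count;
```

This prints the counts in alphabetical order of domain; ordering by occurrence is a separate step done with sort.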
The second line prints the domains from the hash %h
.
sort { $h{$b} <=> $h{$a} } keys %h
returns a list of domains sorted by descending occurrence by using the comparison function $h{$b} <=> $h{$a}
to look up the count. Note that it's b <=> a, not a <=> b; this is what makes the order descending.
The rest of line 2 prints out the result.
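The sorting step can be tried on its own with a small hash. In this sketch the counts are hard-coded to match the question's sample data, and the "or $a cmp $b" tie-break is my addition (it is not in the one-liner) so that domains with equal counts come out in a stable alphabetical order.

```perl
use strict;
use warnings;

# Counts as they would be after the counting loop, hard-coded for the example.
my %h = ('yadoo.com' => 2, 'geemail.net' => 1, 'zoohoo.org' => 1);

# Highest count first; equal counts fall back to alphabetical order.
my @ordered = sort { $h{$b} <=> $h{$a} or $a cmp $b } keys %h;

print "$_ $h{$_}\n" for @ordered;
# prints:
# yadoo.com 2
# geemail.net 1
# zoohoo.org 1
```

Without the tie-break, the relative order of geemail.net and zoohoo.org depends on the hash's internal key order and can change between runs.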
If you have your email addresses populated in an array, this'll get you a count for each domain. I'm sure someone can produce something prettier!
my @emails = ('john@yadoo.com','ringo@geemail.net','george@zoohoo.org','paul@yadoo.com');
my %domainCount;
foreach (@emails) {
    if ($_ =~ /@(\w+.*)/) {
        $domainCount{$1}++;
    }
}
for my $domain (sort { $domainCount{$b} <=> $domainCount{$a} } keys %domainCount) {
    print "$domain - $domainCount{$domain}\n";
}
It's a bit crude because I am rusty on Perl but this should do the job:
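use strict;
use warnings;
$| = 1;    # unbuffered output
my %hsh;
my @arr = ('john@yadoo.com', 'ringo@geemail.net', 'george@zoohoo.org', 'paul@yadoo.com');
foreach (@arr) {
    # capture everything after the last '@'; skip entries that do not match
    my ($dom) = /.*\@(.*)$/ or next;
    $hsh{$dom}++;
}
# sort by count, highest first, as the question asks
foreach (sort { $hsh{$b} <=> $hsh{$a} } keys %hsh) {
    print "$_:$hsh{$_}\n";
}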
Another variation:
use strict;
use warnings;
my @array
    = qw<john@yadoo.com ringo@geemail.net george@zoohoo.org paul@yadoo.com>;
my %dom_count;
$dom_count{ $_ }++ foreach map { ( split '@' )[-1] } @array;
foreach my $pair (
    sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
    map  { [ $_ => $dom_count{ $_ } ] } keys %dom_count
) {
    print "@$pair\n";
}