开发者

Perl data structure traversal -- reference followed key

Result: Many lines of HASH(0x1948958) ARRAY(0x1978250) ./directory/filename

Desired result: [Key of first hash] [Key of second hash] ./directory/filename #(elements of array, currently working)

Catch: Should carry across to N level structures, hence my attempt at using Data::Walk.

What I really want to do as I walk the structure is to reference the key that is being used. Kind of like Data::Dumper but tab-separated instead of in code format. I think the likely solutions (in order of preference) are:

  • Some call to Data::Walk that I've overlooked.
  • A better module for this task that I don't know about.
  • A quick code snippet that I can inline
  • My own module / fork of Data::Walk / Data::Dumper (big frown) that will add this functionality.

use strict;
use File::Basename;
use Data::Walk;

my $files;
while (<>) {
        chomp;
        #ls -l output in a file; referencing filename from it (8th column)
        my @line = split(/ /, $_, 8);
        #fileparse exported by File::Basename
        my ($name,$path) = fileparse($line[7]);
        open (my $fh, '<', $path . $name);
        my $sha = Digest::SHA->new('sha1');
        $sha->addfile($fh);
        #finding files by basename, then unique hash, then however many places it is stored.
        #question not why I don't use the hash as the first field.

        #basename    digest    path
        push(@{$files->{$name}->{$sha->hexdigest}}开发者_如何学Python, $path . $name);
}

my @val;
sub walkit {
        $val[$Data::Walk::depth - 1] =  $_;
        if ($Data::Walk::depth == 3) {
                print join("\t", @val), "\n";
        }
}

&walk (\&walkit, %$files);

Gurus?


Edit: against my better judgement, I'll try to answer this question again.

Here's a simple approach to print what you want. Using Data::Walk is not feasible because you don't have key context when you are inside a hash (you just get a pointer to the container.)

This function works for somewhat complicated structures. Of course it will not give proper output if you put a function reference or something wonky in there.

use strict;
use warnings;

my $res;
sub walk {
    my ($item, $path) = @_;
    if (ref $item eq 'ARRAY') {
        foreach (@$item) {
            walk($_, $path);
        }
    } elsif (ref $item eq 'HASH') {
        foreach (keys %$item) {
            push @$path, $_;
            walk($item->{$_}, $path);
            pop @$path;
        }
    } else {
        print join('-', @$path, $item), "\n";
    }
}

my $struct = {
    a => {
            a1 => { a11 => [ 1, 2, 3 ] },
            a2 => { a22 => [5, 6, 7] }
    },
    b => { b1 => [ 99 ], },
    c => [ 100, 101, ],
    d => [ 101, { d2 => { d3 => [200, 210] }, }, ],
};

walk $struct;


for my $name (keys(%$files)) {
   for my $digest (keys(%{$files->{$name}})) {
      my @qfns = @{ $files->{$name}{$digest} };
      if (@qfns > 1) {
         say "For $name and $digest,";
         say "   $_" for @qfns;
      }
   }
}

(I'm assuming you're looking for duplicates, so I print nothing when there's only one path associated with a name-digest combo. You can remove the if if you want to print everything.)

Some other cleanup:

use strict;
use warnings;
use 5.010;

use Digest::SHA    qw( );
use File::Basename qw( basename );

sub calc_digest {
   my ($qfn) = @_;
   open(my $fh, '<', $qfn) or die $!;
   my $sha = Digest::SHA->new('sha1');
   $sha->addfile($fh);
   return $sha->hexdigest();
}

my $files;
while (<>) {
   my $qfn = (split)[7];
   my $name = basename($path);
   my $digest = calc_digest($qfn);
   push @{ $files->{$name}{$digest} }, $qfn;
}

("qfn" stands for "qualified file name", which means the path to the file, which wasn't what $path contained. You were repeatedly building the path even though $line[7] contained it.)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜