
Perl Working On Two Hash References

I would like to compare the values of two hash references. The Data::Dumper output of my first hash looks like this:

$VAR1 = {
          '42-MG-BA' => [
                          {
                            'chromosome' => '19',
                            'position' => '35770059',
                            'genotype' => 'TC'
                          },
                          {
                            'chromosome' => '2',
                            'position' => '68019584',
                            'genotype' => 'G'
                          },
                          {
                            'chromosome' => '16',
                            'position' => '9561557',
                            'genotype' => 'G'
                          },

The second hash is similar, but with more hashes in each array. I would like to compare the genotype in the first and second hash wherever the position and the chromosome match.

map {print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n"}sort keys %$cave_snp_list;
map {print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n"}sort keys %$geno_seq_list;

That works, but only for the first element of each array. How can I do the same for every element of the arrays?

This is my actual code in full

#!/software/bin/perl

use strict;

use warnings;
use Getopt::Long;
use Benchmark;
use Config::Config qw(Sequenom.ini);
use Database::Conn;
use Data::Dumper;

GetOptions("sam=s" => \my $sample);

my $geno_seq_list = getseqgenotypes($sample);
my $cave_snp_list = getcavemansnpfile($sample);
#print Dumper($geno_seq_list);
print scalar %$geno_seq_list, "\n";

foreach my $sam (keys %{$geno_seq_list}) {

    my $seq_used  = $geno_seq_list->{$sam};
    my $cave_used = $cave_snp_list->{$sam};
    print scalar(@$seq_used), "\n";
    print scalar(@$cave_used), "\n";
    #foreach my $seq2com (@ {$seq_used } ){
    #    foreach my $cave2com( @ {$cave_used} ){
    #       print $seq2com->{chromosome},":" ,$cave2com->{chromosome},"\n";
    #    }
    #}

    map { print "$_= $cave_snp_list->{$_}->[0]->{chromosome}\n" } sort keys %$cave_snp_list;
    map { print "$_= $geno_seq_list->{$_}->[0]->{chromosome}\n" } sort keys %$geno_seq_list;
}

sub getseqgenotypes {

    my $snpconn;
    my $gen_list = {};
    $snpconn = Database::Conn->new('live');
    $snpconn->addConnection(DBI->connect('dbi:Oracle:pssd.world', 'sn', 'ss', { RaiseError => 1, AutoCommit => 0 }),
        'pssd');

#my $conn2 =Database::Conn->new('live');
#$conn2->addConnection(DBI->connect('dbi:Oracle:COSI.world','nst_owner','nst_owner', {RaiseError =>1 , AutoCommit=>0}),'nst');
    my $id_ind = $snpconn->execute('snp::Sequenom::getIdIndforExomeSample', $sample);
    my $genotype = $snpconn->executeArrRef('snp::Sequenom::getGenotypeCallsPosition', $id_ind);
    foreach my $geno (@{$genotype}) {

        push @{ $gen_list->{ $geno->[1] } }, {

            chromosome => $geno->[2],
            position   => $geno->[3],
            genotype   => $geno->[4],
        };

    }

    return ($gen_list);
}    #end of sub getseqgenotypes

sub getcavemansnpfile {

    my $nstconn;
    my $caveman_list = {};
    $nstconn = Database::Conn->new('live');
    $nstconn->addConnection(
        DBI->connect('dbi:Oracle:CANP.world', 'nst_owner', 'NST_OWNER', { RaiseError => 1, AutoCommit => 0 }), 'nst');

    my $id_sample = $nstconn->execute('nst::Caveman::getSampleid', $sample);
    #print "IDSample: $id_sample\n";
    my $file_location = $nstconn->execute('nst::Caveman::getCaveManSNPSFile', $id_sample);

    open(SNPFILE, "<$file_location") || die "Error: Cannot open the file $file_location:$!\n";

    while (<SNPFILE>) {

        chomp;
        next if /^>/;
        my @data = split;
        my ($nor_geno, $tumor_geno) = split /\//, $data[5];
        # array of hash
        push @{ $caveman_list->{$sample} }, {

            chromosome => $data[0],
            position   => $data[1],
            genotype   => $nor_geno,

        };

    }    #end of while loop
    close(SNPFILE);
    return ($caveman_list);
}


The problem that I see is that you're constructing a tree for generic storage of data, when what you want is a graph, specific to the task. While you are constructing the record, you could also be constructing the part that groups data together. Below is just one example.

my %genotype_for;
my $record
    = { chromosome => $data[0]
      , position   => $data[1]
      , genotype   => $nor_geno
    };
push @{ $caveman_list->{$sample} }, $record;

# $genotype_for{ position }{ chromosome }{ name of array } = genotype code
$genotype_for{ $data[1] }{ $data[0] }{ $sample } = $nor_geno;

...
return ( $caveman_list, \%genotype_for );

In the main line, you receive them like so:

my ( $cave_snp_list, $geno_lookup ) = getcavemansnpfile( $sample );

This approach at least allows you to locate similar position and chromosome values. If you're going to do much with this, I might suggest an OO approach.
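
As a minimal, self-contained sketch of the idea above, the snippet below builds the array-of-hashes list and the position/chromosome lookup in a single pass. The `@rows` data and the `build_index` helper are hypothetical stand-ins for the lines read from the SNP file; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical rows standing in for parsed file lines:
# [ chromosome, position, genotype ]
my @rows = (
    [ '19', '35770059', 'TC' ],
    [ '2',  '68019584', 'G'  ],
    [ '16', '9561557',  'G'  ],
);

# build_index is an illustrative helper, not part of the original code.
# It returns the per-sample record list and a lookup keyed as
# { position }{ chromosome }{ sample } = genotype.
sub build_index {
    my ( $sample, $rows ) = @_;
    my ( %list, %genotype_for );
    foreach my $row (@$rows) {
        my ( $chrom, $pos, $geno ) = @$row;
        push @{ $list{$sample} },
            { chromosome => $chrom, position => $pos, genotype => $geno };
        $genotype_for{$pos}{$chrom}{$sample} = $geno;
    }
    return ( \%list, \%genotype_for );
}

my ( $list, $genotype_for ) = build_index( '42-MG-BA', \@rows );

# Direct lookup by position and chromosome, no scan of the array needed.
print $genotype_for->{'68019584'}{'2'}{'42-MG-BA'}, "\n";
print scalar @{ $list->{'42-MG-BA'} }, "\n";
```

The lookup costs some extra memory, but finding a genotype for a given position and chromosome becomes a hash access instead of a loop over every record.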


Update

Assuming that you wouldn't have to store the label, we could change the lookup to

$genotype_for{ $data[1] }{ $data[0] } = $nor_geno;

And then the comparison could be written:

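use Params::Util qw(_HASH);    # _HASH returns its argument only if it is a non-empty hash reference

foreach my $pos ( keys %$small_lookup ) {
    next unless _HASH( my $sh = $small_lookup->{ $pos } )
            and _HASH( my $lh = $large_lookup->{ $pos } )
            ;
    foreach my $chrom ( keys %$sh ) {
        next unless my $sc = $sh->{ $chrom }
               and  my $lc = $lh->{ $chrom }
               ;
        print "$sc:$lc\n";    # genotype from the small list : genotype from the large list
    }
}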
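
For trying the comparison out on its own, here is a sketch with made-up data that uses only core Perl (`ref` instead of `_HASH`). The two lookups and the `compare_lookups` helper are assumptions for illustration; the lookups are shaped as `{ position }{ chromosome } = genotype`, as in the update above.

```perl
use strict;
use warnings;

# Hypothetical lookups: { position => { chromosome => genotype } }
my $small_lookup = {
    '35770059' => { '19' => 'TC' },
    '68019584' => { '2'  => 'G' },
};
my $large_lookup = {
    '35770059' => { '19' => 'TT' },
    '68019584' => { '2'  => 'G' },
    '9561557'  => { '16' => 'G' },
};

# compare_lookups is an illustrative helper. It returns one record for
# every position/chromosome pair present in both lookups.
sub compare_lookups {
    my ( $small, $large ) = @_;
    my @matches;
    foreach my $pos ( sort { $a <=> $b } keys %$small ) {
        my $sh = $small->{$pos};
        my $lh = $large->{$pos};
        next unless ref($sh) eq 'HASH' and ref($lh) eq 'HASH';
        foreach my $chrom ( sort keys %$sh ) {
            next unless exists $lh->{$chrom};
            push @matches, {
                chromosome => $chrom,
                position   => $pos,
                small      => $sh->{$chrom},
                large      => $lh->{$chrom},
                same       => ( $sh->{$chrom} eq $lh->{$chrom} ? 1 : 0 ),
            };
        }
    }
    return \@matches;
}

foreach my $m ( @{ compare_lookups( $small_lookup, $large_lookup ) } ) {
    printf "chr%s:%s %s vs %s (%s)\n",
        @$m{qw(chromosome position small large)},
        $m->{same} ? 'match' : 'differ';
}
```

Iterating over the smaller lookup keeps the work proportional to the shorter list, since each probe into the larger one is a plain hash access.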

However, if you had limited use for the larger list, you could construct the specific case and pass that in as a filter when creating the longer list.

Thus, in whichever loop creates the longer list, you could just go

...
next unless $sample{ $position }{ $chromosome };
my $record
    = { chromosome => $chromosome
      , position   => $position
      , genotype   => $genotype
    };
...
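
A runnable sketch of that filtering step, under the assumption that the filter is a hash shaped `{ position }{ chromosome } = 1` built from the shorter list. The names `%wanted`, `@rows` and `@kept` are hypothetical. The sketch checks `exists` on the outer key first, so that probing the filter does not autovivify empty entries in it.

```perl
use strict;
use warnings;

# Hypothetical filter built from the shorter list:
# { position => { chromosome => 1 } }
my %wanted = (
    '35770059' => { '19' => 1 },
    '68019584' => { '2'  => 1 },
);

# Hypothetical rows of the longer list: [ chromosome, position, genotype ]
my @rows = (
    [ '19', '35770059', 'TT' ],
    [ '2',  '68019584', 'G'  ],
    [ '16', '9561557',  'G'  ],
    [ '19', '68019584', 'A'  ],    # same position, different chromosome
);

my @kept;
foreach my $row (@rows) {
    my ( $chromosome, $position, $genotype ) = @$row;

    # Skip anything the shorter list does not contain.
    next unless exists( $wanted{$position} )
        && $wanted{$position}{$chromosome};

    push @kept, {
        chromosome => $chromosome,
        position   => $position,
        genotype   => $genotype,
    };
}

print scalar(@kept), " records kept\n";
```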