开发者

perl Script to search for a motif in a multifasta file and print the complete sequence along with the header line

I am able to search a motif in a multi fasta file and print the line containing the motif.... but i need to print all the sequences along with the header line of the motif containing fasta sequence. Please help me i am just a beginner in perl

#!usr/bin/perl -w
use strict;

print 开发者_开发技巧STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;


my $line;
open (FILE, "data.fa");
while ($line = <FILE>) {
  if ($line =~ /$motif/)  {
     print $line;
   }
}


Try this:

Bio::DB::Fasta

Instructions on the page. For more examples or instructions just search Google for: "use Bio::DB::Fasta"

To install this simply follow any of these instructions, I suggest using the CPAN.pm method as super user:

Installing Perl Modules


Your script as written above doesn't remember the current sequence identifiers, so that you don't know which identifier is associated with each sequence.

I've modified your script below to read all of the FASTA sequences into a hash which maps ( identifier => sequence ), then iterate over that hash, printing out matches when appropriate. This will be an inappropriate approach for very large sequence files, but learning how to write little helper functions like this can be a very big speedup when writing new scripts to analyze data. It's also important to understand how to use and manipulate hashes and other data structures in Perl, as most code you encounter won't be written by beginners.

#!/usr/bin/perl

use strict;
use warnings;

print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;

my %seqs = %{ read_fasta_as_hash( 'data.fa' ) };
foreach my $id ( keys %seqs ) {
    if ( $seqs{$id} =~ /$motif/ ) {
        print $id, "\n";
        print $seqs{$id}, "\n";
    }
}

sub read_fasta_as_hash {
    my $fn = shift;

    my $current_id = '';
    my %seqs;
    open FILE, "<$fn" or die $!;
    while ( my $line = <FILE> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_id  = $1;
        } elsif ( $line !~ /^\s*$/ ) { # skip blank lines
            $seqs{$current_id} .= $line
        }
    }
    close FILE or die $!;

    return \%seqs;
}


@james_thompson's answer is great. I would use that if you're looking for something more versatile. If you're looking for a simpler version (perhaps for teaching?), this would also suffice - though note that this would miss the motif if there's a hard return in the middle.

#!usr/bin/perl -w
use strict;

print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;


my $line;
my $defline;
open (FILE, "data.fa");
while ($line = <FILE>) {
  if ($line =~ /^>/) {
     $defline = $line;
   } elsif ($line =~ /$motif/)  {
     print($defline,$line);
   }
}
close (FILE);

You'll note I also added an explicit close on the file handle.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜