perl Script to search for a motif in a multifasta file and print the complete sequence along with the header line

2023-02-09 08:03 问答作者：

I am able to search a motif in a multi fasta file and print the line containing the motif.... but i need to print all the sequences along with the header line of the motif containing fasta sequence. Please help me i am just a beginner in perl

#!usr/bin/perl -w
use strict;

print 开发者_开发技巧STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;


my $line;
open (FILE, "data.fa");
while ($line = <FILE>) {
  if ($line =~ /$motif/)  {
     print $line;
   }
}

Try this:

Bio::DB::Fasta

Instructions on the page. For more examples or instructions just search Google for: "use Bio::DB::Fasta"

To install this simply follow any of these instructions, I suggest using the CPAN.pm method as super user:

Installing Perl Modules

Your script as written above doesn't remember the current sequence identifiers, so that you don't know which identifier is associated with each sequence.

I've modified your script below to read all of the FASTA sequences into a hash which maps ( identifier => sequence ), then iterate over that hash, printing out matches when appropriate. This will be an inappropriate approach for very large sequence files, but learning how to write little helper functions like this can be a very big speedup when writing new scripts to analyze data. It's also important to understand how to use and manipulate hashes and other data structures in Perl, as most code you encounter won't be written by beginners.

#!/usr/bin/perl

use strict;
use warnings;

print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;

my %seqs = %{ read_fasta_as_hash( 'data.fa' ) };
foreach my $id ( keys %seqs ) {
    if ( $seqs{$id} =~ /$motif/ ) {
        print $id, "\n";
        print $seqs{$id}, "\n";
    }
}

sub read_fasta_as_hash {
    my $fn = shift;

    my $current_id = '';
    my %seqs;
    open FILE, "<$fn" or die $!;
    while ( my $line = <FILE> ) {
        chomp $line;
        if ( $line =~ /^(>.*)$/ ) {
            $current_id  = $1;
        } elsif ( $line !~ /^\s*$/ ) { # skip blank lines
            $seqs{$current_id} .= $line
        }
    }
    close FILE or die $!;

    return \%seqs;
}

@james_thompson's answer is great. I would use that if you're looking for something more versatile. If you're looking for a simpler version (perhaps for teaching?), this would also suffice - though note that this would miss the motif if there's a hard return in the middle.

#!usr/bin/perl -w
use strict;

print STDOUT "Enter the motif: ";
my $motif = <STDIN>;
chomp $motif;


my $line;
my $defline;
open (FILE, "data.fa");
while ($line = <FILE>) {
  if ($line =~ /^>/) {
     $defline = $line;
   } elsif ($line =~ /$motif/)  {
     print($defline,$line);
   }
}
close (FILE);

You'll note I also added an explicit close on the file handle.

继续阅读：bioinformatics perl

perl Script to search for a motif in a multifasta file and print the complete sequence along with the header line

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？