specific pattern search from given fasta file of genome sequence
i'm trying to find pattern search with specific constraints :
input pattern to be searched will be entered from std. input
R,W and Y will be anywhere in the pattern which should be replaced with below values before searching and so every combination possible for the pattern are to be searched in a sequence.
R = C or G
W = A or T
Y = A or G
[e.g, pattern may be AWTYCR and sequence may be ATCGATGAT....]
we have to find starting positions
Also one misma开发者_如何学编程tch is allowed.
i.e., one character of the pattern may or maynot match while matching with the sequence.
output:
should be written into a file in a table format:
for example,
finding ATCR (R = C or G as above mentioned) in sequence ATCGAT will result in
S.no position the_matched_pattern
1 1 ATCG
2 1 ATCC
the match and mismatch may come in any position (i.e, in same position or in different position)
The following simple code is a starting point. It can be easily generalized to read the patterns from the file, read the sequence from the Fasta file; unique the results, etc.
sub trans_pat {
my $pat=shift;
$pat=~s/R/\[CG\]/g;
$pat=~s/W/\[AT\]/g;
$pat=~s/Y/\[AG\]/g;
return $pat;
}
sub find_pat {
my ($pat,$seq) = (@_);
print "Looking for pattern $pat\n";
while ($seq=~m/$pat/g) {
print "... match at $-[0]: $&\n";
};
}
my $read_pat="ATCR";
my $seq="ATCGATATCGAT";
# Looking for a perfect match
find_pat (trans_pat($read_pat),$seq);
# Allowing for a single mismatch
foreach $i (1..length $read_pat) {
my $mis_pat = $read_pat;
substr($mis_pat,$i-1,1) = ".";
find_pat (trans_pat($mis_pat),$seq);
}
A sample implementation:
perl -e 'sub trans {$pat=shift;$pat=~s/R/\[CG\]/g;return $pat};$read_pat="ATCR";$seq="ATCGATATCGAT";$pat=trans($read_pat);print "Looking for pattern $pat\n";while ($seq=~m/$pat/g) {print "... match at $-[0]: $&\n"};foreach $i (1..length $read_pat) {$mis_pat = $read_pat;substr($mis_pat,$i-1,1)=".";$pat=trans($mis_pat);print "Looking for pattern $pat\n"; while ($seq=~m/$pat/g) {print "... match at $-[0]: $&\n"}}'
Yields
Looking for pattern ATC[CG]
... match at 0: ATCG
... match at 6: ATCG
Looking for pattern .TC[CG]
... match at 0: ATCG
... match at 6: ATCG
Looking for pattern A.C[CG]
... match at 0: ATCG
... match at 6: ATCG
Looking for pattern AT.[CG]
... match at 0: ATCG
... match at 6: ATCG
Looking for pattern ATC.
... match at 0: ATCG
... match at 6: ATCG
精彩评论