Does a regular expression exist for enzymatic cleavage?

2022-12-13 11:57

Does a regular expression exist for (theoretical) tryptic cleavage of protein sequences? The cleavage rule for trypsin is: after R or K, but not before P.

Example:

Cleavage of the sequence VGTKCCTKPESERMPCTEDYLSLILNR should result in these 3 sequences (peptides):

 VGTK
 CCTKPESER
 MPCTEDYLSLILNR

Note that there is no cleavage after K in the second peptide (because P comes after K).
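Before reaching for a regex, it can help to have a regex-free reference for the rule to test against. The following is a minimal Python sketch that encodes only the rule stated above (cut after R or K unless the next residue is P). `tryptic_digest` is a hypothetical name, not a library function.

```python
def tryptic_digest(seq: str) -> list[str]:
    """Cut after every R or K unless the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        # Empty string stands in for "no next residue" at the end of seq.
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in "RK" and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # Tail peptide when the sequence does not end at a cleavage site.
        peptides.append(seq[start:])
    return peptides


print(tryptic_digest("VGTKCCTKPESERMPCTEDYLSLILNR"))
# ['VGTK', 'CCTKPESER', 'MPCTEDYLSLILNR']
```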

In Perl (it could just as well have been in C#, Python or Ruby):

  my $seq = 'VGTKCCTKPESERMPCTEDYLSLILNR';
  my @peptides = split /someRegularExpression/, $seq;

I have used this work-around, where a cut marker (=) is first inserted into the sequence and then removed again wherever P immediately follows it:

  my $seq      = 'VGTKCCTKPESERMPCTEDYLSLILNR';
  $seq         =~ s/([RK])/$1=/g; #Main cut rule.
  $seq         =~ s/=P/P/g;       #The exception.
  my @peptides = split( /=/, $seq);
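For comparison, here is a rough Python port of the marker work-around (a sketch; `digest_with_marker` is a hypothetical name). It builds two full intermediate copies of the sequence, which is exactly the overhead the question is trying to avoid. One detail differs between the languages: Perl's split drops trailing empty fields by default, while Python's str.split keeps them, so the sketch filters them out.

```python
import re


def digest_with_marker(seq: str) -> list[str]:
    # Main cut rule: append a '=' marker after every R or K.
    marked = re.sub(r"([RK])", r"\1=", seq)
    # The exception: drop the marker when P immediately follows it.
    marked = marked.replace("=P", "P")
    # Split on the remaining markers. A sequence ending in R or K leaves a
    # trailing empty field, which Perl's split discards but str.split keeps.
    return [p for p in marked.split("=") if p]


print(digest_with_marker("VGTKCCTKPESERMPCTEDYLSLILNR"))
# ['VGTK', 'CCTKPESER', 'MPCTEDYLSLILNR']
```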

But this requires modifying a string that can be very long, and there can be millions of sequences. Is there a regular expression that can be used directly with split? If so, what would it be?

Test platform: Windows XP 64 bit. ActivePerl 64 bit. From perl -v: v5.10.0 built for MSWin32-x64-multi-thread.

You indeed need to use the combination of a positive lookbehind and a negative lookahead. The correct (Perl) syntax is as follows:

  my @peptides = split(/(?!P)(?<=[RK])/, $seq);
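The same lookbehind plus negative lookahead also works with Python's re module. This is a minimal sketch assuming Python 3.7 or later, since earlier versions of re.split do not split on empty matches. `CLEAVAGE_SITE` and `split_peptides` are hypothetical names.

```python
import re

# Zero-width cleavage site: not followed by P (negative lookahead) and
# preceded by R or K (lookbehind). Both assertions test the same position,
# so their order does not matter.
CLEAVAGE_SITE = re.compile(r"(?!P)(?<=[RK])")


def split_peptides(seq: str) -> list[str]:
    # Unlike Perl's split, re.split keeps the trailing empty string produced
    # when the sequence ends in R or K, so empty pieces are filtered out.
    return [p for p in CLEAVAGE_SITE.split(seq) if p]


print(split_peptides("VGTKCCTKPESERMPCTEDYLSLILNR"))
# ['VGTK', 'CCTKPESER', 'MPCTEDYLSLILNR']
```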

You could use look-around assertions to exclude those cases. Something like this should work:

  my @peptides = split(/(?<=[RK](?!P))/, $seq);

You can use a lookbehind and a lookahead to match the cleavage site as a zero-width position, so no residue is consumed and the cut lands in the correct place:

  my @peptides = split(/(?<=[RK])(?!P)/, $seq);

This splits at every position that follows an R or K and is not followed by a P.
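All three answers express the same condition, with the assertions ordered or nested differently. One way to check this is to compare the cut positions each pattern produces. This sketch transcribes the three patterns into Python syntax (Python accepts the nested form because `(?!P)` is zero-width, so the lookbehind stays fixed-width); the helper `cut_positions` and the sample sequences are illustrative only.

```python
import re

# The three patterns from the answers, transcribed to Python syntax.
PATTERNS = [
    r"(?!P)(?<=[RK])",  # negative lookahead first, then lookbehind
    r"(?<=[RK](?!P))",  # lookahead nested inside the lookbehind
    r"(?<=[RK])(?!P)",  # lookbehind first, then negative lookahead
]


def cut_positions(pattern: str, seq: str) -> list[int]:
    # Each zero-width match is an offset where the sequence would be cut.
    return [m.start() for m in re.finditer(pattern, seq)]


samples = ["VGTKCCTKPESERMPCTEDYLSLILNR", "KPRKRP", "PPPP", "RK", ""]
for s in samples:
    results = [cut_positions(p, s) for p in PATTERNS]
    assert results[0] == results[1] == results[2], (s, results)
print("all three patterns agree")
```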

In Python you can use re.finditer, which returns an iterator over non-overlapping matches; each match object provides its start offset and span. You can then store those string offsets instead of building new strings.
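A minimal sketch of the offsets idea, assuming Python 3.9+ for the built-in generic annotations; `peptide_spans` is a hypothetical name. The generator yields (start, end) pairs, so only integers are kept and a peptide is sliced out only when it is actually needed.

```python
import re
from typing import Iterator

# Same zero-width cleavage pattern as in the answers (hypothetical name).
CLEAVAGE_SITE = re.compile(r"(?<=[RK])(?!P)")


def peptide_spans(seq: str) -> Iterator[tuple[int, int]]:
    """Yield (start, end) offsets of each peptide without copying seq."""
    start = 0
    for m in CLEAVAGE_SITE.finditer(seq):
        # The pattern is zero-width, so m.start() == m.end() is the cut point.
        yield start, m.start()
        start = m.start()
    if start < len(seq):
        # Remaining tail when the sequence does not end in a cleavage site.
        yield start, len(seq)


seq = "VGTKCCTKPESERMPCTEDYLSLILNR"
print(list(peptide_spans(seq)))                   # [(0, 4), (4, 13), (13, 27)]
print([seq[s:e] for s, e in peptide_spans(seq)])  # ['VGTK', 'CCTKPESER', 'MPCTEDYLSLILNR']
```

Because the generator is lazy, it can be run over millions of sequences one at a time without holding every peptide in memory.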
