
How should I combine these regexes?

 $syb =~ s/(at{3,6})/\U$1/gi;

 $syb =~ s/(aat{2,5})/\U$1/gi;

 $syb =~ s/(aaat{1,4})/\U$1/gi;

 $syb =~ s/(aaaat{0,3})/\U$1/gi;

 $syb =~ s/(aaaaat{0,2})/\U$1/gi;

 $syb =~ s/(a{4,7})/\U$1/gi;

 $syb =~ s/(aaaaaat)/\U$1/gi;

 $syb =~ s/(t{4,7})/\U$1/gi;

Is there any way I could get all of these regexps into one? Is it bad practice to run this many regexps on the same string? As an example of the end result: if $syb is aaatcgacgatcgatcaatttcgaaaaaggattttttatgcacgcacggggattaaaa, the regexp should turn it into AAATcgacgatcgatcAATTTcgAAAAAggATTTTTTatgcacgcacggggattAAAA

One problem with my regexps is that they match aaaatttt as two separate matches and output AAAATTTT, even though that run is 8 characters long and should be left alone. I need to fix this as well.
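The overlap problem can be reproduced with a short script. This is only a sketch: it applies the eight substitutions from above in the same order, and the helper name apply_all is made up for the illustration.

```perl
use strict;
use warnings;

# The eight patterns from the question, in the order they are applied there.
my @patterns = (
    qr/(at{3,6})/i,
    qr/(aat{2,5})/i,
    qr/(aaat{1,4})/i,
    qr/(aaaat{0,3})/i,
    qr/(aaaaat{0,2})/i,
    qr/(a{4,7})/i,
    qr/(aaaaaat)/i,
    qr/(t{4,7})/i,
);

# apply_all is a hypothetical helper name, not part of the original code.
sub apply_all {
    my ($syb) = @_;
    # Each pass upper-cases whatever its pattern matches; because later
    # patterns are case-insensitive, they also match text that an earlier
    # pass already capitalized.
    $syb =~ s/$_/\U$1/g for @patterns;
    return $syb;
}

print apply_all("aaaatttt"), "\n";   # AAAATTTT, although the run is 8 long
```

Because each pass works on the result of the previous one, the partial matches add up until the whole 8-character run is capitalized.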

I have a string of A's, C's, T's and G's stored in $syb. I want to capitalize any part of the string that is a run of A's followed by T's, only A's, or only T's (T's followed by A's should not count as one run). A capitalized section may be no shorter than 4 and no longer than 7 characters.


This is a tough one. I think this might work:

s/((?<!a)a|(?<!a|t)t)((?<!t)\1|t){3,6}(?!\2|t)/\U$&/gi

Essentially, what I'm doing is:

  1. Get an a not preceded by an a. Or a t not preceded by an a or t.
    • ((?<!a)a|(?<!a|t)t)
  2. Get 3-6 more characters, each either a repeat of the first match not preceded by a t, or a t.
    • ((?<!t)\1|t){3,6}
  3. Make sure it is not followed by the last item in the sequence or a t.
    • (?!\2|t)

And the perl code:

$syb = "aaatcgacgatcgatcaatttcgaaaaaggattttttatgcacgcacggggattaaaaactgaaaattttactgaaaaaaaasttttttts";
$syb =~ s/((?<!a)a|(?<!a|t)t)((?<!t)\1|t){3,6}(?!\2|t)/\U$&/gi;
print $syb;

Edit: taking a cue from qtax, I've removed the capturing groups from mine and some chars from his:

s/(?:(?<!a)a|(?<!a|t)t)(?:(?<!t)a|t){3,6}(?!(?<=a)a|t)/\U$&/gi

Edit: reducing the regex by 5 chars.

s/(?<!a|t(?=t))(?:a|t(?!a)){3,6}(?:a(?!a)|t)(?!t)/\U$&/gi

With comments:

s/
# Look behind: must not be preceded by an 'a', nor by a 't' when the next char is also a 't'
(?<!a|t(?=t))
# Match 3-6 chars, each an 'a' or a 't' not followed by an 'a'
(?:a|t(?!a)){3,6}
# Match an 'a' not followed by an 'a', or a 't'
(?:a(?!a)|t)
# Make sure none of this is followed by a 't'
(?!t)
/\U$&/gix;


As stated in the chat, if a candidate run is longer than 7 characters it should be ignored entirely, and no part of it replaced.

My backreference-free solution:

s/
(?:(?<!a)a|(?<!t|a)t)
(?:(?<=a)a|(?<=a)t|(?<=t)t){3,6}
(?!(?<=a)a|(?<=a)t|(?<=t)t)
/\U$&/gix;

With some comments:

s/
# match the first [at] only if not part of a valid sequence
(?:(?<!a)a|(?<!t|a)t)
# only match the allowed transitions: a->a, a->t, t->t
(?:(?<=a)a|(?<=a)t|(?<=t)t){3,6}
# ending can not be a valid transition: negate the above
(?!(?<=a)a|(?<=a)t|(?<=t)t)
/\U$&/gix;

Update: Applied the shortening ideas by Jacob, here with some comments:

s/
# match the first a or t only if it's not part of a valid sequence
(?:(?<!a)a|(?<!t|a)t)
# only match the allowed transitions: a->a, a->t, t->t
# (t can follow any of the previous chars, so no need to check it)
(?:(?<=a)a|t){3,6}
# ending can not be a valid transition: negate the above
(?!(?<=a)a|t)
/\U$&/gix;
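A quick way to check the shortened regex is to run it over a few inputs with known results. The sketch below does that with the example string from the question plus two edge cases. The sub name capitalize_runs and the test table are made up for the illustration.

```perl
use strict;
use warnings;

# capitalize_runs is a hypothetical wrapper around the substitution above.
sub capitalize_runs {
    my ($syb) = @_;
    $syb =~ s/(?:(?<!a)a|(?<!t|a)t)(?:(?<=a)a|t){3,6}(?!(?<=a)a|t)/\U$&/gi;
    return $syb;
}

# Input => expected output. The first pair is the example from the question.
my %expected = (
    'aaatcgacgatcgatcaatttcgaaaaaggattttttatgcacgcacggggattaaaa'
        => 'AAATcgacgatcgatcAATTTcgAAAAAggATTTTTTatgcacgcacggggattAAAA',
    'aaaatttt' => 'aaaatttt',   # 8 chars: too long, left untouched
    'tttt'     => 'TTTT',       # t's alone, length 4
);

for my $in (sort keys %expected) {
    my $got = capitalize_runs($in);
    printf "%-4s %s -> %s\n",
        ($got eq $expected{$in} ? 'ok' : 'FAIL'), $in, $got;
}
```

The lookbehinds always see the original string, since s///g matches against the input and only builds the modified copy on the side.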

Edit: A less regexy solution just for fun:

s/(a+t*|t+)/(length $1 >= 4 && length $1 <= 7)? "\U$1": $1/gie;
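The same idea written as a small sub, to show what the /e version does step by step. This is just a sketch: the name capitalize_by_length and the $min / $max parameters are made up here, with 4 and 7 taken from the question.

```perl
use strict;
use warnings;

# capitalize_by_length, $min and $max are hypothetical names for this sketch.
sub capitalize_by_length {
    my ($syb, $min, $max) = @_;
    # Grab each maximal run of a's followed by t's (or t's alone), then
    # decide in plain Perl whether its length qualifies.
    $syb =~ s{(a+t*|t+)}{
        my $len = length $1;
        ($len >= $min && $len <= $max) ? uc $1 : $1
    }gie;
    return $syb;
}

print capitalize_by_length(
    'aaatcgacgatcgatcaatttcgaaaaaggattttttatgcacgcacggggattaaaa', 4, 7
), "\n";
# AAATcgacgatcgatcAATTTcgAAAAAggATTTTTTatgcacgcacggggattAAAA

print capitalize_by_length('aaaatttt', 4, 7), "\n";   # aaaatttt (8 is too long)
```

Since the pattern is greedy, it always consumes a whole run, so a run that is too long is returned unchanged instead of being partly capitalized.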

PS: Thanks to the OP for a more fun than usual regex question. :)
