
removing a repetitive sequence

I'm new to Ruby and would like some pointers, please. I have a file that contains many blocks like the following:

UPDATE:

+ 

?@??>=???>?>??>?>=9>>==?2>===<=>=== @IL9_2657:1:1:1:1217/1

TTTTCCGTGCTTTTTTTTTCGGTTCGATCCCCTCTTT

 +

I want a script that, for each block running from one + to the next +, removes the block if its sequence contains TTTTTTTTT.

Thanks in advance.

Mark


This should do it:

s = 'preceding_string+ ?@??>=???>?>??>?>=9>>==?2>===<=>=== @IL9_2657:1:1:1:1217/1 TTTTCCGTGCTTTTTTTTTCGGTTCGATCCCCTCTTT +following_string'

s.gsub!(/\+[^+]*TTTTTTTTT[^+]*\+/, '')
p s

# => "preceding_stringfollowing_string"
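One caveat, offered as a hedged sketch: the pattern above consumes both + delimiters, so if neighbouring blocks share a single + between them, deleting a match also glues its neighbours together. Assuming the delimiters are shared (the sample data does not make this certain), a lookahead leaves the closing + in place. The input string and the variable names motif and cleaned are made up for illustration.

```ruby
# Build the motif programmatically to avoid miscounting the nine Ts.
motif = 'T' * 9

# Hypothetical input: three blocks that share their "+" delimiters.
s = "+AAAA+GTGC#{motif}CGG+CCCC+"

# (?=\+) checks that a closing "+" follows without consuming it,
# so the next block keeps its opening delimiter.
cleaned = s.gsub(/\+[^+]*#{motif}[^+]*(?=\+)/, '')

p cleaned
# => "+AAAA+CCCC+"
```

If a + can also occur inside a block (for example as a quality character), this pattern would need a stricter delimiter, such as a + alone on its own line.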


First split your data into an array. scan can do this with a simple regular expression. You can then remove the unwanted items with reject!. For example:

data = "+x+ +y+ +TTTTTTTTT+ +z+"

blocks = data.scan(/\+[^+]+\+/)
blocks.reject! { |b| b.include? "TTTTTTTTT" }

p blocks
# => ["+x+", "+y+", "+z+"]


ruby -0777 -ne 'puts $_.split(/\+/).reject{|x| x[/TTTTTTTTT/] }.join("+")' file
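For readers new to Ruby's command-line switches: -0777 makes Ruby read the whole file as a single record, -n loops over the records and puts each one in $_, and -e runs the given code. The same split, reject, join idea can be written as a small script. This is a sketch, not a drop-in tool; the method name remove_blocks and the sample string are made up here, and the motif is a parameter that defaults to the nine Ts from the question.

```ruby
# Remove every "+"-delimited piece that contains the motif.
# `text` is the whole file as one string.
def remove_blocks(text, motif = 'T' * 9)
  # The -1 limit keeps trailing empty pieces, so a final "+" survives.
  pieces = text.split('+', -1)
  pieces.reject { |piece| piece.include?(motif) }.join('+')
end

# Hypothetical sample with three blocks sharing their delimiters.
sample = "+AAAA+GTGC#{'T' * 9}CGG+CCCC+"
p remove_blocks(sample)
# => "+AAAA+CCCC+"
```

To apply it to a file, something like File.write('filtered.txt', remove_blocks(File.read('file'))) should work, where filtered.txt is a hypothetical output name.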


Could the sequence have too many adenines? If so, use bioruby to get the reverse complement of the sequence.
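As a rough illustration of that idea: a read from the opposite strand shows the same repeat as a run of As rather than Ts. The sketch below deliberately avoids BioRuby and uses only core String methods, so it shows the concept and not the BioRuby API. The method name reverse_complement is made up here, and the code assumes the sequence contains only A, C, G and T.

```ruby
# Plain-Ruby stand-in for a library call: reverse the sequence,
# then swap each base for its complement (A<->T, C<->G).
def reverse_complement(seq)
  seq.reverse.tr('ACGTacgt', 'TGCAtgca')
end

p reverse_complement('AAAC')
# => "GTTT"

# A run of nine As becomes the nine-T motif from the question.
p reverse_complement('A' * 9) == 'T' * 9
# => true
```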
