removing a repetitive sequence
I'm new to Ruby and would like some pointers, please. I have a file that contains many blocks like the following:
UPDATE:
+
?@??>=???>?>??>?>=9>>==?2>===<=>=== @IL9_2657:1:1:1:1217/1
TTTTCCGTGCTTTTTTTTTCGGTTCGATCCCCTCTTT
+
I want a script that, for each block running from one + to the next +, removes the block if its sequence contains
TTTTTTTTT.
Thanks in advance.
Mark
This should do it:
s = 'preceding_string+ ?@??>=???>?>??>?>=9>>==?2>===<=>=== @IL9_2657:1:1:1:1217/1 TTTTCCGTGCTTTTTTTTTCGGTTCGATCCCCTCTTT +following_string'
s.gsub!(/\+[^+]*TTTTTTTTT[^+]*\+/, '')
p s
# => "preceding_stringfollowing_string"
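One thing to watch with the pattern above: it consumes both the opening and the closing +, so if neighbouring blocks in the file share a single + as their delimiter, the block after a removed one loses its opening +. A minimal sketch of a variant that uses a lookahead so the closing + stays in place; the sample text and read names here are made up for illustration, and it assumes no + appears inside a block (quality lines in real data can contain +):

```ruby
# Hypothetical sample: three blocks that share their "+" delimiters.
poly_t = "T" * 9
text = "+\nqual1\n@read1\nACGTACGT\n" \
       "+\nqual2\n@read2\nTTTTCCGTGC#{poly_t}CGG\n" \
       "+\nqual3\n@read3\nGGGGCCCC\n+\n"

# (?=\+) only checks that a "+" follows; it does not consume it,
# so the next block keeps its opening delimiter.
cleaned = text.gsub(/\+[^+]*T{9}[^+]*(?=\+)/, "")
puts cleaned
```

Only the middle block is removed; the first and third blocks and their + delimiters are left untouched.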
First split your data into an array; scan can do this with a simple regular expression. You can then remove the unwanted items with reject!, for example:
data = "+x+ +y+ +TTTTTTTTT+ +z+"
blocks = data.scan(/\+[^+]+\+/)
blocks.reject! { |b| b.include? "TTTTTTTTT" }
p blocks
# => ["+x+", "+y+", "+z+"]
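If the file is too large to hold in memory comfortably, the same split-and-reject idea can be done one block at a time. This is only a sketch: drop_poly_t_blocks is a made-up helper name, the sample data is invented, and it assumes + never occurs inside a block. It relies on each_line accepting a custom separator, so every chunk ends at the next +:

```ruby
require "stringio"

# Copies "+"-terminated chunks from io to out, skipping any chunk
# that contains the unwanted run (nine Ts by default).
def drop_poly_t_blocks(io, out, run = "T" * 9)
  io.each_line("+") do |chunk|
    out << chunk unless chunk.include?(run)
  end
  out
end

# Hypothetical sample standing in for the real file.
sample = "+\nq1\n@r1\nACGT\n+\nq2\n@r2\n#{'T' * 9}\n+\nq3\n@r3\nGGCC\n+\n"
result = drop_poly_t_blocks(StringIO.new(sample), +"")
puts result
```

For a real file you could pass an open File as io and $stdout as out, since both respond to the methods used here.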
ruby -0777 -ne 'puts $_.split(/\+/).reject{|x| x[/T{9}/] }.join("+")' file
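For anyone new to the command-line switches: -0777 makes Ruby read the whole file as a single record, -n wraps the code in a read loop, and $_ holds what was read. Spelled out as a small script, the one-liner looks roughly like this. remove_blocks is a made-up name and the sample is invented; the -1 limit passed to split is an addition so that a trailing + is not silently dropped:

```ruby
# Splits the text on "+", drops the pieces containing the run,
# and glues the remaining pieces back together with "+".
def remove_blocks(text, run = "T" * 9)
  text.split("+", -1)                        # -1 keeps trailing empty pieces
      .reject { |block| block.include?(run) }
      .join("+")
end

# Hypothetical sample standing in for File.read("file").
sample = "+\nq1\n@r1\nACGT\n+\nq2\n@r2\n#{'T' * 9}\n+\nq3\n@r3\nGGCC\n+"
output = remove_blocks(sample)
puts output
```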
Could the sequence have too many adenines? If so, use BioRuby to get the reverse complement of the sequence.
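If installing a library just for this feels heavy, the reverse complement of a plain A/C/G/T string can be sketched in ordinary Ruby. This is not BioRuby's API; reverse_complement and poly_t_on_either_strand? are made-up helper names, and ambiguity codes such as N are simply left unchanged:

```ruby
# Reverses the sequence and swaps each base for its complement.
def reverse_complement(seq)
  seq.reverse.tr("ACGTacgt", "TGCAtgca")
end

# True if the run of Ts shows up on the read itself or on the
# opposite strand (where it appears in the read as a run of As).
def poly_t_on_either_strand?(seq, length = 9)
  run = "T" * length
  seq.include?(run) || reverse_complement(seq).include?(run)
end

puts reverse_complement("AAAAAAAAA")
puts poly_t_on_either_strand?("GGAAAAAAAAACC")
```

A check like this could replace the plain TTTTTTTTT test in any of the approaches above if runs of adenines should be filtered as well.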