Avoiding/Removing nil when using a case statement while parsing a string

2022-12-21 00:31

Sample data:

DNA : 
This is a string

BaseQuality :
4 4 4 4 4 4 6 7 7 7 

Metadata : 
Is_read

DNA : 
yet another string

BaseQuality : 
4 4 4 4 7 7 4 8 4 4 4 4 4

Metadata :
Is_read
SCF_File 
.
.
.

I have a method that uses a case statement, shown below, to split a longer text file into records using the delimiter "\n\n", and a class that models a data object:

def parse_file(myfile)
  $/ = "\n\n"
  records = []
  File.open(myfile) do |f|
    f.each_line do |line|
      read = Read.new
      case line
      when /^DNA/
        read.dna_data = line.strip
      when /^BaseQuality/
        read.quality_data = line.strip
      when /^Metadata/
        read.metadata = line.strip
      else
        puts "Unrecognized line: #{line}"
      end
      records.push read
    end
  end
  records
end

class Read
  attr_accessor :dna_data, :quality_data, :metadata
end

records.each do |r|
  puts r.dna_data
end

For each record, dna_data yields the expected string followed by two unwanted nils:

"This is a string"
nil
nil

My problem is the nils shown above, which seem to end up in dna_data when using read.dna_data = line.strip.

How do I get rid of them, and how do I avoid them in the first place? What am I missing? Is my approach 'smelly'? Thank you.

The problem is that the code creates a new instance of Read for each line. Instead, it should create an instance for each section. It appears that a section starts with the DNA header, so:

def parse_file(myfile)
  $/ = "\n\n"
  records = []
  File.open(myfile) do |f|
    read = nil                              # <- NEW
    f.each_line do |line|
      #read = Read.new                      # <- DELETED
      case line
      when /^DNA/
        read = Read.new                     # <- NEW
        read.dna_data = line.strip
      when /^BaseQuality/
        read.quality_data = line.strip
      when /^Metadata/
        read.metadata = line.strip
        records.push read                   # <= ADDED
      else
        puts "Unrecognized line: #{line}"
      end
      #records.push read                    # <= DELETED
    end
  end
  records
end

Having the parsed record pushed onto the records array after reading metadata works, but only if each record always contains metadata and the metadata is always last. We can make the program more forgiving of changes in the data layout by pushing the read onto records when it is first created:

def parse_file(myfile)
  $/ = "\n\n"
  records = []
  File.open(myfile) do |f|
    f.each_line do |line|
      case line
      when /^DNA/
        records << Read.new
        records.last.dna_data = line.strip
      when /^BaseQuality/
        records.last.quality_data = line.strip
      when /^Metadata/
        records.last.metadata = line.strip
      else
        puts "Unrecognized line: #{line}"
      end
    end
  end
  records
end
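
As a variation on the version above, here is a minimal sketch that passes the separator to each_line directly instead of assigning to the global $/, so other reads in the program (gets, readlines) keep their default behavior. The names ReadRecord and parse_io are hypothetical stand-ins introduced for this example, and a StringIO replaces the file so the snippet runs on its own. It assumes every record starts with a DNA chunk, as in the sample data.

```ruby
require "stringio"

# Hypothetical stand-in for the question's Read class.
ReadRecord = Struct.new(:dna_data, :quality_data, :metadata)

# Parses blank-line-separated chunks from any IO-like object.
# Assumes each record begins with a DNA chunk, as in the sample data.
def parse_io(io)
  records = []
  io.each_line("\n\n") do |chunk|
    chunk = chunk.strip
    next if chunk.empty?            # skip stray blank chunks at the end
    case chunk
    when /\ADNA/
      records << ReadRecord.new     # a DNA header starts a new record
      records.last.dna_data = chunk
    when /\ABaseQuality/
      records.last.quality_data = chunk
    when /\AMetadata/
      records.last.metadata = chunk
    else
      warn "Unrecognized chunk: #{chunk}"
    end
  end
  records
end

sample = <<~DATA
  DNA :
  This is a string

  BaseQuality :
  4 4 4 4 4 4 6 7 7 7

  Metadata :
  Is_read

  DNA :
  yet another string

  BaseQuality :
  4 4 4 4 7 7 4 8 4 4 4 4 4

  Metadata :
  Is_read
  SCF_File
DATA

records = parse_io(StringIO.new(sample))
records.each { |r| puts r.dna_data }
```

With a real file, the same method could be called as File.open(myfile) { |f| parse_io(f) }.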

You may wish to see if BioRuby is appropriate to your needs. I use it to handle quality sequences as well as nucleotide sequences.

First off, I would avoid using Ruby for bioinformatics; it's not fast enough for a certain set of problems. Sooner or later, you will hit issues and your program will crawl to a stop.

From what I gathered, you are trying to remove nils from an array. Here are two ways of doing so:

Use the compact method:

[nil, nil, 'asdfa'].compact # => ['asdfa']

Don't add nil when you are adding elements:

records.push read unless read.nil?

records.push read if read # nil is falsy, so it is never pushed
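
One caveat, sketched below: in the question's code every element of records is a Read object (never nil itself), and it is the dna_data attribute that is nil. So compact would have to be applied to the attribute values, or the records filtered on that attribute. ReadStub is a hypothetical one-field stand-in for the question's Read class, used only so the snippet runs by itself.

```ruby
# Hypothetical stand-in for the question's Read class (one field only).
ReadStub = Struct.new(:dna_data)

records = [
  ReadStub.new("DNA : This is a string"),
  ReadStub.new(nil),   # created for a BaseQuality chunk
  ReadStub.new(nil)    # created for a Metadata chunk
]

# Collect the attribute values, then drop the nils.
dna = records.map(&:dna_data).compact
p dna   # prints ["DNA : This is a string"]

# Or keep only the records that actually carry DNA data.
with_dna = records.reject { |r| r.dna_data.nil? }
puts with_dna.size   # prints 1
```

This only hides the symptom; creating one record per section, as the other answers do, avoids the nils in the first place.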
