Ruby - Read file in batches

2023-01-01 19:25 问答作者：

I am reading a file that is 10mb in size and which contains some id's. I read them into a list in ruby. I am concerned that it might cause memory issues in the future, when the numb开发者_开发百科er of id's in file might increase. Is there a effective way of reading a large file in batches?

Thank you

With Lazy Enumerators and each_slice, you can get the best of both worlds. You don't need to worry about cutting lines in the middle, and you can iterate over multiple lines in a batch. batch_size can be chosen freely.

header_lines = 1
batch_size   = 2000

File.open("big_file") do |file|
  file.lazy.drop(header_lines).each_slice(batch_size) do |lines|
    # do something with batch of lines
  end
end

It could be used to import a huge CSV file into a database:

require 'csv'
batch_size   = 2000

File.open("big_data.csv") do |file|
  headers = file.first
  file.lazy.each_slice(batch_size) do |lines|
    csv_rows = CSV.parse(lines.join, headers: headers)
    # do something with 2000 csv rows, e.g. bulk insert them into a database
  end
end

there's no universal way.

1) you can read file by chunks:

File.open('filename','r') do |f|
  chunk = f.read(2048)
  ...
end

disadvantage: you can miss a substring if it'd be between chunks, i.e. you look for "SOME_TEXT", but "SOME_" is a last 5 bytes of 1st 2048-byte chunk, and "TEXT" is a 4 bytes of 2nd chunk

2) you can read file line-by-line

File.open('filename','r') do |f|
  line = f.gets
  ...
end

disadvantage: this way it'd be 2x..5x slower than first method

If you're worried this much about speed/memory efficiency, have you considered shelling out to the shell and use grep, awk, sed etc.? If I knew a bit more about the structure of the input file and what you're trying to extract, I could potentially construct a command for you.

继续阅读：ruby ruby-on-rails scripting

Ruby - Read file in batches

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？