Read a file in chunks in Ruby

I need to read a file in 1 MB chunks. Is there a cleaner way to do this in Ruby than the following?

FILENAME="d:\\tmp\\file.bin"
MEGABYTE = 1024*1024
size = File.size(FILENAME)
open(FILENAME, "rb") do |io| 
  read = 0
  while read < size
    left = (size - read)
    cur = left < MEGABYTE ? left : MEGABYTE
    data = io.read(cur)
    read += data.size
    puts "READ #{cur} bytes" #yield data
  end
end


Adapted from the Ruby Cookbook page 204:

FILENAME = "d:\\tmp\\file.bin"
MEGABYTE = 1024 * 1024

class File
  def each_chunk(chunk_size = MEGABYTE)
    yield read(chunk_size) until eof?
  end
end

open(FILENAME, "rb") do |f|
  f.each_chunk { |chunk| puts chunk }
end

Disclaimer: I'm a Ruby newbie and haven't tested this.


Alternatively, if you don't want to monkeypatch File:

until my_file.eof?
  do_something_with( my_file.read( bytes ) )
end

For example, streaming an uploaded tempfile into a new file:

# tempfile is a File instance
File.open( new_file, 'wb' ) do |f|
  # Read in small 65k chunks to limit memory usage
  f.write(tempfile.read(2**16)) until tempfile.eof?
end
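
The until-eof loop above can also be wrapped in a plain method instead of a File monkeypatch. This is only a minimal sketch: each_chunk here is a hypothetical top-level helper (not part of Ruby's standard library), and the 1000-byte chunk size and temporary file are made up for the demonstration.

```ruby
require "tempfile"

# Hypothetical helper (not part of Ruby): yields successive chunks of at most
# chunk_size bytes from the file at path, without reopening class File.
def each_chunk(path, chunk_size = 1024 * 1024)
  File.open(path, "rb") do |io|
    # IO#read(n) returns nil once EOF is reached, which ends the loop.
    while (chunk = io.read(chunk_size))
      yield chunk
    end
  end
end

# Usage: write 2500 bytes to a temporary file, then read it back in
# 1000-byte chunks and record the size of each chunk.
Tempfile.create("chunks") do |tmp|
  tmp.binmode
  tmp.write("x" * 2500)
  tmp.flush

  sizes = []
  each_chunk(tmp.path, 1000) { |chunk| sizes << chunk.bytesize }
  puts sizes.inspect
end
```

Only the last chunk can be shorter than chunk_size, so the caller never has to compute how many bytes are left.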


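You can use IO#each(sep, limit) with sep set to nil, so that each chunk is bounded only by limit (an empty string would select paragraph mode instead). For example:

chunk_size = 1024
# The block form of File.open closes the file when the block finishes
File.open('/path/to/file.txt') do |f|
  f.each(nil, chunk_size) do |chunk|
    puts chunk
  end
end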
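
To see what each(nil, limit) hands back, here is a small sketch that records the chunk sizes. The chunk_sizes helper and the sample data are assumptions made for illustration; the file is opened in binary mode so that limit counts plain bytes.

```ruby
require "tempfile"

# Hypothetical helper: returns the byte size of every chunk that
# IO#each(nil, limit) produces for the file at path.
def chunk_sizes(path, limit)
  File.open(path, "rb") do |f|
    # Without a block, IO#each returns an Enumerator we can map over.
    f.each(nil, limit).map(&:bytesize)
  end
end

Tempfile.create("each_limit") do |tmp|
  tmp.binmode
  tmp.write("line\n" * 5) # 25 bytes, including newlines
  tmp.flush
  # With sep = nil the newlines are not treated as separators,
  # so the chunks are cut purely by the 10-byte limit.
  puts chunk_sizes(tmp.path, 10).inspect
end
```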


The Ruby docs for IO (http://ruby-doc.org/core-2.2.2/IO.html) include this example, which iterates over a file line by line rather than in fixed-size chunks:

IO.foreach("testfile") {|x| print "GOT ", x }

The only caveat: this process can read the temp file faster than the stream that generates it is written, so IMO a short delay should be added between reads.

IO.foreach("/tmp/streamfile") {|line|
  ParseLine.parse(line)
  sleep 0.3 # pause; this process will stop early if it doesn't allow some buffering
}


https://ruby-doc.org/core-3.0.2/IO.html#method-i-read gives an example of iterating over fixed length records with read(length):

# iterate over fixed length records
open("fixed-record-file") do |f|
  while record = f.read(256)
    # ...
  end
end

If length is a positive integer, read tries to read length bytes without any conversion (binary mode). It returns nil if an EOF is encountered before anything can be read. Fewer than length bytes are returned if an EOF is encountered during the read. In the case of an integer length, the resulting string is always in ASCII-8BIT encoding.
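
The behaviour described in that passage can be checked with a short sketch. The read_results helper, the 10-byte sample file and the 4-byte length are assumptions chosen for the demonstration, not part of the documentation example.

```ruby
require "tempfile"

# Hypothetical helper: calls IO#read(length) repeatedly on the file at path
# and returns every result, including the final nil returned at EOF.
def read_results(path, length)
  results = []
  File.open(path, "rb") do |f|
    loop do
      data = f.read(length)
      results << data
      break if data.nil?
    end
  end
  results
end

Tempfile.create("records") do |tmp|
  tmp.binmode
  tmp.write("abcdefghij") # 10 bytes, not a multiple of 4
  tmp.flush

  results = read_results(tmp.path, 4)
  p results                # two full 4-byte chunks, a short 2-byte chunk, then nil
  p results.first.encoding # ASCII-8BIT (shown as BINARY on newer Rubies)
end
```

Because the last call returns nil rather than an empty string, `while record = f.read(256)` terminates cleanly without a separate eof? check.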


FILENAME="d:/tmp/file.bin"

class File
  MEGABYTE = 1024*1024

  def each_chunk(chunk_size=MEGABYTE)
    yield self.read(chunk_size) until self.eof?
  end
end

open(FILENAME, "rb") do |f|
  f.each_chunk {|chunk| puts chunk }
end

It works, mbarkhau. I just moved the constant definition to the File class and added a couple of "self"s for clarity's sake.
