Read a file in chunks in Ruby
I need to read a file in MB chunks, is there a cleaner way to do this in Ruby:
FILENAME="d:\\tmp\\file.bin"
MEGABYTE = 1024*1024
size = File.size(FILENAME)
open(FILENAME, "rb") do |io|
read = 0
while read < size
left = (size - read)
cur = left < MEGABYTE ? left : MEGABYTE
开发者_运维百科 data = io.read(cur)
read += data.size
puts "READ #{cur} bytes" #yield data
end
end
Adapted from the Ruby Cookbook page 204:
FILENAME = "d:\\tmp\\file.bin"
MEGABYTE = 1024 * 1024
class File
def each_chunk(chunk_size = MEGABYTE)
yield read(chunk_size) until eof?
end
end
open(FILENAME, "rb") do |f|
f.each_chunk { |chunk| puts chunk }
end
Disclaimer: I'm a ruby newbie and haven't tested this.
Alternatively, if you don't want to monkeypatch File
:
until my_file.eof?
do_something_with( my_file.read( bytes ) )
end
For example, streaming an uploaded tempfile into a new file:
# tempfile is a File instance
File.open( new_file, 'wb' ) do |f|
# Read in small 65k chunks to limit memory usage
f.write(tempfile.read(2**16)) until tempfile.eof?
end
You can use IO#each(sep, limit)
, and set sep
to nil
or empty string, for example:
chunk_size = 1024
File.open('/path/to/file.txt').each(nil, chunk_size) do |chunk|
puts chunk
end
If you check out the ruby docs: http://ruby-doc.org/core-2.2.2/IO.html there's a line that goes like this:
IO.foreach("testfile") {|x| print "GOT ", x }
The only caveat is. Since, this process can read the temp file faster than the generated stream, IMO, a latency should be thrown in.
IO.foreach("/tmp/streamfile") {|line|
ParseLine.parse(line)
sleep 0.3 #pause as this process will discontine if it doesn't allow some buffering
}
https://ruby-doc.org/core-3.0.2/IO.html#method-i-read gives an example of iterating over fixed length records with read(
length
)
:
# iterate over fixed length records
open("fixed-record-file") do |f|
while record = f.read(256)
# ...
end
end
If length is a positive integer,
read
tries to read length bytes without any conversion (binary mode). It returnsnil
if an EOF is encountered before anything can be read. Fewer than length bytes are returned if an EOF is encountered during the read. In the case of an integer length, the resulting string is always in ASCII-8BIT encoding.
FILENAME="d:/tmp/file.bin"
class File
MEGABYTE = 1024*1024
def each_chunk(chunk_size=MEGABYTE)
yield self.read(chunk_size) until self.eof?
end
end
open(FILENAME, "rb") do |f|
f.each_chunk {|chunk| puts chunk }
end
It works, mbarkhau. I just moved the constant definition to the File class and added a couple of "self"s for clarity's sake.
精彩评论