How do I wrap ruby IO with a sliding window filter
I'm using an opaque API in some ruby code which takes a File/IO as a parameter. I want to be able to pass it an IO object that only gives access to a given range of data in the real IO object.
For example, I have a 8GB file, and I want to give the api an IO object that has a 1GB range within the开发者_运维问答 middle of my real file.
real_file = File.new('my-big-file')
offset = 1 * 2**30 # start 1 GB into it
length = 1 * 2**30 # end 1 GB after start
filter = IOFilter.new(real_file, offset, length)
# The api only sees the 1GB of data in the middle
opaque_api(filter)
The filter_io project looks like it would be the easiest to adapt to do this, but doesn't seem to support this use case directly.
I think you would have to write it yourself, as it seems like a rather specific thing: you would have to implement all (or, a subset that you need) of IO
's methods using a chunk of the opened file as a data source. An example of the "speciality" would be writing to such stream - you would have to take care not to cross the boundary of the segment given, i.e. constantly keeping track of your current position in the big file. Doesn't seem like a trivial job, and I don't see any shortcuts that could help you there.
Perhaps you can find some OS-based solution, e.g. making a loopback device out of the part of the large file (see man losetup
and particularly -o
and --sizelimit
options, for example).
Variant 2:
If you are ok with keeping the contents of the window in memory all the time, you may wrap StringIO
like this (just a sketch, not tested):
def sliding_io filename, offset, length
File.open(filename, 'r+') do |f|
# read the window into a buffer
f.seek(offset)
buf = f.read(length)
# wrap a buffer into StringIO and pass it given block
StringIO.open(buf) do |buf_io|
yield(buf_io)
end
# write altered buffer back to the big file
f.seek(offset)
f.write(buf[0,length])
end
end
And use it as you would use block variant of IO#open
.
I believe the IO object has the functionality you are looking for. I've used it before for MD5 hash summing similarly sized files.
incr_digest = Digest::MD5.new()
file = File.open(filename, 'rb') do |io|
while chunk = io.read(50000)
incr_digest << chunk
end
end
This was the block I used, where I was passing the chunk to the MD5 Digest object.
http://www.ruby-doc.org/core/classes/IO.html#M000918
精彩评论