How do I wrap a Ruby IO with a sliding window filter?

I'm using an opaque API in some Ruby code that takes a File/IO as a parameter. I want to pass it an IO object that exposes only a given range of the data in the real IO object.

For example, I have an 8 GB file, and I want to give the API an IO object that covers a 1 GB range in the middle of the real file.

real_file = File.new('my-big-file')
offset = 1 * 2**30 # start 1 GB into it
length = 1 * 2**30 # end 1 GB after start
filter = IOFilter.new(real_file, offset, length)

# The api only sees the 1GB of data in the middle
opaque_api(filter)

The filter_io project looks like it would be the easiest to adapt to do this, but doesn't seem to support this use case directly.


I think you would have to write it yourself, since this is a fairly specific requirement: you would implement all of IO's methods (or just the subset you need) using a chunk of the opened file as the data source. Writing to such a stream is a good example of the special handling involved: you must take care not to cross the boundary of the given segment, which means constantly tracking your current position in the big file. It doesn't seem like a trivial job, and I don't see any shortcuts that would help you there.
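For the read-only case, the idea above could be sketched roughly as follows. WindowIO is a hypothetical name, not an existing library class. It implements only read, seek, pos/tell, rewind, size and eof?, on the assumption that the opaque API needs nothing else (no write, gets, each_line, or read with an output buffer). It re-seeks the underlying IO before every read, so it does not rely on the underlying position staying where it was left.

```ruby
require 'stringio'

# Hypothetical read-only window over an existing IO (not a library class).
class WindowIO
  attr_reader :pos
  alias tell pos

  def initialize(io, offset, length)
    @io = io          # underlying IO; must support seek and read
    @offset = offset  # start of the window in the underlying IO
    @length = length  # window size in bytes
    @pos = 0          # current position relative to the window start
  end

  def size
    @length
  end

  def eof?
    @pos >= @length
  end

  def rewind
    @pos = 0
  end

  # Positions are relative to the window, never to the underlying IO.
  def seek(amount, whence = IO::SEEK_SET)
    target = case whence
             when IO::SEEK_SET then amount
             when IO::SEEK_CUR then @pos + amount
             when IO::SEEK_END then @length + amount
             else raise ArgumentError, "unsupported whence: #{whence.inspect}"
             end
    raise Errno::EINVAL if target < 0
    @pos = target
    0
  end

  # Mirrors IO#read at the window's end: "" when no length is given,
  # nil when a positive length is requested.
  def read(maxlen = nil)
    remaining = [@length - @pos, 0].max
    want = maxlen ? [maxlen, remaining].min : remaining
    data = nil
    if want > 0
      @io.seek(@offset + @pos)
      data = @io.read(want)
      @pos += data.bytesize if data
    end
    data ||= "".b if maxlen.nil? || maxlen.zero?
    data
  end
end

# Small demonstration with an in-memory source instead of a real file.
source = StringIO.new("0123456789abcdef")
window = WindowIO.new(source, 4, 6) # exposes the bytes "456789"
window.read(3) # => "456"
window.read    # => "789"
window.read(1) # => nil
```

With a real file this would be used as opaque_api(WindowIO.new(real_file, offset, length)). Whether that is enough depends on which IO methods the API actually calls, which is worth checking before relying on it.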

Perhaps you can find an OS-level solution, e.g. making a loopback device out of part of the large file (see man losetup, particularly the -o and --sizelimit options).

Variant 2:

If you are OK with keeping the contents of the window in memory the whole time, you can wrap StringIO like this (just a sketch, not tested):

require 'stringio'

def sliding_io(filename, offset, length)
  # 'r+b': read/write in binary mode so byte offsets are exact
  File.open(filename, 'r+b') do |f|
    # read the window into a buffer
    f.seek(offset)
    buf = f.read(length)
    # wrap the buffer in a StringIO and pass it to the given block;
    # writes through buf_io modify buf in place
    StringIO.open(buf) do |buf_io|
      yield(buf_io)
    end
    # write the (possibly altered) buffer back, truncated to the window size
    f.seek(offset)
    f.write(buf[0, length])
  end
end

Use it the same way you would use the block form of File.open.


I believe the IO object already has the functionality you are looking for. I've used it before to compute MD5 hashes of similarly sized files.

require 'digest'

incr_digest = Digest::MD5.new
File.open(filename, 'rb') do |io|
  # read in 50,000-byte chunks; io.read returns nil at end of file
  while (chunk = io.read(50000))
    incr_digest << chunk
  end
end

This is the block I used; it passes each chunk to the MD5 digest object.

http://www.ruby-doc.org/core/classes/IO.html#M000918
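To restrict this chunked read to a window of the file, as the question asks, one option is to seek to the offset first and stop once the requested number of bytes has been consumed. The following is a minimal sketch under that assumption; digest_range and its chunk_size parameter are hypothetical names introduced for illustration. It only helps when you control the reading loop yourself, not when an opaque API does the reading.

```ruby
require 'digest'

# Hypothetical helper: MD5 hex digest of `length` bytes starting at `offset`.
def digest_range(path, offset, length, chunk_size = 64 * 1024)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |io|
    io.seek(offset)
    remaining = length
    # Stop when the window is exhausted or the file ends early.
    while remaining > 0 && (chunk = io.read([chunk_size, remaining].min))
      digest << chunk
      remaining -= chunk.bytesize
    end
  end
  digest.hexdigest
end
```

For the file in the question, the call would look like digest_range('my-big-file', 1 * 2**30, 1 * 2**30).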
