Can as 2D byte array be made one huge continuous byte array?
I have an extremely large 2D bytearray in memory,
byte MyBA = new byte[int.MaxValue][10];
Is there any way (probably unsafe) that I can fool C# into thinking this is one huge continuous byte array? I want to do this such that I can pass it to a MemoryStream
and then a BinaryReader
.
MyReader = new BinaryReader(MemoryStream(*MyBA)) //Syntax obvio开发者_Go百科usly made-up here
I do not believe .NET provides this, but it should be fairly easy to implement your own implementation of System.IO.Stream
, that seamlessly switches backing array. Here are the (untested) basics:
public class MultiArrayMemoryStream: System.IO.Stream
{
byte[][] _arrays;
long _position;
int _arrayNumber;
int _posInArray;
public MultiArrayMemoryStream(byte[][] arrays){
_arrays = arrays;
_position = 0;
_arrayNumber = 0;
_posInArray = 0;
}
public override int Read(byte[] buffer, int offset, int count){
int read = 0;
while(read<count){
if(_arrayNumber>=_arrays.Length){
return read;
}
if(count-read <= _arrays[_arrayNumber].Length - _posInArray){
Buffer.BlockCopy(_arrays[_arrayNumber], _posInArray, buffer, offset+read, count-read);
_posInArray+=count-read;
_position+=count-read;
read=count;
}else{
Buffer.BlockCopy(_arrays[_arrayNumber], _posInArray, buffer, offset+read, _arrays[_arrayNumber].Length - _posInArray);
read+=_arrays[_arrayNumber].Length - _posInArray;
_position+=_arrays[_arrayNumber].Length - _posInArray;
_arrayNumber++;
_posInArray=0;
}
}
return count;
}
public override long Length{
get {
long res = 0;
for(int i=0;i<_arrays.Length;i++){
res+=_arrays[i].Length;
}
return res;
}
}
public override long Position{
get { return _position; }
set { throw new NotSupportedException(); }
}
public override bool CanRead{
get { return true; }
}
public override bool CanSeek{
get { return false; }
}
public override bool CanWrite{
get { return false; }
}
public override void Flush(){
}
public override void Seek(long offset, SeekOrigin origin){
throw new NotSupportedException();
}
public override void SetLength(long value){
throw new NotSupportedException();
}
public override void Write(byte[] buffer, int offset, int count){
throw new NotSupportedException();
}
}
Another way to workaround the size-limitation of 2^31 bytes is UnmanagedMemoryStream
which implements System.IO.Stream
on top of an unmanaged memory buffer (which might be as large as the OS supports). Something like this might work (untested):
var fileStream = new FileStream("data",
FileMode.Open,
FileAccess.Read,
FileShare.Read,
16 * 1024,
FileOptions.SequentialScan);
long length = fileStream.Length;
IntPtr buffer = Marshal.AllocHGlobal(new IntPtr(length));
var memoryStream = new UnmanagedMemoryStream((byte*) buffer.ToPointer(), length, length, FileAccess.ReadWrite);
fileStream.CopyTo(memoryStream);
memoryStream.Seek(0, SeekOrigin.Begin);
// work with the UnmanagedMemoryStream
Marshal.FreeHGlobal(buffer);
Agree. Anyway you have limit of array size itself.
If you really need to operate huge arrays in a stream, write your custom memory stream class.
I think you can use a linear structure instead of a 2D structure using the following approach.
Instead of having byte[int.MaxValue][10] you can have byte[int.MaxValue*10]. You would address the item at [4,5] as int.MaxValue*(4-1)+(5-1). (a general formula would be (i-1)*number of columns+(j-1).
Of course you could use the other convention.
If I understand your question correctly, you've got a massive file that you want to read into memory and then process. But you can't do this because the amount of data in the file exceeds that of any single-dimensional array.
You mentioned that speed is important, and that you have multiple threads running in parallel to process the data as quickly as possible. If you're going to have to partition the data for each thread anyway, why not base the number of threads on the number of byte[int.MaxValue]
buffers required to cover everything?
You can create a memoryStream and then pass the array in line by line using the method Write
EDIT: The limit of a MemoryStream is certainly the amount of memory present for your application. Maybe there is a limit beneath that but if you need more memory, then you should consider to modify your overall architecture. E.g. you could process your data in chunks, or you could do a swapping mechanism to a file.
If you are using Framework 4.0, you have the option of working with a MemoryMappedFile. Memory mapped files can be backed by a physical file, or by the Windows swap file. Memory mapped files act like an in-memory stream, transparently swapping data to/from the backing storage if and when required.
If you are not using Framework 4.0, you can still use this option, but you will need to either write your own or find an exsiting wrapper. I expect there are plenty on The Code Project.
精彩评论