Quickly load 350M numbers into a double[] array in C#
I am going to store 350M pre-calculated double numbers in a binary file, and load them into memory as my dll starts up. Is there any built in way to load it up in parallel, or should I split the data into multiple files myself and take care of multiple threads myself too?
Answering the comments: I will be running this dll on powerful enough boxes, most likely only on 64 bit ones. Because all the access to my numbers will be via properties anyway, I can store my numbers in several arrays.
[update]
Everyone, thanks for answering! I'm looking forward to a lot of benchmarking on different boxes. Regarding the need: I want to speed up a very slow calculation, so I am going to pre-calcula开发者_如何学编程te a grid, load it into memory, and then interpolate.
Well I did a small test and I would definitely recommend using Memory Mapped Files. I Created a File containing 350M double values (2.6 GB as many mentioned before) and then tested the time it takes to map the file to memory and then access any of the elements.
In all my tests in my laptop (Win7, .Net 4.0, Core2 Duo 2.0 GHz, 4GB RAM) it took less than a second to map the file and at that point accessing any of the elements took virtually 0ms (all time is in the validation of the index). Then I decided to go through all 350M numbers and the whole process took about 3 minutes (paging included) so if in your case you have to iterate they may be another option.
Nevertheless I wrapped the access, just for example purposes there a lot conditions you should check before using this code, and it looks like this
public class Storage<T> : IDisposable, IEnumerable<T> where T : struct
{
MemoryMappedFile mappedFile;
MemoryMappedViewAccessor accesor;
long elementSize;
long numberOfElements;
public Storage(string filePath)
{
if (string.IsNullOrWhiteSpace(filePath))
{
throw new ArgumentNullException();
}
if (!File.Exists(filePath))
{
throw new FileNotFoundException();
}
FileInfo info = new FileInfo(filePath);
mappedFile = MemoryMappedFile.CreateFromFile(filePath);
accesor = mappedFile.CreateViewAccessor(0, info.Length);
elementSize = Marshal.SizeOf(typeof(T));
numberOfElements = info.Length / elementSize;
}
public long Length
{
get
{
return numberOfElements;
}
}
public T this[long index]
{
get
{
if (index < 0 || index > numberOfElements)
{
throw new ArgumentOutOfRangeException();
}
T value = default(T);
accesor.Read<T>(index * elementSize, out value);
return value;
}
}
public void Dispose()
{
if (accesor != null)
{
accesor.Dispose();
accesor = null;
}
if (mappedFile != null)
{
mappedFile.Dispose();
mappedFile = null;
}
}
public IEnumerator<T> GetEnumerator()
{
T value;
for (int index = 0; index < numberOfElements; index++)
{
value = default(T);
accesor.Read<T>(index * elementSize, out value);
yield return value;
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
T value;
for (int index = 0; index < numberOfElements; index++)
{
value = default(T);
accesor.Read<T>(index * elementSize, out value);
yield return value;
}
}
public static T[] GetArray(string filePath)
{
T[] elements;
int elementSize;
long numberOfElements;
if (string.IsNullOrWhiteSpace(filePath))
{
throw new ArgumentNullException();
}
if (!File.Exists(filePath))
{
throw new FileNotFoundException();
}
FileInfo info = new FileInfo(filePath);
using (MemoryMappedFile mappedFile = MemoryMappedFile.CreateFromFile(filePath))
{
using(MemoryMappedViewAccessor accesor = mappedFile.CreateViewAccessor(0, info.Length))
{
elementSize = Marshal.SizeOf(typeof(T));
numberOfElements = info.Length / elementSize;
elements = new T[numberOfElements];
if (numberOfElements > int.MaxValue)
{
//you will need to split the array
}
else
{
accesor.ReadArray<T>(0, elements, 0, (int)numberOfElements);
}
}
}
return elements;
}
}
Here is an example of how you can use the class
Stopwatch watch = Stopwatch.StartNew();
using (Storage<double> helper = new Storage<double>("Storage.bin"))
{
Console.WriteLine("Initialization Time: {0}", watch.ElapsedMilliseconds);
string item;
long index;
Console.Write("Item to show: ");
while (!string.IsNullOrWhiteSpace((item = Console.ReadLine())))
{
if (long.TryParse(item, out index) && index >= 0 && index < helper.Length)
{
watch.Reset();
watch.Start();
double value = helper[index];
Console.WriteLine("Access Time: {0}", watch.ElapsedMilliseconds);
Console.WriteLine("Item: {0}", value);
}
else
{
Console.Write("Invalid index");
}
Console.Write("Item to show: ");
}
}
UPDATE I added a static method to load all data in a file to an array. Obviously this approach takes more time initially (on my laptop takes between 1 and 2 min) but after that access performance is what you expect from .Net. This method should be useful if you have to access data frequently.
Usage is pretty simple
double[] helper = Storage<double>.GetArray("Storage.bin");
HTH
It sounds extremely unlikely that you'll actually be able to fit this into a contiguous array in memory, so presumably the way in which you parallelize the load depends on the actual data structure.
(Addendum: LukeH pointed out in comments that there is actually a hard 2GB limit on object size in the CLR. This is detailed in this other SO question.)
Assuming you're reading the whole thing from one disk, parallelizing the disk reads is probably a bad idea. If there's any processing you need to do to the numbers as or after you load them, you might want to consider running that in parallel at the same time you're reading from disk.
The first question you have presumably already answered is "does this have to be precalculated?". Is there some algorithm you can use that will make it possible to calculate the required values on demand to avoid this problem? Assuming not...
That is only 2.6GB of data - on a 64 bit processor you'll have no problem with a tiny amount of data like that. But if you're running on a 5 year old computer with a 10 year old OS then it's a non-starter, as that much data will immediately fill the available working set for a 32-bit application.
One approach that would be obvious in C++ would be to use a memory-mapped file. This makes the data appear to your application as if it is in RAM, but the OS actually pages bits of it in only as it is accessed, so very little real RAM is used. I'm not sure if you could do this directly from C#, but you could easily enough do it in C++/CLI and then access it from C#.
Alternatively, assuming the question "do you need all of it in RAM simultaneously" has been answered with "yes", then you can't go for any kind of virtualisation approach, so...
Loading in multiple threads won't help - you are going to be I/O bound, so you'll have n threads waiting for data (and asking the hard drive to seek between the chunks they are reading) rather than one thread waiitng for data (which is being read sequentially, with no seeks). So threads will just cause more seeking and thus may well make it slower. (The only case where splitting the data up might help is if you split it to different physical disks so different chunks of data can be read in parallel - don't do this in software; buy a RAID array)
The only place where multithreading may help is to make the load happen in the background while the rest of your application starts up, and allow the user to start using the portion of the data that is already loaded while the rest of the buffer fills, so the user (hopefully) doesn't have to wait much while the data is loading.
So, you're back to loading the data into one massive array in a single thread...
However, you may be able to speed this up considerably by compressing the data. There are a couple of general approaches woth considering:
If you know something about the data, you may be able to invent an encoding scheme that makes the data smaller (and therefore faster to load). e.g. if the values tend to be close to each other (e.g. imagine the data points that describe a sine wave - the values range from very small to very large, but each value is only ever a small increment from the last) you may be able to represent the 'deltas' in a float without losing the accuracy of the original double values, halving the data size. If there is any symmetry or repetition to the data you may be able to exploit it (e.g. imagine storing all the positions to describe a whole circle, versus storing one quadrant and using a bit of trivial and fast maths to reflect it 4 times - an easy way to quarter the amount of data I/O). Any reduction in data size would give a corresponding reduction in load time. In addition, many of these schemes would allow the data to remain "encoded" in RAM, so you'd use far less RAM but still be able to quickly fetch the data when it was needed.
Alternatively, you can very easily wrap your stream with a generic compression algorithm such as Deflate. This may not work, but usually the cost of decompressing the data on the CPU is less than the I/O time that you save by loading less source data, so the net result is that it loads significantly faster. And of course, save a load of disk space too.
In typical case, loading speed will be limited by speed of storage you're loading data from--i.e. hard drive.
If you want it to be faster, you'll need to use faster storage, f.e. multiple hard drives joined in a RAID scheme.
If your data can be reasonably compressed, do that. Try to find algorithm which will use exactly as much CPU power as you have---less than that and your external storage speed will be limiting factor; more than that and your CPU speed will be limiting factor. If your compression algorithm can use multiple cores, then multithreading can be useful.
If your data are somehow predictable, you might want to come up with custom compression scheme. F.e. if consecutive numbers are close to each other, you might want to store differences between numbers---this might help compression efficiency.
Do you really need double precision? Maybe floats will do the job? Maybe you don't need full range of doubles? For example if you need full 53 bits of mantissa precision, but need only to store numbers between -1.0 and 1.0, you can try to chop few bits per number by not storing exponents in full range.
Making this parallel would be a bad idea unless you're running on a SSD. The limiting factor is going to be the disk IO--and if you run two threads the head is going to be jumping back and forth between the two areas being read. This will slow it down a lot more than any possible speedup from parallelization.
Remember that drives are MECHANICAL devices and insanely slow compared to the processor. If you can do a million instructions in order to avoid a single head seek you will still come out ahead.
Also, once the file is on disk make sure to defrag the disk to ensure it's in one contiguous block.
That does not sound like a good idea to me. 350,000,000 * 8 bytes = 2,800,000,000 bytes. Even if you manage to avoid the OutOfMemoryException
the process may be swapping in/out of the page file anyway. You might as well leave the data in the file and load smaller chucks as they are needed. The point is that just because you can allocate that much memory does not mean you should.
With a suitable disk configuration, splitting into multiple files across disks would make sense - and reading each file in a separate thread would then work nicely (if you've some stripyness - RAID whatever :) - then it could make sense to read from a single file with multiple threads).
I think you're on a hiding to nothing attempting this with a single physical disk, though.
Just saw this : .NET 4.0 has support for memory mapped files. That would be a very fast way to do it, and no support required for parallelization etc.
精彩评论