Large arrays in C
I'm trying to implement the OSEM algorithm (I'm trying because I have to, not just for fun) and I have a question:
Since I'll be working with very large matrices, I want to know the maximum array size (C language) I can allocate with malloc. From what I've read it depends on your OS and hardware: I'm working on an Intel Xeon E5530 2.40 GHz, Red Hat Enterprise 64-bit, Nvidia Quadro FX 3800.
The matrices I'll be working with have something like these dimensions: float/double 2000x1000x20.
Given that those matrices are to be worked with CUDA C, I must allocate the matrices in 1D arrays like this:
float *matrix = (float *)malloc(sizeof(float) * 2000 * 1000 * 20);
Thanks in advance ;)
These are relatively small allocations: around 160 MB for float, 320 MB for double. Unless you need many of these matrices in memory at the same time, there shouldn't be a problem.
The main limitation will be with CUDA, where you may be limited by the total amount of physical memory on your GPU card, but again, unless you have a significant number of these matrices then you should be OK with any current CUDA-compatible GPU card.
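As a rough sketch of the "how many matrices fit" reasoning above, the helper below divides a memory budget by the size of one matrix. The name max_matrices is hypothetical, and the budget is an assumption supplied by the caller; in real CUDA code it would probably come from a query such as cudaMemGetInfo rather than a hard-coded figure.

```c
#include <stddef.h>

/* Hypothetical helper: how many matrices of bytes_each bytes fit
   into budget_bytes? Returns 0 when bytes_each is 0, since the
   question has no meaningful answer in that case. */
size_t max_matrices(size_t budget_bytes, size_t bytes_each)
{
    if (bytes_each == 0)
        return 0;
    return budget_bytes / bytes_each;  /* integer division rounds down */
}
```

For example, calling max_matrices((size_t)1 << 30, 160000000u) asks how many 160 MB float matrices fit in a 1 GiB budget. The result ignores whatever memory the driver, display and other buffers already use, so treat it as an upper bound.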
Theoretically, the largest possible buffer you can allocate on a 64-bit system is 2^64 bytes, which is much larger than your 2000x1000x20 array. It's also much larger than all the memory you can ever process with a computer.
On a 32-bit system it's usually 2 GB. (Some systems allow 3 or 4 GB.) That's 2.1 * 10^9 bytes. The sizeof(float) is 4 bytes. Let's see, you've got:
2000 * 1000 * 20 = 4 * 10^7
Multiplying that by the size of a float:
4 * 10^7 * 4 = 1.6 * 10^8
Even though 1.6 * 10^8 is quite an impressive number, you could even allocate that much memory on a 32-bit system.
I wouldn't worry about it.
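If you would rather let the compiler do the arithmetic above, here is a minimal sketch. The function name matrix_bytes is made up for illustration; it multiplies the three dimensions by the element size and refuses to report a result if the product would not fit in a size_t, which is the case that actually matters on a 32-bit system.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores nx * ny * nz * elem_size in *out_bytes.
   Returns 1 on success, 0 if out_bytes is NULL or the product would
   overflow size_t. */
int matrix_bytes(size_t nx, size_t ny, size_t nz, size_t elem_size,
                 size_t *out_bytes)
{
    size_t factors[4];
    size_t total = 1;
    int i;

    if (out_bytes == NULL)
        return 0;

    factors[0] = nx;
    factors[1] = ny;
    factors[2] = nz;
    factors[3] = elem_size;

    for (i = 0; i < 4; ++i) {
        /* Check before multiplying so the product never wraps around. */
        if (factors[i] != 0 && total > SIZE_MAX / factors[i])
            return 0;
        total *= factors[i];
    }

    *out_bytes = total;
    return 1;
}
```

Calling matrix_bytes(2000, 1000, 20, sizeof(float), &bytes) reproduces the 1.6 * 10^8 figure when a float is 4 bytes, and passing sizeof(double) instead doubles it.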
Here are some other considerations.
Don't worry about the matrix sizes, even though they look big, unless you need several images at once that could saturate the GPU memory.
If you can process the images a small set at a time, DO use the async API to upload / process / download: while computing the first result you can be uploading the next image.
Experiment with cudaMallocHost: non-pageable (pinned) host memory, i.e. MUCH faster data transfer.
Experiment with pitched memory on the device: even though it consumes more memory, it provides better access performance.
Last but not least, get a better card: you can get 336 cores for ~$200, for example with a GTX 460.
The maximum size of the arrays that you can use (i.e. the maximum amount of memory you can allocate using malloc in this case) is not restricted by anything in the C language itself. It depends on the amount of memory (and address space) you have available on the machine.
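Since the limit is only known at run time, the practical approach is to ask for the memory and check whether you got it. Below is a minimal sketch under that assumption. The names idx3 and alloc_matrix are hypothetical, and the x-fastest layout is just one possible choice for the flat 1D array from the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: flat offset of element (x, y, z) in an
   nx * ny * nz array stored with x varying fastest. */
size_t idx3(size_t x, size_t y, size_t z, size_t nx, size_t ny)
{
    return x + nx * (y + ny * z);
}

/* Hypothetical helper: allocates a zero-initialised nx * ny * nz float
   array as one flat block. Returns NULL if a dimension is 0, the byte
   count would overflow size_t, or the system cannot supply the memory.
   The caller releases the block with free(). */
float *alloc_matrix(size_t nx, size_t ny, size_t nz)
{
    size_t plane;

    if (nx == 0 || ny == 0 || nz == 0)
        return NULL;
    if (ny > SIZE_MAX / nx)
        return NULL;                      /* nx * ny would overflow */
    plane = nx * ny;
    if (nz > SIZE_MAX / plane / sizeof(float))
        return NULL;                      /* total bytes would overflow */

    /* The cast keeps the code valid when compiled as C++ (as CUDA C is). */
    return (float *)calloc(plane * nz, sizeof(float));
}
```

A caller would write something like float *m = alloc_matrix(2000, 1000, 20); and then test m against NULL before touching m[idx3(x, y, z, 2000, 1000)]. A NULL result is the only reliable signal that the request was too large for this machine at this moment.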