Is it better to have multiple arrays or one multi-dimensional array?
I am working on a project where I have to perform calculations on arrays of data in PHP. Some of these calculations involve working with multiple arrays. All are the same length (count).
Question: Is it more efficient (in memory and processor usage) to place the data into one multi-dimensional array or to keep it in two separate arrays?
Keep in mind that some of these arrays can have thousands of values.
Example: to better clarify, here is an example of the data and usage:
X = 1,2,3,4,5
Y = 2,3,3,4,4
Calculate correlation between X and Y.
To do this:
- Get sum of X and Y from columns
- Get sum of X^2 and Y^2 from columns
- Then calculate with the formula for Correlation
My Thoughts: Combining the two arrays into a multi-dimensional array would allow fewer iterations to calculate, but they would need to be combined first.
So my main concern and reason for asking is this: does it take fewer resources to create a multi-dimensional array and iterate over it once, or is it better to keep the arrays separate and iterate over each one, making two iterations?
Or is there a better way to perform calculations on arrays that does not involve iterations?
If you already have the data as two separate arrays, merging them first would be a waste of time and resources I would imagine.
There are two forms of array access in PHP: iterative access, which uses the array's internal pointer and is sequential, and access via the associated key/index, which is a hash map lookup and not sequential. If you are going to look at all elements of an array and it is possible to do so in order, then try to access it iteratively, using either the built-in array_ functions or the pointer functions reset(), next(), current() and end(). (each() was deprecated in PHP 7.2 and removed in PHP 8.0, so avoid it in new code.)
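As a rough illustration of the two access styles, here is a minimal sketch that walks both arrays in lockstep with the internal pointer functions and then repeats the same sums by key lookup. It assumes the sample X and Y values from the question and that both arrays share the same keys; the variable names are only for illustration.

```php
<?php
$x = [1, 2, 3, 4, 5];
$y = [2, 3, 3, 4, 4];

// Rewind both internal pointers to the first element.
reset($x);
reset($y);

$sum_x = 0;
$sum_y = 0;
for ($i = 0, $n = count($x); $i < $n; $i++) {
    // current() reads the element under the pointer without moving it.
    $sum_x += current($x);
    $sum_y += current($y);

    // next() advances each pointer by one element.
    next($x);
    next($y);
}

// Same totals by key lookup (assumes both arrays share the same keys).
$sum_x_by_key = 0;
$sum_y_by_key = 0;
foreach ($x as $key => $value) {
    $sum_x_by_key += $value;
    $sum_y_by_key += $y[$key];
}
```

Both loops visit each element once; which one is faster in practice is something to measure rather than assume.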
Have a look at the array_reduce() function in PHP; it can help you to achieve this sort of thing quickly. Though in this simple case you might be better off doing a straight for() loop and using the array pointer functions reset(), next() and current() to get the values out of each array. Or, if they are keyed identically, then you can just perform a foreach() and use the key from one for the other.
// create_function() was removed in PHP 8.0; anonymous functions do the same job.
$sum_x = array_reduce($x, function ($carry, $value) { return $carry + $value; }, 0);
$sum_y = array_reduce($y, function ($carry, $value) { return $carry + $value; }, 0);
$sum_x2 = array_reduce($x, function ($carry, $value) { return $carry + $value * $value; }, 0);
$sum_y2 = array_reduce($y, function ($carry, $value) { return $carry + $value * $value; }, 0);
or
$sum_x = 0;
$sum_y = 0;
$sum_x2 = 0;
$sum_y2 = 0;
foreach (array_keys($x) as $i) {
    $sum_x += $x[$i];
    $sum_y += $y[$i];
    $sum_x2 += $x[$i] * $x[$i];
    $sum_y2 += $y[$i] * $y[$i];
}
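To go from these sums to the correlation itself, here is a minimal sketch of a single-pass Pearson calculation. Note that the Pearson formula also needs the sum of the products X*Y, which the steps in the question do not list. The function name pearson_correlation is hypothetical, and the code assumes both arrays have the same length and the same keys.

```php
<?php
/**
 * Pearson correlation of two equal-length numeric arrays, computed in one pass.
 * Hypothetical helper; assumes $x and $y share the same keys.
 * Returns null when the arrays are empty, differ in length, or one is constant.
 */
function pearson_correlation(array $x, array $y): ?float
{
    $n = count($x);
    if ($n === 0 || $n !== count($y)) {
        return null;
    }

    $sum_x = $sum_y = $sum_x2 = $sum_y2 = $sum_xy = 0.0;
    foreach ($x as $i => $xi) {
        $yi = $y[$i];
        $sum_x  += $xi;
        $sum_y  += $yi;
        $sum_x2 += $xi * $xi;
        $sum_y2 += $yi * $yi;
        $sum_xy += $xi * $yi;
    }

    // r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
    $variance_product = ($n * $sum_x2 - $sum_x ** 2) * ($n * $sum_y2 - $sum_y ** 2);
    if ($variance_product <= 0) {
        return null; // at least one series has no variance
    }

    return ($n * $sum_xy - $sum_x * $sum_y) / sqrt($variance_product);
}

$r = pearson_correlation([1, 2, 3, 4, 5], [2, 3, 3, 4, 4]);
printf("r = %.4f\n", $r); // prints "r = 0.9449" for the sample data
```

Everything is gathered in one loop over the two separate arrays, so no merge step is needed beforehand.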
Considering that all arrays in PHP are hash tables, and associative, I would imagine that the biggest performance gain would be fewer iterations. I would use the multidimensional array.
Why not write a test case? You could use the PEAR Benchmark package to measure this: http://pear.php.net/package/Benchmark
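If you would rather not pull in a package, a test case can be sketched with the built-in hrtime() function (PHP 7.3+). This is only a rough micro-benchmark under assumed conditions: the array size, the time_it() helper and the closure names are all made up for illustration, and the timings will vary by machine and PHP version.

```php
<?php
// Sample data: two parallel arrays, and the same data combined into pairs.
$n = 10000;
$x = range(1, $n);
$y = range(2, $n + 1);
$pairs = array_map(null, $x, $y); // [[x1, y1], [x2, y2], ...]

// Hypothetical helper: run $fn several times, return the best time in ms.
function time_it(callable $fn, int $runs = 20): float
{
    $best = PHP_FLOAT_MAX;
    for ($r = 0; $r < $runs; $r++) {
        $start = hrtime(true); // nanoseconds as a number
        $fn();
        $best = min($best, (hrtime(true) - $start) / 1e6);
    }
    return $best;
}

// One loop over two separate arrays, using the key of one for the other.
$separate = function () use ($x, $y) {
    $sum_x = $sum_y = 0;
    foreach ($x as $i => $xi) {
        $sum_x += $xi;
        $sum_y += $y[$i];
    }
    return [$sum_x, $sum_y];
};

// One loop over the combined array of pairs.
$combined = function () use ($pairs) {
    $sum_x = $sum_y = 0;
    foreach ($pairs as [$xi, $yi]) {
        $sum_x += $xi;
        $sum_y += $yi;
    }
    return [$sum_x, $sum_y];
};

printf("separate arrays: %.3f ms\n", time_it($separate));
printf("combined array:  %.3f ms\n", time_it($combined));
```

To answer the original question fairly, the cost of building $pairs should be timed as well, since that step only exists in the combined approach.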
This is not PHP-specific. Locality of reference for data often matters, because cache misses are expensive.
For instance, if you're processing items in parallel arrays (all ?1, then all ?2, ...), it is more efficient to organize them in memory as:
A1 B1 C1 ... A2 B2 C2 ... A3 B3 C3 ...
Instead of the typical:
A1 A2 A3 ... B1 B2 B3 ... C1 C2 C3 ...
Of course, it depends on your specific calculation. Loading your data into the first layout may take considerable time. In the end, profiling is the only way to be certain.
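For what it's worth, the first layout can be approximated in PHP with a single flat array that interleaves the values. This is only a sketch using the sample data from the question; PHP arrays are not raw numeric buffers, so whether the cache benefit carries over is an assumption that would need profiling.

```php
<?php
$x = [1, 2, 3, 4, 5];
$y = [2, 3, 3, 4, 4];

// Interleave into X1 Y1 X2 Y2 ... (the first layout above).
$interleaved = [];
foreach ($x as $i => $xi) {
    $interleaved[] = $xi;
    $interleaved[] = $y[$i];
}

// Walk the interleaved data two values at a time.
$sum_x = 0;
$sum_y = 0;
for ($i = 0, $n = count($interleaved); $i < $n; $i += 2) {
    $sum_x += $interleaved[$i];     // even positions hold X
    $sum_y += $interleaved[$i + 1]; // odd positions hold Y
}
```

Building $interleaved is itself a full pass over the data, which is the loading cost mentioned above.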
I can't see how there would be any difference, processor-wise or memory-wise, between one two-dimensional array and two one-dimensional arrays. The same amount of memory should be used. Are they both going to have the same number of elements?
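This is easy enough to check rather than guess. The sketch below uses memory_get_usage() to compare two flat arrays against one array of pairs; the memory_of() helper and the element count are assumptions for illustration. Each nested PHP array carries its own bookkeeping, so the two figures need not match.

```php
<?php
// Hypothetical helper: bytes allocated while $build() constructs its data.
function memory_of(callable $build): int
{
    $before = memory_get_usage();
    $data = $build();
    $used = memory_get_usage() - $before;
    unset($data); // release the data before the next measurement
    return $used;
}

$n = 10000;

// Two one-dimensional arrays of $n values each.
$two_flat = memory_of(fn() => [range(1, $n), range(1, $n)]);

// One array of $n two-element arrays: [[x1, y1], [x2, y2], ...].
$nested = memory_of(fn() => array_map(null, range(1, $n), range(1, $n)));

printf("two flat arrays: %d bytes\n", $two_flat);
printf("array of pairs:  %d bytes\n", $nested);
```

The arrow functions require PHP 7.4 or later; on older versions, regular closures with use ($n) do the same thing.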