开发者

Computing the norm of a vector expression ||aW+bX+cY||

I am a PhD student. In the introduction of my thesis, I am insterested by the compromise between expressivity and performances of Linear Algebra tools.

As a simple example, I use the computation of the norm of a vector expression. The C code for my example is:

float normExpression3(float a, float *W, float b, float *X, float c, float*Y){
double norm = 0;
for (int i=0; i<n; ++i) // n in [3e6; 2e8]
{
    float tmp = a*W[i]+b*X[i]+c*Y[i];
    norm+=tmp*tmp;
}
return sqrtf(norm);

}

I compare the performances achieved with different techniques. As the vectors are big (several million elements), the performances are limited by the memory bandwidth. However, there are huge differences beetween the different approaches.

The optimized C version I wrote is not expressive (a new function has to be written to as a 4th vector) and very ugly (threaded and vectorized) but achieved 6.4 GFlops. On the other hand, MATLAB code is very nice:

result = norm(a*W+b*X+c*Y)

but only achieves 0.28 GFlops.

C++ templates expressions à la Blitz++ provide both expressivity and performances to the user (6.5 GFlops).

As a part of my analysis, I would like to know how functionnal languages can compare to these approaches. I thought about showing an example either in Haskell or in OCaml (AFAIK, both are reputed well suited for this kind of operation).

I know none of these languages. I could learn on of them to provide my example but this would'nt be a fair comparison: I am not sure to be able to provide an implementation allowing both expressivity and performances.

So my two questions are : 1) which language is best suited ? 2) how could the norm of vector expressions be computed efficiently without compromising the generality of the 开发者_如何转开发implementation ?

By advance, thank you !

Wilfried K.


Edit : corrected the type of the norm accumulator for float to double


For what is worth, the following is an OCaml version of your function:

let normExpression3 a w b x c y =
    let n = Array.length w in
    if not (n = Array.length x && n = Array.length y)
        then invalid_arg "normExpression3";
    let (@) = (Array.unsafe_get : float array -> int -> float) in
    let rec accum a w b x c y n i norm =
        if i = n then sqrt norm else
        let t = a *. (w @ i) +. b *. (x @ i) +. c *. (y @ i) in
        accum a w b x c y n (i + 1) (norm +. t)
    in accum a w b x c y n 0 0.

It makes some allowances for performance, namely:

  • Unchecked array access (or rather, array bounds checks manually hoisted out of the loop)
  • Monomorphic array access
  • Recursive inner loop to avoid boxing and unboxing the float accumulator
  • Lambda-lifting of the inner loop to avoid referencing closed-over values

The last optimization should be checked against a closed-over inner loop, since with so many parameters register spilling might dominate over the cost of referencing closed-over parameters.

Note that one normally wouldn't bother with this kind of optimizations unless one were to compete in a benchark ;-) Note also that you'll need to test this with a 64-bit OCaml, since arrays are otherwise limited to 4 mega elements.


1) which language is best suited ?

Either are used for this kind of task. The primary concern would be availability of required libraries (e.g. for vectors or matrices), and whether there was any need for parallelism.

Libraries such as vector and repa in Haskell are well-suited. And in the case of repa, you get parallelism as well.

2) how could the norm of vector expressions be computed efficiently without compromising the generality of the implementation ?

One approach would be to use meta-programming techniques to generate specialized implementations of the computational kernels from high level descriptions. In functional languages this is a relatively common technique, based on small domain-specific languages, with custom code generators.

See e.g. Specialising Simulator Generators for High-Performance Monte-Carlo Methods or the work in OCaml on FFTW.


Not so much as a functional language answer per se, but please note that your implementation to calculate norm (in C) is way different how matlab actually calculates it.

And yes, there are indeed very good reasons for that. Most probably your approximation of norm is pretty much useless (as how it's implemented currently) for any real use case. Please don't underestimate the 'difficulties' related to calculating numerically superior approximations of norms.


As don said, consider Repa. Here is some simple code to get you started.

import Data.Array.Repa as R

len :: Int
len = 50000

main = do
    let ws = R.fromList (Z :. len) [0..len-1]
        xs = R.fromList (Z :. len) [10498..10498 + len - 1]
        ys = R.fromList (Z :. len) [8422..8422 + len - 1]
    print (multSum 52 73 81 ws xs ys)

multSum a b c ws xs ys = R.map (a*) ws +^ R.map (b*) xs +^ R.map (c*) ys

This still leaves you to find a good way to get the data from disk and into a Repa array. I think reading it all in as a Llazy ByteString and using Repa.fromFunction should do, perhaps someone will chime in with a smarter way.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜