
Parallelising Cholesky decomposition for use in training a machine learning algorithm

I am trying to work out if I can parallelise the training aspect of a machine learning algorithm. The computationally expensive part of the training involves Cholesky decomposing a positive-definite matrix (covariance matrix). I'll try and frame the question purely in terms of the matrix algebra. Let me know if you need any more info.

Let's say we have a block matrix (a covariance matrix, but that's not relevant to the problem)

 
M = A  B  
    B* C

where A and C relate to training data from two different sets. Both A and C are positive definite. Let's also assume for simplicity that A and C have size nxn.

There is a formula for carrying out block Cholesky decomposition. See http://en.wikipedia.org/wiki/Block_LU_decomposition. Summarising we have the following result.

M = LL*

where (* indicates transpose)

L = A^{1/2}      0 
    B*A^{-*/2}  Q^{1/2}

where

Q = C - B*A^{-1}B

Now let's say the training related to matrices A and C has already been carried out, so we have already computed the Cholesky decompositions of A and C, giving A^{1/2} and C^{1/2}. (It is therefore straightforward to calculate the inverses A^{-1/2} and C^{-1/2} using forward substitution.)
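To make the forward-substitution step concrete, here is a minimal NumPy sketch (NumPy and the helper name `tri_inverse` are my own illustration, not from the post) of recovering A^{-1/2} from the Cholesky factor A^{1/2}:

```python
import numpy as np

def tri_inverse(Lfac):
    """Invert a lower-triangular Cholesky factor by forward substitution,
    solving Lfac X = I one column at a time."""
    n = Lfac.shape[0]
    X = np.zeros_like(Lfac)
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        for i in range(n):
            # Subtract the already-computed part of the column, then divide
            # by the diagonal pivot.
            X[i, j] = (e[i] - Lfac[i, :i] @ X[:i, j]) / Lfac[i, i]
    return X

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4))
A = G @ G.T + 4 * np.eye(4)        # positive definite test matrix
Ahalf = np.linalg.cholesky(A)      # A^{1/2} (lower triangular)
Ainvhalf = tri_inverse(Ahalf)      # A^{-1/2}
print(np.allclose(Ainvhalf @ Ahalf, np.eye(4)))  # True
```

In practice a library routine such as a triangular solve against the identity would be used instead of the explicit loop; the loop just shows the substitution.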

Rewriting Q in terms of these quantities, we now have

Q = Q^{1/2} Q^{*/2} = C^{1/2} C^{*/2} - B* A^{-*/2}A^{-1/2} B
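This identity is easy to check numerically. A small NumPy sketch (my own illustration, not from the post), which also shows that only the Cholesky factors of A and C are needed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
# Build a random positive-definite block matrix M = [[A, B], [B*, C]].
G = rng.standard_normal((2 * n, 2 * n))
M = G @ G.T + 2 * np.eye(2 * n)
A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]

a = np.linalg.cholesky(A)        # A^{1/2}
c = np.linalg.cholesky(C)        # C^{1/2}
beta = np.linalg.solve(a, B)     # A^{-1/2} B, i.e. solve a*beta = B

# Q from the definition vs. Q rewritten via the factors.
Q_direct = C - B.T @ np.linalg.solve(A, B)
Q_via_factors = c @ c.T - beta.T @ beta
print(np.allclose(Q_direct, Q_via_factors))  # True
```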

My question is this: Given this setup, is it possible to algebraically calculate Q^{1/2} without having to apply Cholesky decomposition to Q? In other words, can I use C^{1/2} to help me in the calculation of Q^{1/2}? If this were possible it would then be easy to parallelise the training.

Thanks in advance for any replies. Sorry about the matrix typesetting. Is there any sensible way to typeset maths, or matrices in particular?

Matt.


You can do this with a sequence of Cholesky downdates:

(Below I use ' for transpose to avoid confusion with multiplication)

If the Cholesky factor of A is a, and that of C is c, then the equation for Q can be written

Q = c*c' - beta'*beta, where beta = inverse(a)*B (i.e. solve a*beta = B for beta)

If we write b[i] for the i'th row of beta (taken as a column vector), then

Q = c*c' - Sum b[i]*b[i]'

Finding the Cholesky decomposition of

cc' - xx' (where x is a vector and c is lower triangular)

is known as a rank-1 Cholesky downdate. There is a stable algorithm for this in Golub and Van Loan's Matrix Computations.
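A minimal NumPy sketch of the idea (the `chol_downdate` helper below is my own hyperbolic-rotation implementation for illustration, not code from Golub and Van Loan; it assumes each downdated matrix stays positive definite):

```python
import numpy as np

def chol_downdate(L, x):
    """Rank-1 Cholesky downdate: given lower-triangular L with L L' = M,
    return Ltilde with Ltilde Ltilde' = M - x x'."""
    L = L.copy()
    x = x.copy().astype(float)
    n = len(x)
    for k in range(n):
        r = np.sqrt(L[k, k] ** 2 - x[k] ** 2)   # new pivot
        ch = r / L[k, k]                        # cosh-like factor
        sh = x[k] / L[k, k]                     # sinh-like factor
        L[k, k] = r
        # Apply the hyperbolic rotation to the rest of column k, then
        # update the remaining entries of x using the new column.
        L[k + 1:, k] = (L[k + 1:, k] - sh * x[k + 1:]) / ch
        x[k + 1:] = ch * x[k + 1:] - sh * L[k + 1:, k]
    return L

# One downdate per vector: Q = c c' - Sum_i b[i] b[i]'.
rng = np.random.default_rng(2)
n = 4
G = rng.standard_normal((n, n))
Cmat = G @ G.T + 5 * np.eye(n)
c = np.linalg.cholesky(Cmat)
beta = 0.1 * rng.standard_normal((n, n))  # kept small so Q stays PD

q = c
for i in range(n):
    q = chol_downdate(q, beta[i, :])      # i'th row of beta as a vector

Q = Cmat - beta.T @ beta
print(np.allclose(q @ q.T, Q))  # True
```

Each downdate costs O(n^2), so n downdates give the same O(n^3) order as refactorising Q, but the downdates of independent columns expose the structure you need for incremental or parallel updates.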


I think I've come to an answer although it is not exactly as I'd hoped.

Removing the machine learning context, my question boiled down to whether knowing C^{1/2} would help in the calculation of Q^{-1/2}. I'll go into more detail below, but to cut to the chase, the answer is yes, but only with respect to stability and not computation (I can't currently prove this to be the case, but I'm fairly certain).

To see why the answer is yes with respect to stability, note that the definition of Q from the original question can be rearranged as follows.

Q = C - B* A^{-1} B = C^{1/2} C^{*/2} - (A^{-1/2} B)* (A^{-1/2} B)

By knowing A^{1/2} and C^{1/2} beforehand, we can calculate Q without having to invert A directly (A^{-1/2} B is obtained by forward substitution). Direct inversion is not numerically stable.

Sadly, although I have done a fair amount of research on the subject, it does not appear that C^{1/2} helps with respect to computation in the exact calculation of Q^{-1/2}. The best approach appears to be to calculate Q using C^{1/2} as above, then use Cholesky decomposition to obtain Q^{1/2}, and finally forward substitution to calculate Q^{-1/2}.
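Putting that pipeline together, a minimal NumPy sketch (my own illustration of the steps just described, not code from the post):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
# Random positive-definite block matrix M = [[A, B], [B*, C]].
G = rng.standard_normal((2 * n, 2 * n))
M = G @ G.T + 2 * np.eye(2 * n)
A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]

a = np.linalg.cholesky(A)              # A^{1/2}, assumed precomputed
c = np.linalg.cholesky(C)              # C^{1/2}, assumed precomputed

beta = np.linalg.solve(a, B)           # A^{-1/2} B, no explicit inverse of A
Q = c @ c.T - beta.T @ beta            # Q from the precomputed factors
q = np.linalg.cholesky(Q)              # Q^{1/2}
q_inv = np.linalg.solve(q, np.eye(n))  # Q^{-1/2} by forward substitution

# Sanity checks against the original definition.
assert np.allclose(Q, C - B.T @ np.linalg.solve(A, B))
print(np.allclose(q_inv @ Q @ q_inv.T, np.eye(n)))  # True
```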

Further Research

One area I did not look into in much detail was whether it was possible to use C^{1/2} to approximate Q^{-1/2}. Something along the lines of an iterative method using C^{1/2} as a starting point. I do not know of any such iterative approximation process, but I'll keep searching. I may even start a new question with that as the focus.

I'll update you all if I have any major breakthroughs.

