
Fast algorithm for boundary value problem

I am looking for the fastest way to solve the following problem:

Given a volume of lattice points in a 3D grid, some points b_i (the boundary) satisfy f(b_i) = 0, while one other point a_0 satisfies f(a_0) = 1.

All other (non-boundary) points are linear combinations of their 26 surrounding points. For example, I could want

f(i, j, k) = 0.05*f(i+1, j, k) + 0.1*f(i, j+1, k) + 0.07*f(i, j, k+1) + ...

The coefficients 0.05 + 0.1 + 0.07 + ... always sum to 1. My objective is to solve for f(x_i) at every point x_i in the volume.

Currently, I am using the successive over-relaxation (SOR) method, which basically initializes the boundary of the volume, assigns to each point the weighted average of its 26 surrounding points, and repeats. The SOR step then takes a weighted combination of the freshly updated value of f(x_i) and its value from the previous iteration.
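To make the iteration concrete, here is a minimal SOR sketch, reduced to 2D with a uniform 1/8 weight on each of the 8 neighbours for brevity (the 3D, 26-neighbour case just adds a loop over k and the real coefficients). The names `sor_sweep`/`sor_solve` and the uniform weights are illustrative, not the poster's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// One SOR sweep over the interior points; returns the largest change made.
// Fixed points (the boundary here; a pinned interior point such as a_0
// would be skipped the same way) are simply never written.
double sor_sweep(Grid& f, double omega) {
    double max_change = 0.0;
    for (std::size_t i = 1; i + 1 < f.size(); ++i) {
        for (std::size_t j = 1; j + 1 < f[i].size(); ++j) {
            double avg = 0.0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj)
                    if (di != 0 || dj != 0)
                        avg += 0.125 * f[i + di][j + dj];  // uniform weights
            double updated = (1.0 - omega) * f[i][j] + omega * avg;
            max_change = std::max(max_change, std::fabs(updated - f[i][j]));
            f[i][j] = updated;
        }
    }
    return max_change;
}

// Sweep until the largest per-sweep change drops below tol; returns the
// number of sweeps taken.
int sor_solve(Grid& f, double omega, double tol) {
    int iters = 1;
    while (sor_sweep(f, omega) > tol) ++iters;
    return iters;
}
```

With omega = 1 this is plain Gauss-Seidel; values between 1 and 2 over-relax, and the best omega depends on the stencil.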

I was wondering if anyone knows of any faster ways to solve this problem on a 3D grid of roughly 102x102x48. SOR currently takes about 500-1000 iterations to converge to the level I want (depending on the coefficients used). I am most willing to use MATLAB, IDL, and C++. Does anyone have an idea of how fast SOR is compared to converting the problem into a linear system and using matrix methods (like BiCGSTAB)?

Also, which method would be most effectively (and easily) parallelized? I have access to a 250-core cluster, and have been trying to parallelize the SOR method using MPI and C++, but haven't seen as much of a speedup as I would like (ideally on the order of 100-fold). I would greatly appreciate any ideas on speeding up the solution to this problem. Thanks.


If you're comfortable with multithreading, using a red-black scheme for SOR can give a decent speedup. For a 2D problem, imagine a checkerboard: the red squares only depend on the black squares (and possibly themselves), so you can update all the red squares in parallel, and then repeat for all the black ones. Note that this converges more slowly than the natural ordering, but it lets you spread the work over multiple threads.
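One caveat worth noting: with the full 26-point stencil, two colours are not enough, because diagonal neighbours share a checkerboard colour; colouring by the parity of each index separately gives 8 independent colours instead. For the 2D 5-point example above, though, two colours do suffice. A minimal sketch with uniform 1/4 weights (the function names are made up for illustration):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Update every interior point of one colour from the 4-point average.
// No point reads another point of its own colour, so this loop body is
// safe to run in parallel (e.g. with an OpenMP "parallel for" here).
void colour_sweep(Grid& f, std::size_t colour) {
    for (std::size_t i = 1; i + 1 < f.size(); ++i)
        for (std::size_t j = 1; j + 1 < f[i].size(); ++j)
            if ((i + j) % 2 == colour)
                f[i][j] = 0.25 * (f[i - 1][j] + f[i + 1][j]
                                + f[i][j - 1] + f[i][j + 1]);
}

// One red-black Gauss-Seidel iteration: reds first, then blacks.
void red_black_iteration(Grid& f) {
    colour_sweep(f, 0);
    colour_sweep(f, 1);
}
```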

Conjugate gradient methods will generally converge faster than SOR (if I remember correctly, by about an order of magnitude). I've never used BCGSTAB, but I remember GMRES working well on non-symmetric problems, and they can probably both benefit from preconditioning.

As for opportunities for parallelization, most CG-type methods only need you to compute the matrix-vector product A*x, so you never need to form the full matrix. That's probably the biggest cost of each iteration, and so it's where you should look at multithreading.
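The point about never forming the matrix can be sketched as follows, again in a 2D uniform-weight toy version (the name `apply_A` and the 1/8 weights are assumptions for illustration): for this problem A = I - M, where M holds the stencil weights, so A*x is just one stencil pass over the grid.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Apply A = I - M without ever forming the matrix: M is the stencil of
// neighbour weights (uniform 1/8 here), and the boundary rows of A are
// the identity, so boundary entries pass through unchanged.
Grid apply_A(const Grid& x) {
    Grid y = x;  // identity part, and the untouched boundary rows
    for (std::size_t i = 1; i + 1 < x.size(); ++i)
        for (std::size_t j = 1; j + 1 < x[i].size(); ++j) {
            double avg = 0.0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj)
                    if (di != 0 || dj != 0)
                        avg += 0.125 * x[i + di][j + dj];
            y[i][j] = x[i][j] - avg;
        }
    return y;
}
```

Since each output point only reads its own neighbourhood, slicing the outer loop across threads (or MPI ranks with a one-cell halo) parallelizes cleanly.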

Two good references on this are Golub and Van Loan (Matrix Computations), and Trefethen and Bau (Numerical Linear Algebra). The latter is much more readable IMHO.

Hope that helps...


I've got (I think) an exact solution; however, it might be slow. The main cost would be computation on a very, very big (and hopefully sparse) matrix.

Let's say you have N points in your volume; call them p1...pN. What you are looking for is f(p1)...f(pN).

Take a point pi. It has 26 neighbours; call them pi-1...pi-26. I suppose you have access, for each point pi, to the coefficients of its linear combination; call them ci-1...ci-j...ci-26 (ci-j being the coefficient of the linear combination tying point pi to its j-th neighbour).

Even better: you can consider pi to be a linear combination of all the points in your space, where most of the coefficients (all but 26) are equal to 0. That gives you coefficients ci-1...ci-N.

You can now build a big N×N matrix of these ci-j coefficients:

+--------------------------------+
|  0  | c1-2 | c1-3 | ... | c1-N |   |f(p1)|    |f(p1)|
+--------------------------------+   |     |    |     |
| c2-1|   0  | c2-3 | ... | c2-N |   |f(p2)|    |f(p2)|
+--------------------------------+ * .     . =  .     .
|                                |   .     .    .     .
.                                .   .     .    .     .
+--------------------------------+   .     .    .     .
| cN-1| cN-2 | cN-3 | ... |   0  |   |f(pN)|    |f(pN)|
+--------------------------------+

Amazing! The solution you're looking for is one of the eigenvectors corresponding to the eigenvalue 1!

Use an optimised matrix library that computes eigenvectors efficiently for sparse matrices, and hope it is already parallelised!
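Feeding such a library usually means assembling the nonzeros in coordinate (row, column, value) form. A hypothetical sketch for the 2D analogue with uniform 1/8 weights, where the grid, the `Triplet` struct, and `assemble` are all illustrative assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Triplet { std::size_t row, col; double val; };

// Assemble M in (row, col, value) triplet form for an n-by-n 2D grid with
// a uniform 1/8 weight on each of the 8 neighbours. Point (i, j) becomes
// unknown i*n + j; boundary rows get no entries, since those unknowns are
// pinned by the boundary condition.
std::vector<Triplet> assemble(std::size_t n) {
    std::vector<Triplet> t;
    for (std::size_t i = 1; i + 1 < n; ++i)
        for (std::size_t j = 1; j + 1 < n; ++j)
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj)
                    if (di != 0 || dj != 0)
                        t.push_back({i * n + j, (i + di) * n + (j + dj), 0.125});
    return t;
}
```

Note the sparsity: each interior row has only 26 nonzeros (8 in this 2D toy) out of N, which is exactly what sparse eigensolvers and Krylov methods exploit.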

Edit: Amusing, I just reread your post; it seems I just gave you your BCGSTAB solution... Sorry! ^^

Re-edit: In fact, I'm not sure: are you talking about the "Biconjugate Gradient Stabilized Method"? Because I don't see the linear method you're referring to if you're doing gradient descent...

Re-re-edit: I love math =) I can even prove that, given the condition that the ci sum to 1, 1 is in fact an eigenvalue: every row of the matrix then sums to 1, so the all-ones vector satisfies M*v = v. I do not know the dimension of the corresponding eigenspace, but it is at least 1!
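A quick numeric illustration of that claim (a toy 3x3 matrix, not the poster's system): multiplying M by the all-ones vector just returns the row sums, so rows summing to 1 make the ones vector a fixed point of M.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Multiply a dense matrix by the all-ones vector. The i-th output is the
// i-th row sum, so if every row sums to 1, the all-ones vector comes back
// unchanged, i.e. it is an eigenvector of M with eigenvalue 1.
std::vector<double> times_ones(const std::vector<std::vector<double>>& M) {
    std::vector<double> y(M.size(), 0.0);
    for (std::size_t i = 0; i < M.size(); ++i)
        for (std::size_t j = 0; j < M[i].size(); ++j)
            y[i] += M[i][j];  // each component of the ones vector is 1
    return y;
}
```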
