
Fast algorithm for boundary value problem

I am looking for the fastest way to solve the following problem:

Given a volume of lattice points in a 3D grid, some points b_i (the boundary) satisfy f(b_i) = 0, while one other point a_0 satisfies f(a_0) = 1.

All other (non-boundary) points are linear combinations of their 26 surrounding points. For example, I could want

f(i, j, k) = 0.05*f(i+1, j, k) + 0.1*f(i, j+1, k) + 0.07*f(i, j, k+1) + ...

The coefficients 0.05 + 0.1 + 0.07 + ... always sum to 1. My objective is to solve for f(x_i) at every point x_i in the volume.

Currently, I am using the successive over-relaxation (SOR) method, which basically initializes the boundary of the volume, assigns to each point the weighted average of its 26 surrounding points, and repeats. The SOR step then takes a weighted combination of the freshly updated value of f(x_i) and its value from the previous iteration.
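To make the iteration concrete, here is a minimal SOR sketch, reduced to 2D with a uniform 1/8 weight on each of the 8 neighbours for brevity (the 3D, 26-neighbour case just adds a loop over k and the real coefficients). The names `sor_sweep`/`sor_solve` and the uniform weights are illustrative, not the poster's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// One SOR sweep over the interior points; returns the largest change made.
// Fixed points (the boundary here; a pinned interior point such as a_0
// would be skipped the same way) are simply never written.
double sor_sweep(Grid& f, double omega) {
    double max_change = 0.0;
    for (std::size_t i = 1; i + 1 < f.size(); ++i) {
        for (std::size_t j = 1; j + 1 < f[i].size(); ++j) {
            double avg = 0.0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj)
                    if (di != 0 || dj != 0)
                        avg += 0.125 * f[i + di][j + dj];  // uniform weights
            double updated = (1.0 - omega) * f[i][j] + omega * avg;
            max_change = std::max(max_change, std::fabs(updated - f[i][j]));
            f[i][j] = updated;
        }
    }
    return max_change;
}

// Sweep until the largest per-sweep change drops below tol; returns the
// number of sweeps taken.
int sor_solve(Grid& f, double omega, double tol) {
    int iters = 1;
    while (sor_sweep(f, omega) > tol) ++iters;
    return iters;
}
```

With omega = 1 this is plain Gauss-Seidel; values between 1 and 2 over-relax, and the best omega depends on the stencil.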

I was wondering if anyone knows of any faster ways to solve this problem on a 3D grid of roughly 102x102x48. SOR currently takes about 500-1000 iterations to converge to the level I want (depending on the coefficients used). I am most willing to use MATLAB, IDL, and C++. Does anyone have an idea of how fast SOR is compared to converting the problem into a linear system and using matrix methods (like BiCGSTAB)?

Also, which method would be most effectively (and easily) parallelized? I have access to a 250-core cluster, and have been trying to parallelize the SOR method using MPI and C++, but haven't seen as much of a speedup as I would like (ideally on the order of 100-fold). I would greatly appreciate any ideas on speeding up the solution to this problem. Thanks.


If you're comfortable with multithreading, using a red-black scheme for SOR can give a decent speedup. For a 2D problem, imagine a checkerboard: the red squares only depend on the black squares (and possibly themselves), so you can update all the red squares in parallel, and then repeat for all the black ones. Note that this converges more slowly than the natural ordering, but it lets you spread the work over multiple threads.
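One caveat worth noting: with the full 26-point stencil, two colours are not enough, because diagonal neighbours share a checkerboard colour; colouring by the parity of each index separately gives 8 independent colours instead. For the 2D 5-point example above, though, two colours do suffice. A minimal sketch with uniform 1/4 weights (the function names are made up for illustration):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Update every interior point of one colour from the 4-point average.
// No point reads another point of its own colour, so this loop body is
// safe to run in parallel (e.g. with an OpenMP "parallel for" here).
void colour_sweep(Grid& f, std::size_t colour) {
    for (std::size_t i = 1; i + 1 < f.size(); ++i)
        for (std::size_t j = 1; j + 1 < f[i].size(); ++j)
            if ((i + j) % 2 == colour)
                f[i][j] = 0.25 * (f[i - 1][j] + f[i + 1][j]
                                + f[i][j - 1] + f[i][j + 1]);
}

// One red-black Gauss-Seidel iteration: reds first, then blacks.
void red_black_iteration(Grid& f) {
    colour_sweep(f, 0);
    colour_sweep(f, 1);
}
```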

Conjugate gradient methods will generally converge faster than SOR (if I remember correctly, by about an order of magnitude). I've never used BCGSTAB, but I remember GMRES working well on non-symmetric problems, and they can probably both benefit from preconditioning.

As for opportunities for parallelization, most CG-type methods only need you to compute the matrix-vector product A*x, so you never need to form the full matrix. That's probably the biggest cost of each iteration, and so it's where you should look at multithreading.
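The point about never forming the matrix can be sketched as follows, again in a 2D uniform-weight toy version (the name `apply_A` and the 1/8 weights are assumptions for illustration): for this problem A = I - M, where M holds the stencil weights, so A*x is just one stencil pass over the grid.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// Apply A = I - M without ever forming the matrix: M is the stencil of
// neighbour weights (uniform 1/8 here), and the boundary rows of A are
// the identity, so boundary entries pass through unchanged.
Grid apply_A(const Grid& x) {
    Grid y = x;  // identity part, and the untouched boundary rows
    for (std::size_t i = 1; i + 1 < x.size(); ++i)
        for (std::size_t j = 1; j + 1 < x[i].size(); ++j) {
            double avg = 0.0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj)
                    if (di != 0 || dj != 0)
                        avg += 0.125 * x[i + di][j + dj];
            y[i][j] = x[i][j] - avg;
        }
    return y;
}
```

Since each output point only reads its own neighbourhood, slicing the outer loop across threads (or MPI ranks with a one-cell halo) parallelizes cleanly.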

Two good references on this are Golub and Van Loan (Matrix Computations), and Trefethen and Bau (Numerical Linear Algebra). The latter is much more readable IMHO.

Hope that helps...


I've got (I think) an exact solution; however, it might be slow. The main cost would be computation on a very, very big (and hopefully sparse) matrix.

Let's say you have N points in your volume; call them p1...pN. What you are looking for is f(p1)...f(pN).

Take a point pi. It has 26 neighbours; call them pi-1...pi-26. I suppose you have access, for each point pi, to the coefficients of its linear combination; call them ci-1...ci-j...ci-26 (ci-j being the coefficient of the linear combination tying point pi to its j-th neighbour).

Even better: you can consider pi to be a linear combination of all the points in your space, where most of the coefficients (all but 26) are equal to 0. That gives you coefficients ci-1...ci-N.

You can now build a big N×N matrix of these ci-j coefficients:

+--------------------------------+
|  0  | c1-2 | c1-3 | ... | c1-N |   |f(p1)|    |f(p1)|
+--------------------------------+   |     |    |     |
| c2-1|   0  | c2-3 | ... | c2-N |   |f(p2)|    |f(p2)|
+--------------------------------+ * .     . =  .     .
|                                |   .     .    .     .
.                                .   .     .    .     .
+--------------------------------+   .     .    .     .
| cN-1| cN-2 | cN-3 | ... |   0  |   |f(pN)|    |f(pN)|
+--------------------------------+

Amazing! The solution you're looking for is one of the eigenvectors corresponding to the eigenvalue 1!

Use an optimised matrix library that computes eigenvectors efficiently for sparse matrices, and hope it is already parallelised!
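Feeding such a library usually means assembling the nonzeros in coordinate (row, column, value) form. A hypothetical sketch for the 2D analogue with uniform 1/8 weights, where the grid, the `Triplet` struct, and `assemble` are all illustrative assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Triplet { std::size_t row, col; double val; };

// Assemble M in (row, col, value) triplet form for an n-by-n 2D grid with
// a uniform 1/8 weight on each of the 8 neighbours. Point (i, j) becomes
// unknown i*n + j; boundary rows get no entries, since those unknowns are
// pinned by the boundary condition.
std::vector<Triplet> assemble(std::size_t n) {
    std::vector<Triplet> t;
    for (std::size_t i = 1; i + 1 < n; ++i)
        for (std::size_t j = 1; j + 1 < n; ++j)
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj)
                    if (di != 0 || dj != 0)
                        t.push_back({i * n + j, (i + di) * n + (j + dj), 0.125});
    return t;
}
```

Note the sparsity: each interior row has only 26 nonzeros (8 in this 2D toy) out of N, which is exactly what sparse eigensolvers and Krylov methods exploit.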

Edit: Amusing, I just reread your post; it seems I just gave you your BCGSTAB solution... Sorry! ^^

Re-edit: In fact, I'm not sure: are you talking about the "Biconjugate Gradient Stabilized Method"? Because I don't see the linear method you're referring to if you're doing gradient descent...

Re-re-edit: I love math =) I can even prove that, given the condition that the ci sum to 1, 1 is in fact an eigenvalue: every row of the matrix then sums to 1, so the all-ones vector satisfies M*v = v. I do not know the dimension of the corresponding eigenspace, but it is at least 1!
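A quick numeric illustration of that claim (a toy 3x3 matrix, not the poster's system): multiplying M by the all-ones vector just returns the row sums, so rows summing to 1 make the ones vector a fixed point of M.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Multiply a dense matrix by the all-ones vector. The i-th output is the
// i-th row sum, so if every row sums to 1, the all-ones vector comes back
// unchanged, i.e. it is an eigenvector of M with eigenvalue 1.
std::vector<double> times_ones(const std::vector<std::vector<double>>& M) {
    std::vector<double> y(M.size(), 0.0);
    for (std::size_t i = 0; i < M.size(); ++i)
        for (std::size_t j = 0; j < M[i].size(); ++j)
            y[i] += M[i][j];  // each component of the ones vector is 1
    return y;
}
```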
