开发者

GPU-friendly 2D line segment intersection algorithm

I'm looking for an algorithm that tests whether 2 line segments are intersecting in a GPU-friendly way. The line segments are in 2D. While there are many algorithms discussed on the web for doing this, all the ones I have seen use a lot of branching instructions. On a CPU, this is not such a problem. However, on a GPU, a lot of branching instructions will kill performance.

Does anyone know of a good algorithm for this environment? Any sample开发者_高级运维 pseudo-code, CUDA code, OpenCL code, or DirectCompute compute would be appreciated.


FOLLOW UP

In case anyone is interested, this is (basically) what I ended up with. I trimmed it down, so it's more like pseudo-code than OpenCL C, but hopefully it gets the gist across.

__kernel void findCrossingLines(__global float2 p1, __global float2 p2, 
                                __global float2 p3, __global float2 p4, 
                                __global bool* output)
{
    int result = 0;
    float2 A = p2;
    float2 a = p2 - p1;
    float2 B = p4;
    float2 b = p4 - p3;
    float2 c = B - A;
    float2 b_perp = (float2)(-b.y, b.x);

    float numerator = dot(b_perp, c);
    float denominator = dot(b_perp, a);
    bool isParallel = (denominator == 0.0f);

    float quotient = numerator / denominator;
    float2 intersectionPoint = (float2)(quotient * a.x + A.x, quotient * a.y + A.y);

    *output = (!isParallel && 
                  intersectionPoint.x >= min(p1.x, p2.x) && 
                  intersectionPoint.x >= min(p3.x, p4.x) &&
                  intersectionPoint.x <= max(p1.x, p2.x) && 
                  intersectionPoint.x <= max(p3.x, p4.x) &&
                  intersectionPoint.y >= min(p1.y, p2.y) && 
                  intersectionPoint.y >= min(p3.y, p4.y) &&
                  intersectionPoint.y <= max(p1.y, p2.y) && 
                  intersectionPoint.y <= max(p3.y, p4.y));
}


This is outside my area of expertise, but the following thread on the NVIDIA CUDA forums discussed this topic and contains some code that may provide a useful starting point:

https://forums.developer.nvidia.com/t/intersecting-line-segments-on-gpu/18832

Note that code does not have to be completely branchless to be efficient on NVIDIA GPUs. The architecture supports predication as well as "select"-type instructions (akin to the ternary operator in C/C++), and the compiler is pretty good at mapping local branching to those branch-less constructs. I would recommend using cuobjdump to check the generated machine code, that way you'll know precisely how many branches are actually being used.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜