GPU-friendly 2D line segment intersection algorithm
I'm looking for an algorithm that tests whether 2 line segments are intersecting in a GPU-friendly way. The line segments are in 2D. While there are many algorithms discussed on the web for doing this, all the ones I have seen use a lot of branching instructions. On a CPU, this is not such a problem. However, on a GPU, a lot of branching instructions will kill performance.
Does anyone know of a good algorithm for this environment? Any sample开发者_高级运维 pseudo-code, CUDA code, OpenCL code, or DirectCompute compute would be appreciated.
FOLLOW UP
In case anyone is interested, this is (basically) what I ended up with. I trimmed it down, so it's more like pseudo-code than OpenCL C, but hopefully it gets the gist across.
__kernel void findCrossingLines(__global float2 p1, __global float2 p2,
__global float2 p3, __global float2 p4,
__global bool* output)
{
int result = 0;
float2 A = p2;
float2 a = p2 - p1;
float2 B = p4;
float2 b = p4 - p3;
float2 c = B - A;
float2 b_perp = (float2)(-b.y, b.x);
float numerator = dot(b_perp, c);
float denominator = dot(b_perp, a);
bool isParallel = (denominator == 0.0f);
float quotient = numerator / denominator;
float2 intersectionPoint = (float2)(quotient * a.x + A.x, quotient * a.y + A.y);
*output = (!isParallel &&
intersectionPoint.x >= min(p1.x, p2.x) &&
intersectionPoint.x >= min(p3.x, p4.x) &&
intersectionPoint.x <= max(p1.x, p2.x) &&
intersectionPoint.x <= max(p3.x, p4.x) &&
intersectionPoint.y >= min(p1.y, p2.y) &&
intersectionPoint.y >= min(p3.y, p4.y) &&
intersectionPoint.y <= max(p1.y, p2.y) &&
intersectionPoint.y <= max(p3.y, p4.y));
}
This is outside my area of expertise, but the following thread on the NVIDIA CUDA forums discussed this topic and contains some code that may provide a useful starting point:
https://forums.developer.nvidia.com/t/intersecting-line-segments-on-gpu/18832
Note that code does not have to be completely branchless to be efficient on NVIDIA GPUs. The architecture supports predication as well as "select"-type instructions (akin to the ternary operator in C/C++), and the compiler is pretty good at mapping local branching to those branch-less constructs. I would recommend using cuobjdump to check the generated machine code, that way you'll know precisely how many branches are actually being used.
精彩评论