accumulator filling for Hough transform
I wrote a piece of code that needs to be optimized. I just felt like checking with the community to see whether the code is indeed optimal. It fills the accumulator for the Hough transform. I actually copy-pasted most of the code from the OpenCV library. Thanks!
int i, j, n, index;
for (i = 0; i < numrows; i++)
{
    for (j = 0; j < numcols; j++)
    {
        if (img[i * numcols + j] == 100)
        {
            for (n = 300; n < 600; n++)
            {
                index = cvRound(j * tabCos[n] + i * tabSin[n]) + (numrho - 1) / 2;
                accum[(n + 1) * (numrho + 2) + index + 1]++;
            }
        }
    }
}
There is a large and repetitive Hough transform in a piece of code I'm vaguely attached to. The maintainer of that part of the code has been experimenting with sparse arrays (actually a C++ std::map keyed on the cell index, if I understood his presentation right) for the accumulator, with some success.
I presume the speedup is related to cache locality, and it certainly depends on the data being sparse.
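For reference, a minimal sketch of what such a sparse accumulator might look like, assuming the same tabSin/tabCos tables and cell-index layout as in the question (this is my guess at the scheme, not the colleague's actual code):

#include <cmath>
#include <map>

// Sparse accumulator keyed on the flattened (n, rho) cell index.
// Only cells that actually receive votes consume memory;
// std::map's operator[] default-constructs a missing count to 0.
std::map<int, int> accum;

void vote(int i, int j, const float *tabSin, const float *tabCos, int numrho)
{
    for (int n = 300; n < 600; n++)
    {
        int index = (int)std::lround(j * tabCos[n] + i * tabSin[n]) + (numrho - 1) / 2;
        accum[(n + 1) * (numrho + 2) + index + 1]++;
    }
}

A std::unordered_map usually votes faster; the advantage of std::map is that iterating it visits the non-empty cells in index order, which is convenient for the peak search afterwards.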
Update: The software referenced above is intended to serve many particle physics experiments, but was originally used on a test-bed project (i.e. small scale). As we've gotten serious about doing larger projects and started doing Monte Carlo for them, the Hough transform has become a bit of a bottleneck again, even with the sparse matrix.
As yet we do not have a solution, but one of my colleagues found Gandalf, which includes a "fast Hough transform" that appears to evaluate the transform in a quad-tree-like way (in 2D; presumably you would use an octree in 3D) to reduce the order of work. We're probably going to experiment with it.
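The coarse-to-fine idea itself is easy to prototype. Below is a sketch of my own reading of the quad-tree approach, not Gandalf's API: subdivide (theta, rho) space recursively and prune any cell whose sinusoids can't muster enough votes. Since rho(theta) = x*cos(theta) + y*sin(theta) has its extrema at theta = atan2(y, x) (mod pi), a curve's range over a cell can be bounded exactly:

#include <cmath>
#include <cstdio>
#include <vector>

struct Pt  { double x, y; };
struct Box { double t0, t1, r0, r1; };   // theta and rho extents of a cell

// Does the sinusoid rho(theta) of point p pass through parameter box b?
static bool curveHitsBox(const Pt &p, const Box &b)
{
    const double PI = acos(-1.0);
    double lo = p.x * cos(b.t0) + p.y * sin(b.t0);
    double hi = p.x * cos(b.t1) + p.y * sin(b.t1);
    if (lo > hi) { double t = lo; lo = hi; hi = t; }
    double tstar = atan2(p.y, p.x);      // extrema occur at tstar + k*pi
    double cands[3] = { tstar - PI, tstar, tstar + PI };
    for (int k = 0; k < 3; k++)
        if (cands[k] >= b.t0 && cands[k] <= b.t1) {
            double re = p.x * cos(cands[k]) + p.y * sin(cands[k]);
            if (re < lo) lo = re;
            if (re > hi) hi = re;
        }
    return hi >= b.r0 && lo <= b.r1;
}

// Count supporting points; discard sparse cells wholesale and subdivide
// dense ones. Pruning whole subtrees is where the order-of-work saving
// comes from on sparse data.
static void refine(const std::vector<Pt> &pts, const Box &b,
                   int depth, int maxDepth, int threshold)
{
    int votes = 0;
    for (size_t k = 0; k < pts.size(); k++)
        if (curveHitsBox(pts[k], b)) votes++;
    if (votes < threshold) return;
    if (depth == maxDepth) {
        printf("line: theta=[%g,%g] rho=[%g,%g] votes=%d\n",
               b.t0, b.t1, b.r0, b.r1, votes);
        return;
    }
    double tm = 0.5 * (b.t0 + b.t1), rm = 0.5 * (b.r0 + b.r1);
    Box kids[4] = {{b.t0, tm, b.r0, rm}, {tm, b.t1, b.r0, rm},
                   {b.t0, tm, rm, b.r1}, {tm, b.t1, rm, b.r1}};
    for (int k = 0; k < 4; k++)
        refine(pts, kids[k], depth + 1, maxDepth, threshold);
}

Starting from the full box {0, pi, -rhoMax, +rhoMax} with a depth of, say, 8 gives leaf cells comparable in resolution to a dense accumulator.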
Further update: A colleague eventually implemented a progressive, probabilistic Hough transform in our code, which currently seems to be the fastest version we've got. It works best if you don't require that every point gets assigned to a line.
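If you're already linking OpenCV, note that its HoughLinesP function implements the progressive probabilistic Hough transform (Matas, Galambos and Kittler), so you can try the idea without writing it yourself. The parameter values below are illustrative only:

#include <opencv2/imgproc.hpp>
#include <vector>

// edges: 8-bit single-channel binary image, e.g. the output of cv::Canny.
std::vector<cv::Vec4i> lines;    // each detected segment as (x1, y1, x2, y2)
cv::HoughLinesP(edges, lines,
                1,               // rho resolution in pixels
                CV_PI / 180,     // theta resolution in radians
                80,              // accumulator threshold
                30,              // minimum line length
                10);             // maximum gap bridged within one line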
No, it's not optimal. Replace as many of the [] usages as you can with simple pointer arithmetic to iterate over the arrays in question, and hoist invariant expressions into local variables.
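Applied to the code in the question, that might look like the following sketch (I'm assuming img is a flat, row-major 8-bit array, as the original indexing suggests, and that tabCos/tabSin/accum are the same arrays as above):

// Hoist (numrho - 1)/2 and the (n + 1)*(numrho + 2) row offset out of
// the loops, and walk the image with a single pointer instead of
// recomputing i*numcols + j on every access.
const int halfrho = (numrho - 1) / 2;
const int rowstep = numrho + 2;
const unsigned char *p = img;
for (int i = 0; i < numrows; i++)
{
    for (int j = 0; j < numcols; j++, p++)
    {
        if (*p != 100)
            continue;
        int base = 301 * rowstep;        // (n + 1) * rowstep at n = 300
        for (int n = 300; n < 600; n++, base += rowstep)
        {
            int index = cvRound(j * tabCos[n] + i * tabSin[n]) + halfrho;
            accum[base + index + 1]++;
        }
    }
}

Whether this buys anything depends on the compiler; most modern optimisers do this strength reduction themselves, which is another argument for profiling first.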
However, the first question is: does your profiler show that this code is a bottleneck in the context of your entire app? If not, why bother micro-optimizing it?
EDIT: loop micro-optimization - prefer the second form, since it needs no array indexing (a multiply versus an add):
#include <stdio.h>

int main(void)
{
    int ints[100] = {0};
    int i;
    int *pi;

    /* indexed: each access computes ints + i (scale plus add) */
    for (i = 0; i < 100; ++i)
    {
        printf("%d", ints[i]);
    }

    /* pointer: each iteration is just an increment */
    for (pi = ints; pi < ints + 100; ++pi)
    {
        printf("%d", *pi);
    }
    return 0;
}
Depending on your application, there are numerous ways to optimise the Hough transform, and fiddling with low-level code is possibly the last of them. I would start with the randomised Hough transform or the multiresolution Hough transform, followed by a hybrid approach that merges the two. I believe it is better to optimise the algorithm first. The last step would be hardware optimisation such as CAM (content-addressable memory).
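To make the first suggestion concrete, here is a sketch of a randomised Hough transform for lines (names and parameters are mine, not from any particular paper or library): rather than having every point vote for every cell, repeatedly pick two random points, solve for the one line through them, and cast a single vote in a sparse accumulator.

#include <cmath>
#include <cstdlib>
#include <map>
#include <utility>
#include <vector>

struct Pt { int x, y; };

// Returns the (theta, rho) cell that first reaches `threshold` votes,
// or (0, 0) if none does within maxTrials samples.
std::pair<int, int> randomizedHoughLine(const std::vector<Pt> &pts,
                                        double dTheta, double dRho,
                                        int threshold, int maxTrials)
{
    std::map<std::pair<int, int>, int> accum;   // sparse: only hit cells exist
    for (int t = 0; t < maxTrials; t++) {
        const Pt &a = pts[rand() % pts.size()];
        const Pt &b = pts[rand() % pts.size()];
        if (a.x == b.x && a.y == b.y) continue;
        // Normal form of the line through a and b: theta is the angle of
        // the normal, rho = x*cos(theta) + y*sin(theta). Note (theta, rho)
        // and (theta + pi, -rho) denote the same line; a production
        // version would canonicalise theta into [0, pi).
        double theta = atan2((double)(b.x - a.x), (double)(a.y - b.y));
        double rho = a.x * cos(theta) + a.y * sin(theta);
        std::pair<int, int> cell((int)floor(theta / dTheta),
                                 (int)floor(rho / dRho));
        if (++accum[cell] >= threshold)
            return cell;                        // early exit: a line was found
    }
    return std::make_pair(0, 0);
}

Because only sampled lines ever touch the accumulator, both memory and work scale with the number of trials rather than with points times cells, which is exactly the regime where the dense loop in the question hurts.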