1D Fast Convolution without FFT
I need an 1D Convolution against 2 big arrays. I'm using this code in C# but it takes a loooong time to run.
I know, i know! FFT convolutions is very fast. But in this project i CAN'T use it. It is a constraint of the project to not use FFT (please don't ask why :/).
This is my code in C# (ported fro开发者_如何学编程m matlab, by the way):
var result = new double[input.Length + filter.Length - 1];
for (var i = 0; i < input.Length; i++)
{
for (var j = 0; j < filter.Length; j++)
{
result[i + j] += input[i] * filter[j];
}
}
So, anyone knows any fast convolution algorithm widthout FFT?
Convolution is numerically the same as a polynomial multiplication with an extra wrap-around step. Therefore, all the polynomial and large integer multiplication algorithms can be used to perform convolution.
FFT is the only way to get the fast O(n log(n)) run-time. But you can still get sub-quadratic run-time using the divide-and-conquer approaches like Karatsuba's algorithm.
Karatsuba's algorithm is fairly easy to implement once you understand how it works. It runs in O(n^1.585), and will probably be faster than trying to super-optimize the classic O(n^2) approach.
You could reduce the number of indexed accesses to result
, as well as the Length
properties:
int inputLength = filter.Length;
int filterLength = filter.Length;
var result = new double[inputLength + filterLength - 1];
for (int i = resultLength; i >= 0; i--)
{
double sum = 0;
// max(i - input.Length + 1,0)
int n1 = i < inputLength ? 0 : i - inputLength + 1;
// min(i, filter.Length - 1)
int n2 = i < filterLength ? i : filterLength - 1;
for (int j = n1; j <= n2; j++)
{
sum += input[i - j] * filter[j];
}
result[i] = sum;
}
If you further split the outer loop, you can get rid of some repeating conditionals. (This assumes 0 < filterLength
≤ inputLength
≤ resultLength
)
int inputLength = filter.Length;
int filterLength = filter.Length;
int resultLength = inputLength + filterLength - 1;
var result = new double[resultLength];
for (int i = 0; i < filterLength; i++)
{
double sum = 0;
for (int j = i; j >= 0; j--)
{
sum += input[i - j] * filter[j];
}
result[i] = sum;
}
for (int i = filterLength; i < inputLength; i++)
{
double sum = 0;
for (int j = filterLength - 1; j >= 0; j--)
{
sum += input[i - j] * filter[j];
}
result[i] = sum;
}
for (int i = inputLength; i < resultLength; i++)
{
double sum = 0;
for (int j = i - inputLength + 1; j < filterLength; j++)
{
sum += input[i - j] * filter[j];
}
result[i] = sum;
}
You can use a special IIR filter. Then process that as like:
y(n)= a1*y(n-1)+b1*y(n-2)...+a2*x(n-1)+b2*x(n-2)......
I think it's faster.
Here are two possibilities that may give minor speedups, but you'd need to test to be sure.
- Unroll the inner loop to remove some tests. This will be easier if you know the filter length will always be a multiple if some N.
- Reverse the order of the loops. Do
filter.length
passes over the whole array. This does less dereferencing in the inner loop but may have worse caching behavior.
精彩评论