开发者

help with algorithm optimisation

I would like to optimize this algorithm. Function makeFrame divides the audio signal into time frames using a Hanning window of about 37 ms. Then function divideFreqs performs the fast fourier transform on each timeframe using jtransforms library (and it is the one that is the most time consuming). How could I cut down the time of this operation as this is taking way too long. For an audio file of 5 secs it takes around 13 secs to perform the operation. I was thinking about using multi-threading but never used it before.

 public double[][] makeFrame(double[] audioOutput) {

            int length = audioOutput.length;

            //calculate a hannining window size of 37 ms
            int window = (int) Math.round(0.37 * sampleRate);
            int interval = (开发者_开发技巧int) Math.round(0.0116 * sampleRate);
            length = length - window;
            int numintervals = length / interval;
            //calculate hanning window values
            double[] hanw = hanning(window);
            double[][] sections = new double[numintervals + 1][25];


            //divide the signal into timeframes using Hanning window of 37ms
            int k = 0;
            for (int i = 0; i < length; i += interval) {
                double[] temp = new double[88200];
                int t = 0;
                int s;

                s = i;

                for (; s < i + window; s++) {
                    temp[t] = audioOutput[s] * hanw[t];
                    t++;
                }
                sections[k] = divideFreqs(temp, sampleRate);
                k++;
            }

            return sections;
        }


public static double[] hanning(int window) {

int w = 0;

        double h_wnd[] = new double[window]; //Hanning window

        for (int i = 1; i < window; i++) { //calculate the hanning window
            h_wnd[i] = 0.5 * (1 - Math.cos(2.0 * Math.PI * i / (window + 1)));
        }

        return h_wnd;
    }

 public static double[] divideFreqs(double[] audioData, float fs) {

        DoubleFFT_1D fft = new DoubleFFT_1D(44100);
        int len;
        double[] secenergy;


        //Frequency bands in the range of 1Hz-20000Hz
        int[][] bandsec = new int[][]{
            {1, 100},
            {100, 200},
            {200, 300},
            {300, 400},
            {400, 510},
            {510, 630},
            {630, 770},
            {770, 920},
            {920, 1080},
            {1080, 1270},
            {1270, 1480},
            {1480, 1720},
            {1720, 2000},
            {2000, 2320},
            {2320, 2700},
            {2700, 3150},
            {3150, 3700},
            {3700, 4400},
            {4400, 5300},
            {5300, 6400},
            {6400, 7700},
            {7700, 9500},
            {9500, 12000},
            {12000, 15500},
            {15500, 20000}};


        //perform FFT on the data
        fft.realForwardFull(audioData);


        //splitting real and imaginary numbers
        double[] real = new double[22050];
        double[] imaginary = new double[22050];
        for (int row = 0; row < 22050; row++) {
            real[row] = (double) Math.round(audioData[row + row] * 100000000) / 100000000;
            imaginary[row] = (double) Math.round(audioData[row + row + 1] * 100000000) / 100000000;

        }

        len = bandsec.length;
        secenergy = new double[len];

        //calculate energy for each critical band
        double[] tempReal;
        double[] tempImag;
        for (int i = 0; i < len; i++) {
            int k = 0;
            tempReal = new double[bandsec[i][1] - (bandsec[i][0] - 1)];
            tempImag = new double[bandsec[i][1] - (bandsec[i][0] - 1)];

            for (int j = bandsec[i][0] - 1; j < bandsec[i][1]; j++) {

                tempReal[k] = real[j];
                tempImag[k] = imaginary[j];
                k++;
            }
            secenergy[i] = energy(tempReal, tempImag);

        }

        return secenergy;
    }

 public static double energy(double[] real, double[] imaginary) {
        double e = 0;

        Complex sum = new Complex(0, 0);
        ArrayList<Complex> complexList = new ArrayList<Complex>();

        for (int i = 0; i < real.length; i++) {
            Complex comp = new Complex(real[i], imaginary[i]);

            complexList.add(comp.multiply(comp));
        }

        for (int i = 0; i < complexList.size(); i++) {
            Complex comp = new Complex(complexList.get(i).getReal(), complexList.get(i).getImaginary());

            sum = Complex.add(comp, sum);


        }

        e = Math.sqrt(sum.magnitude());
        e = (double) Math.round(e * 10000) / 10000;
        return e;
    }


Using multiple cores will help, but often optimising your code which give you greater benefits.

Using double instead of a Complex object is 9x faster on my machine.

The average time using double took 38,687 ns
The average time using Complex took 344,010 ns

The test code

public class EnergyTest {
    public static void main(String... args) {
        double[] real = new double[22050];
        double[] imaginary = new double[22050];
        for (int i = 0; i < real.length; i++) {
            real[i] = Math.random() - Math.random();
            imaginary[i] = Math.random() - Math.random();
        }
        {
            int runs = 100000;
            long start = 0;
            double e = 0;
            for (int i = -10000; i < runs; i++) {
                if (i == 0) start = System.nanoTime();
                e += energyDouble(real, imaginary);
            }
            assert e > 0;
            long time = System.nanoTime() - start;
            System.out.printf("The average time using double took %,d ns%n", time / runs);
        }
        {
            int runs = 10000;
            long start = System.nanoTime();
            double e = 0;
            for (int i = -10000; i < runs; i++) {
                if (i == 0) start = System.nanoTime();
                e += energy(real, imaginary);
            }
            assert e > 0;
            long time = System.nanoTime() - start;
            System.out.printf("The average time using Complex took %,d ns%n", time / runs);
        }
    }

    public static double energyDouble(double[] real, double[] imaginary) {
        double re_total = 0, im_total = 0;

        for (int i = 0; i < real.length; i++) {
            double re = real[i];
            double im = imaginary[i];
            double re2 = re * re - im * im;
            double im2 = 2 * re * im;
            re_total += re2;
            im_total += im2;
        }
        double e = Math.sqrt(re_total * re_total + im_total * im_total);
        e = (double) Math.round(e * 10000) / 10000;
        return e;
    }

    public static double energy(double[] real, double[] imaginary) {
        double e = 0;

        Complex sum = new Complex(0, 0);
        ArrayList<Complex> complexList = new ArrayList<Complex>();

        for (int i = 0; i < real.length; i++) {
            Complex comp = new Complex(real[i], imaginary[i]);

            complexList.add(comp.multiply(comp));
        }

        for (int i = 0; i < complexList.size(); i++) {
            Complex comp = new Complex(complexList.get(i).getReal(), complexList.get(i).getImaginary());

            sum = Complex.add(comp, sum);


        }

        e = Math.sqrt(sum.magnitude());
        e = (double) Math.round(e * 10000) / 10000;
        return e;
    }

    static class Complex {

        private final double re;
        private final double im;

        public Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }

        public double getReal() {
            return re;
        }

        public double getImaginary() {
            return im;
        }

        public Complex multiply(Complex comp) {
            double re2 = re * comp.re - im * comp.im;
            double im2 = im * comp.re + re * comp.im;
            return new Complex(re2, im2);
        }

        public static Complex add(Complex a, Complex b) {
            return new Complex(a.re + b.re, a.im + b.im);
        }

        public double magnitude() {
            return re * re + im * im;
        }
    }
}


Idont know your library since I'm using FFTW myself, but the things i noticed were

1) your fft size is not a power of 2

2) your window is 370ms not 37ms.

3) Since your window has a size of 370ms (i.e. ~16k samples) why feed a 88200 
(or does the constructor value say "take only 44100 values"?) array into it? 
It is fully sufficient to take 
    pow(2.0, ceil(log2(0.37*44100))) = 2^14 = 16384
as your fft size. 
Zero padding wont add additional frequency resolution I'm afraid.

4) you instatiate a new FFT object for every call to divideFreq.
I'm not sure how expensive the construction is, so try make it a class member.

5) Last but not least (I think this is the major speed loss here) 
Your hop size is much too small! 
A common overlap is 1/2 or 2/3 of the window size 
(in terms of your code: interval = windowSize/3). 
Your's is around 1/31 of the window size. 
Thats really overkill give you many redundant results. 

cheers


A few ideas:

In makeFrame, declare double[] temp = new double[88200] outside the loop, to avoid repeated memory allocation and deallocation. (I don't know if the optimizer is smart enough to do this by itself)

In divideFreqs, What is the purpose of Math.round(data*100000000)/100000000 ? You are limiting to 9 significant digits? Does the 10th digit really have such a great effect on the result? If this step is not strictly needed, eliminating it may save some time.

The biggest savings I can see would come from eliminating the whole construction and copy into the real and imaginary arrays. It appears that your FFT leaves audioData with 22050 interleaved real/imaginary parts. So you could change the signature of the energy function to energy(double fftData[], int lowFreq, int highFreq), and in that function replace the two loops with:

      for (int i = 0; i <highFreq-lowFreq; i++) {
             Complex comp = new Complex(fftData[2*i], fftData[2*i+1]);
             sum=  Complex.add(sum, comp.multiply(comp)) 
      }

Then replace the body of divideFreqs with simply:

//...
fft.realForwardFull(audioData); //splitting real and imaginary numbers

len = bandsec.length;
secenergy = new double[len];  
for (int i = 0; i < len; i++) {  
    secenergy[i] = energy(audioData, bandSec[i][0], bandSec[i][1]-1)
}
return secenergy

This avoids a ton of memory allocation, copying and deallocation, and should increase speed dramatically.
After these changes, for even better performmance you can replace the Complex number manipulation with the direct double operations that @Peter showed.


For an FFT to actually work, you must have a power of 2 as the number of samples. This allows the FFT to run in O(n*log(n)) rather than doing a DFT which is O(n*n). Your library is probably smart enough to make this improvement if you provide it correctly sized input. Unfortunately this means your window size is not up to you, as it will be restricted to sizes of 2,4,8,16,etc...

Technically you can use the math behind FFT to speed up transforms where the number of samples is not a prime number, but that is quite complex, not as fast, and may not be supported by the library. However, if your window size is required to be a certain size to match a specification you may be forced into something more complex to get performance.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜