开发者

Find a local minimum in a special graph

The issue at hand looks easy, but I could not find an easy solution so far.

I've got a histogram describing the value distributing of an array of floats, roughly looking like this:

Find a local minimum in a special graph

As you can see, there is a local maximum near 0, which keeps falling down to a local minimum, then rising quickly to a plateau, and in the end falling to 0. I would like to detect the local minimum.

In practice, the histogram is not as smooth:

Find a local minimum in a special graph

There are lots of spikes, and the local minimum may be stretched and uneven. I'm not sure how to tackle this problem.

There is little domain knowledge. The first max may even be higher than the second max. There may be spikes in any direction, values may be as low as 0.

This is a real life sample taken from 8 distinct runs. It's scaled to 0 - 10 to make it easier to understand.

0: 22%  12% 19% 17% 6%  5%  6%  5%    
1: 3%   2%  1%  1%  4%  1%  4%  1%    
2: 6%   2%  13% 5%  0%  2%  0%  2%   
3: 62%  62% 52% 42% 2%  5%  2%  5%  
4: 4%   19% 12% 28% 10% 13% 10% 13%  
5: 0%   0%      3%  29% 30% 29% 30%
6:                  37% 31% 37% 30%
7:                  1%  7%  1%  7%
8:                  6%  1%  6%  1%
9:
10:

Values rounded down. Missing values denote no occurrence of any value.

Explanation of the first line:

0: 22%   the initial max
1: 3%    local min
2: 6%    still min
3: 62%   plateau max
4: 4%    second min
5: 0%    0
6:   no more values 
7:      
8:      
9:
10:

For reference, a list of the same data, this time scaled to 0 - 100 (there were no values in the 90-100 range at all). I messed up on the formatting, but it should give a rough idea.

0:  0%   0%   0%   1%   0%   0%   0%   0%
1:  0%   1%   1%   3%   0%   0%   0%   0%
2:  1%   2%   1%   3%   0%   0%   0%   0%
3:  4%   2%   3%   3%   0%   1%   0%   1%
4:  6%   1%   3%   2%   0%   0%   0%   0%
5:  2%   0%   3%   1%   0%   0%   0%   0%
6:  1%   0%   2%   0%   0%   0%   0%   0%
7:  1%   0%   1%   0%   0%   0%   0%   0%
8:  1%   0%   1%   0%   0%   0%   0%   0%
9:  1%   0%   1%   0%   1%   0%   1%   0%
10: 1%   0%   0%   0%   1%   0%   1%   0%
11: 0%   0%   0%   0%   0%   0%   0%   0%
12: 0%   0%   0%   0%   0%   0%   0%   0%
13: 0%   0%   0%   0%   0%   0%   0%   0%
14: 0%   0%   0%   0%   0%   0%   0%   0%
15: 0%   0%   0%   0%   0%   0%   0%   0%
16: 0%   0%   0%   0%   0%   0%   0%   0%
17: 0%   0%   0%   0%   0%   0%   0%   0%
18: 0%   0%   0%   0%   0%   0%   0%   0%
19: 0%   0%   0%   0%   0%   0%   0%   0%
20: 0%   0%   0%   0%   0%   0%   0%   0%
21: 0%   0%   0%   0%   0%   0%   0%   0%
22: 0%   0%   0%   0%   0%   0%   0%   0%
23: 0%   0%   0%   0%   0%   0%   0%   0%
24: 0%   0%   1%   0%   0%   0%   0%   0%
25: 0%   0%   1%   0%   0%   0%   0%   0%
26: 0%   0%   1%   0%   0%   0%   0%   0%
27: 0%   0%   1%   0%   0%   0%   0%   0%
28: 1%   0%   2%   1%   0%   0%   0%   0%
29: 3%   0%   2%   2%   0%   0%   0%   0%
30: 7%   1%   3%   2%   0%   0%   0%   0%
31: 10%  2%   4%   3%   0%   0%   0%   0%
32: 10%  3%   4%   4%   0%   0%   0%   0%
33: 6%   6%   5%   5%   0%   0%   0%   0%
34: 5%   5%   4%   4%   0%   0%   0%   0%
35: 5%   8%   6%   3%   0%   0%   0%   0%
36: 5%   10%  6%   4%   0%   0%   0%   0%
37: 5%   9%   5%   3%   0%   0%   0%   0%
38: 3%   8%   5%   5%   0%   0%   0%   0%
39: 2%   5%   5%   5%   0%   0%   0%   0%
40: 1%   4%   4%   5%   0%   1%   0%   1%
41: 1%   3%   2%   5%   0%   1%   0%   1%
42: 0%   1%   1%   4%   0%   0%   0%   0%
43: 0%   2%   0%   4%   1%   1%   1%   1%
44: 0%   1%   0%   3%   1%   1%   1%   1%
45: 0%   1%   0%   1%   0%   1%   0%   1%
46: 0%   1%   0%   1%   1%   1%   1%   1%
47: 0%   1%   0%   0%   1%   1%   1%   1%
48: 0%   1%   0%   0%   1%   1%   1%   1%
50: 0%   0%   0%   1%   1%   1%   1%   1%
50:      0%        1%   1%   1%   1%   1%
51:      0%        0%   2%   1%   2%   1%
52:      0%        1%   2%   1%   2%   1%
53:      0%        0%   4%   2%   4%   2%
54:                0%   2%   2%   2%   2%
55:                0%   2%   2%   2%   2%
56:                0%   2%   3%   2%   3%
57:                0%   2%   4%   2%   4%
58:                     4%   6%   4%   6%
59:                     3%   3%   3%   3%
60:                     5%   5%   5%   5%
61:                     5%   7%   5%   7%
62:                     3%   5%   3%   5%
63:                     4%   3%   4%   3%
64:                     5%   2%   5%   2%
65:                     3%   2%   2%   2%
66:                     5%   1%   5%   1%
67:                     1%   0%   1%   0%
68:                     1%   0%   1%   0%
69:                     0%   1%   0%   1%
70:                     0%   0%   0%   0%
71:                     开发者_开发技巧0%   0%   0%   0%
72:                     0%   0%   0%   0%
73:                     0%   1%   0%   1%
74:                     0%   0%   0%   0%
75:                     0%   0%   0%   0%
76:                     0%   1%   0%   1%
77:                     0%   0%   0%   0%
78:                     0%   0%   0%   0%
79:                     0%   0%   0%   0%
80:                     0%   0%   0%   1%
81:                     0%   0%   0%   0%
82:                     0%   0%   0%   0%
83:                     0%   0%   0%   0%
84:                     0%   0%   0%   0%
85:                     1%        1%
86:                     0%        0%
87:                     1%        1%
88:                     1%        1%
89:                     0%        0%


Your "true" histogram is low frequency. Your noise is high frequency. Low-pass filtering the data with an appropriate bandwidth filter will get rid of most of the noise.


Here's an algoithm:

  1. Smooth your data set by calculating a moving average for a small window.
  2. Test your smoothed data for local minima (i.e. any single datum that it is smaller than its neighbours.
  3. If there are more than two local minima, increase the window size, and goto step 1.

Update:

Having looked at the sample data you posted, I've realised that you need to detect minimal plateaus rather than just individual points, so step two in the algorithm should be tweaked to identify a point as part of a minimum if there are no neighbours with smaller values between the nearest higher value neighbours on either side. Then when counting minima in step 3, a minimal plateau should count as a single minimum.

I've tested this algorithm on your example datasets and it performs well, picking minima at: 18, 12, 15, 13, 23, 20, 23and20 for your datasets respectively.


a possible heuristic: using spline approximation to smooth the histogram, and make it polynomical-like and then look for a local minimum.
note that this is only a heuristic solution and might fail... but I think will provide a good solution for most cases.


This actually sounds rather like histogram-based image segmentation to me (although this is not an image, so it's really just histogram segmentation). Sounds weird, but bear with me.

Is what's important about the minimum the fact that it's a minimum, or that it divides the small maximum from the large maximum? If it's the fact that it divides the maxima, then segmentation is definitely what you want.

Have a look at K-means clustering. You'd have two clusters. It's not a terribly complicated procedure, but Wikipedia (and other sources) do a much better job of explaining it than i could, so i'll leave it to them.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜