开发者

Binning in Excel

Which formulae in MS Excel 开发者_StackOverflow社区can we use for -

equi-depth binning

equi-width binning


Here's what I used. The data I was binning was in A2:A2001.

Equi-width:

I calculated the width in a separate cell (U2), using this formula:

=(MAX($A$2:$A$2001) - MIN($A$2:$A$2001) + 0.00000001)/10

10 is the number of bins. The + 0.00000000001 is there because without it, values equal to the maximum were getting put into their own bin.

Then, for the actual binning, I used this:

=ROUNDDOWN(($A2-MIN($A$2:$A$2001))/$U$2, 0)

This function is finding how many bin-widths above the minimum your value is, by dividing (value - minimum) by the bin width. We only care about how many full bin-widths fit into the value, not fractional ones, so we use ROUNDDOWN to chop off all the fractional bin-widths (that is, show 0 decimal places).

Equi-depth

This one is simpler.

=ROUNDDOWN(PERCENTRANK($A$2:$A$2001, $A2)*10, 0)

First, get the percentile rank of the current cell ($A2) out of all the cells being binned ($A$2:$A$2001). This will be a value between 0 and 1, so to convert it into bins, just multiply by the total number of bins you want (I used 10). Then, chop off the decimals the same way as before.

For either of these, if you want your bins to start at 1 rather than 0, just add a +1 to the end of the formula.


Best approach is to use the built-in method:

http://support.microsoft.com/kb/214269

I think the VBA version of the addin (step 3 with most versions) will also give you the code.


Put this formula in B1:

=MAX( ROUNDUP( PERCENTRANK($A$1:$A$8, A1) *4, 0),1)

Binning in Excel

Fill down the formula all across B column and you are done. The formula divides the range into 4 equal buckets and it returns the bucket number which the cell A1 falls into. The first bucket contains the lowest 25% of values.

General pattern is:

=MAX( ROUNDUP ( PERCENTRANK ([Range], [TestCell]) * [NumberOfBuckets], 0), 1)


You may have to build the matrix to graph.

For the bin bracket you could use =PERCENTILE() for equi-depth and a proportion of the difference =Max(Data) - Min(Data) for equi-width.

You could obtain the frequency with =COUNTIF(). The bin's Mean could be obtained using =SUMPRODUCT((Data>LOWER_BRACKET)*(Data<UPPER_BRACKET)*Data)/frequency

More complex statistics could be reached hacking around with SUMPRODUCT and/or Array formulas (which I do not recommend since are very hard to comprehend for a non-programmer)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜