How can I do standard deviation in Ruby?
I have several records with a given attribute, and I want to find the standard deviation.
How do I do that?
module Enumerable
  def sum
    self.inject(0) { |accum, i| accum + i }
  end

  def mean
    self.sum / self.length.to_f
  end

  # Sample variance: divides by (n - 1).
  def sample_variance
    m = self.mean
    sum = self.inject(0) { |accum, i| accum + (i - m)**2 }
    sum / (self.length - 1).to_f
  end

  def standard_deviation
    Math.sqrt(self.sample_variance)
  end
end
Testing it:
a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
a.standard_deviation
# => 4.594682917363407
Update 01/17/2012: fixed "sample_variance" thanks to Dave Sag.
It appears that Angela may have been wanting an existing library. After playing with statsample, array-statisics, and a few others, I'd recommend the descriptive_statistics gem if you're trying to avoid reinventing the wheel.
gem install descriptive_statistics
$ irb
1.9.2 :001 > require 'descriptive_statistics'
=> true
1.9.2 :002 > samples = [1, 2, 2.2, 2.3, 4, 5]
=> [1, 2, 2.2, 2.3, 4, 5]
1.9.2p290 :003 > samples.sum
=> 16.5
1.9.2 :004 > samples.mean
=> 2.75
1.9.2 :005 > samples.variance
=> 1.7924999999999998
1.9.2 :006 > samples.standard_deviation
=> 1.3388427838995882
I can't speak to its statistical correctness, or your comfort with monkey-patching Enumerable; but it's easy to use and easy to contribute to.
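Note that the variance shown above (about 1.7925) differs from the 2.151 that a later answer's unit test expects for the same numbers. The two values are consistent with the two usual definitions: dividing the sum of squared deviations by n (population variance) or by n - 1 (sample variance). The sketch below is plain Ruby with hypothetical helper names; it is not the gem's API, and it assumes Ruby 2.4+ for `Array#sum`.

```ruby
# Hypothetical helpers for illustration only; not part of any gem.
def population_variance(values)
  mean = values.sum(0.0) / values.size
  values.sum(0.0) { |v| (v - mean)**2 } / values.size        # divide by n
end

def sample_variance(values)
  mean = values.sum(0.0) / values.size
  values.sum(0.0) { |v| (v - mean)**2 } / (values.size - 1)  # divide by n - 1
end

samples = [1, 2, 2.2, 2.3, 4, 5]
pop_var = population_variance(samples)
smp_var = sample_variance(samples)
puts pop_var.round(4)
puts smp_var.round(4)
```

Which one you want depends on whether your records are the whole population or a sample drawn from it.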
The answer given above is elegant but has a slight error in it. Not being a stats head myself I sat up and read in detail a number of websites and found this one gave the most comprehensible explanation of how to derive a standard deviation. http://sonia.hubpages.com/hub/stddev
The error in the answer above is in the sample_variance method.
Here is my corrected version, along with a simple unit test that shows it works.
in ./lib/enumerable/standard_deviation.rb
#!/usr/bin/ruby
module Enumerable
  def sum
    return self.inject(0) { |accum, i| accum + i }
  end

  def mean
    return self.sum / self.length.to_f
  end

  # Sample variance: divides by (n - 1).
  def sample_variance
    m = self.mean
    sum = self.inject(0) { |accum, i| accum + (i - m) ** 2 }
    return sum / (self.length - 1).to_f
  end

  def standard_deviation
    return Math.sqrt(self.sample_variance)
  end
end
in ./test
using numbers derived from a simple spreadsheet.
#!/usr/bin/ruby
require 'test/unit'
require 'enumerable/standard_deviation'

class StandardDeviationTest < Test::Unit::TestCase
  THE_NUMBERS = [1, 2, 2.2, 2.3, 4, 5]

  def test_sum
    expected = 16.5
    result = THE_NUMBERS.sum
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_mean
    expected = 2.75
    result = THE_NUMBERS.mean
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_sample_variance
    expected = 2.151
    result = THE_NUMBERS.sample_variance
    # Compare with a tolerance; exact equality on floats is fragile.
    assert_in_delta expected, result, 1e-10, "expected #{expected} but got #{result}"
  end

  def test_standard_deviation
    expected = 1.4666287874
    result = THE_NUMBERS.standard_deviation
    assert result.round(10) == expected, "expected #{expected} but got #{result}"
  end
end
I'm not a big fan of adding methods to Enumerable since there could be unwanted side effects. It also gives methods that are specific to a collection of numbers to every class that includes Enumerable, which doesn't make sense in most cases.
While this is fine for tests, scripts or small apps, it's risky for larger applications, so here's an alternative based on @tolitius' answer which was already perfect. This is more for reference than anything else:
module MyApp::Maths
  def self.sum(a)
    a.inject(0) { |accum, i| accum + i }
  end

  def self.mean(a)
    sum(a) / a.length.to_f
  end

  def self.sample_variance(a)
    m = mean(a)
    sum = a.inject(0) { |accum, i| accum + (i - m) ** 2 }
    sum / (a.length - 1).to_f
  end

  def self.standard_deviation(a)
    Math.sqrt(sample_variance(a))
  end
end
And then you use it as such:
2.0.0p353 > MyApp::Maths.standard_deviation([1,2,3,4,5])
=> 1.5811388300841898
2.0.0p353 :007 > a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
=> [20, 23, 23, 24, 25, 22, 12, 21, 29]
2.0.0p353 :008 > MyApp::Maths.standard_deviation(a)
=> 4.594682917363407
2.0.0p353 :043 > MyApp::Maths.standard_deviation([1,2,2.2,2.3,4,5])
=> 1.466628787389638
The behavior is the same, but it avoids the overheads and risks of adding methods to Enumerable.
The computations presented above are not very efficient because they require several passes through the array (at least two, but often three, because you usually want to present the average in addition to the standard deviation).
I know Ruby is not the place to look for efficiency, but here is my implementation that computes average and standard deviation with a single pass over the list values:
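module Enumerable
  # Returns [mean, sample standard deviation], accumulating both sums in one loop.
  def avg_stddev
    return nil unless count > 0
    return [first, 0] if count == 1
    sx = sx2 = 0
    each do |x|
      sx2 += x**2
      sx += x
    end
    [
      sx.to_f / count,
      # http://wijmo.com/docs/spreadjs/STDEV.html
      # Kept on one line: a line that starts with "/" would be read as a regex.
      Math.sqrt((sx2 - sx**2.0 / count) / (count - 1))
    ]
  end
end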
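One caveat with the sum-of-squares formula above: subtracting two large, nearly equal quantities can lose precision when the mean is large relative to the spread. A commonly used alternative is Welford's online algorithm, which also needs only one loop. The following is a minimal sketch of that swapped-in technique, not the method from the answer above; the function name `welford_mean_stddev` is hypothetical.

```ruby
# Welford's online algorithm: update the running mean and the running sum of
# squared deviations (m2) as each value arrives. Returns [mean, sample stddev].
def welford_mean_stddev(values)
  count = 0
  mean = 0.0
  m2 = 0.0
  values.each do |x|
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)   # uses the already-updated mean
  end
  return nil if count.zero?
  return [mean, 0.0] if count == 1
  [mean, Math.sqrt(m2 / (count - 1))]
end

mean, stddev = welford_mean_stddev([20, 23, 23, 24, 25, 22, 12, 21, 29])
puts mean
puts stddev
```

Because it only calls `each` once and never asks for `count`, this version also works on enumerables that can be traversed only a single time.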
As a simple function, given a list of numbers:
def standard_deviation(list)
mean = list.inject(:+) / list.length.to_f
var_sum = list.map{|n| (n-mean)**2}.inject(:+).to_f
sample_variance = var_sum / (list.length - 1)
Math.sqrt(sample_variance)
end
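Since the question is about records with a given attribute, the function above can be applied by first mapping the records to that attribute. The sketch below assumes a hypothetical record type with a `score` attribute; substitute your own class and attribute name.

```ruby
def standard_deviation(list)
  mean = list.inject(:+) / list.length.to_f
  var_sum = list.map { |n| (n - mean)**2 }.inject(:+).to_f
  Math.sqrt(var_sum / (list.length - 1))
end

# Hypothetical record type, for illustration only.
record_type = Struct.new(:name, :score)
records = [
  record_type.new("a", 1),
  record_type.new("b", 2),
  record_type.new("c", 3),
  record_type.new("d", 4),
  record_type.new("e", 5)
]

# Extract the attribute from every record, then compute the deviation.
scores = records.map(&:score)
result = standard_deviation(scores)
puts result
```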
If the records at hand are of type Integer or Rational, you may want to compute the variance using Rational instead of Float to avoid errors introduced by rounding.
For example:
# Population variance (divides by n), computed exactly with Rational arithmetic.
def variance(list)
  mean = list.reduce(:+) / list.length.to_r
  sum_of_squared_differences = list.map { |i| (i - mean)**2 }.reduce(:+)
  sum_of_squared_differences / list.length
end
(It would be prudent to add special-case handling for empty lists and other edge cases.)
The standard deviation is then the square root of the variance:
def std_dev(list)
  Math.sqrt(variance(list))
end
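The edge-case handling mentioned above might look like the following. This is only a sketch under one assumption: that an empty list should raise an error rather than return a default value. Like the original, it divides by n, so it yields the population variance.

```ruby
def variance(list)
  # Assumption: an empty list is treated as a caller error.
  raise ArgumentError, "variance of an empty list is undefined" if list.empty?

  mean = list.reduce(:+) / list.length.to_r
  sum_of_squared_differences = list.map { |i| (i - mean)**2 }.reduce(:+)
  sum_of_squared_differences / list.length
end

def std_dev(list)
  Math.sqrt(variance(list))
end

exact = variance([1, 2, 3, 4])   # stays a Rational, no rounding yet
puts exact
puts std_dev([1, 2, 3, 4])       # Math.sqrt converts to Float at the end
```

If you switch the divisor to `list.length - 1` for a sample variance, a one-element list becomes a second edge case, because the divisor is then zero.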
In case people are using Postgres: it provides the aggregate functions stddev_pop and stddev_samp (see the PostgreSQL documentation on aggregate functions).
stddev (equivalent to stddev_samp) has been available since at least Postgres 7.1; since 8.2, both stddev_samp and stddev_pop are provided.
Or how about:
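class Stats
  # Expose the results; without readers the computed values are inaccessible.
  attr_reader :avg, :stdev

  # Computes the mean and the population standard deviation (divides by n).
  def initialize(a)
    @avg = a.count > 0 ? a.sum / a.count.to_f : 0.0
    @stdev = a.count > 0 ? (a.reduce(0) { |sum, v| sum + (@avg - v)**2 } / a.count)**0.5 : 0.0
  end
end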
You can define this as a helper method and access it anywhere.
# Requires Ruby 2.4+ for Array#sum.
def calc_standard_deviation(arr)
  mean = arr.sum(0.0) / arr.size
  sum = arr.sum(0.0) { |element| (element - mean) ** 2 }
  variance = sum / (arr.size - 1)
  Math.sqrt(variance)
end