Calculating Percentiles (Ruby)
My code is based on the methods described here and here.
def fraction?(number)
  number - number.truncate
end
def percentile(param_array, percentage)
  another_array = param_array.to_a.sort
  r = percentage.to_f * (param_array.size.to_f - 1) + 1
  if r <= 1 then return another_array[0]
  elsif r >= another_array.size then return another_array[another_array.size - 1]
  end
  ir = r.truncate
  another_array[ir] + fraction?((another_array[ir].to_f - another_array[ir - 1].to_f).abs)
end
Example usage:
test_array = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.061, 95.1591, 95.1195,
95.1065, 95.0925, 95.199, 95.1682]
test_values = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
test_values.each do |value|
puts value.to_s + ": " + percentile(test_array, value).to_s
end
Output:
0.0: 95.061
0.1: 95.1205
0.2: 95.1325
0.3: 95.1689
0.4: 95.1692
0.5: 95.1615
0.6: 95.1773
0.7: 95.1862
0.8: 95.2102
0.9: 95.1981
1.0: 95.199
The problem here is that the 80th percentile is higher than the 90th and the 100th. However, as far as I can tell my implementation is as described, and it returns the right answer for the example given (0.9).
Is there an error in my code I'm not seeing? Or is there a better way to do this?
script
This sounds like a homework problem. Anyway, it was kinda fun to do.
# Score class
class Score
  attr_accessor :value, :percentile
  def initialize(score)
    self.value = score.to_f
  end
  def <=>(foo)
    self.value <=> foo.value
  end
end
# load scores
scores = []
DATA.each do |line|
  scores << Score.new(line)
end
scores.sort!
scores_count = scores.size
# iterate through scores and calculate percentile
scores.each_with_index do |s, i|
  # L/N(100) = P
  # L = number of scores beneath this score (score array index)
  # N = total number of scores
  # P = percentile
  s.percentile = (i.to_f/scores_count.to_f*100).ceil
end
# output
puts "What is the precise percentile of each score"
scores.each_with_index do |s,i|
  puts "#{s.value} is in the #{s.percentile} percentile"
end
# bonus: what score is in the Xth percentile?
puts "\nWhat score is in the Xth percentile?"
percentiles = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
percentiles.each do |p|
  # P/100(N) = L
  # P = percentile
  # N = total number of scores
  # L = score array index
  l = (p.to_f/100*scores_count).ceil
  puts "#{p} percentile? #{scores[l].value}"
end
__END__
95.1772
95.1567
95.1937
95.1959
95.1442
95.061
95.1591
95.1195
95.1065
95.0925
95.199
95.1682
output
What is the precise percentile of each score
95.061 is in the 0 percentile
95.0925 is in the 9 percentile
95.1065 is in the 17 percentile
95.1195 is in the 25 percentile
95.1442 is in the 34 percentile
95.1567 is in the 42 percentile
95.1591 is in the 50 percentile
95.1682 is in the 59 percentile
95.1772 is in the 67 percentile
95.1937 is in the 75 percentile
95.1959 is in the 84 percentile
95.199 is in the 92 percentile
What score is in the Xth percentile?
0 percentile? 95.061
10 percentile? 95.1065
20 percentile? 95.1195
30 percentile? 95.1442
40 percentile? 95.1567
50 percentile? 95.1591
60 percentile? 95.1772
70 percentile? 95.1937
80 percentile? 95.1959
90 percentile? 95.199
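Note that the listing above stops at 90: for P = 100 the index formula gives L = N, which is one past the last element of the array. A minimal sketch of the same lookup with the index clamped follows. The name score_at_percentile is hypothetical and not part of the script above, and clamping to the last score is an assumption about what the 100th percentile should return.

```ruby
# Hypothetical helper (not from the script above): look up the score at
# percentile p with the same index formula, L = ceil(P / 100 * N),
# but clamp L to the last valid index so P = 100 stays inside the array.
def score_at_percentile(sorted_scores, p)
  n = sorted_scores.size
  index = (p.to_f / 100 * n).ceil
  sorted_scores[[index, n - 1].min]
end

sorted_scores = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.061,
                 95.1591, 95.1195, 95.1065, 95.0925, 95.199, 95.1682].sort

[0, 50, 90, 100].each do |p|
  puts "#{p} percentile? #{score_at_percentile(sorted_scores, p)}"
end
```

For 0, 50 and 90 this returns the same scores as the output above (95.061, 95.1591, 95.199); for 100 it returns the highest score, 95.199.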
Got it working. I added -Infinity to the array so that I could use indexes in the range 1 to N. I was also multiplying by the wrong variable in the last line.
def percentile(param_array, percentage)
  another_array = param_array.to_a.dup
  another_array.push(-1.0/0.0) # add -Infinity to be 0th index
  another_array.sort!
  another_array_size = another_array.size - 1 # disregard -Infinity
  r = percentage.to_f * (another_array_size - 1) + 1
  if r <= 1 then return another_array[1]
  elsif r >= another_array_size then return another_array[another_array_size]
  end
  ir = r.truncate
  fr = fraction? r
  another_array[ir] + fr*(another_array[ir+1] - another_array[ir])
end
The r = ... line can be replaced with r = percentage.to_f * (another_array_size + 1) to use the formula from the first link instead of Excel's.
Output:
0.0: 95.061
0.1: 95.0939
0.2: 95.1091
0.3: 95.12691
0.4: 95.1492
0.5: 95.1579
0.6: 95.16456
0.7: 95.1745
0.8: 95.1904
0.9: 95.19568
1.0: 95.199
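For comparison, the same interpolation can be written with 0-based indexes and no -Infinity sentinel. This is only a minimal sketch, not the code from this answer: percentile_interpolated is a hypothetical name, and it assumes the Excel-style rank r = p * (N - 1) + 1 used above, shifted down by one to become a 0-based position.

```ruby
# Hypothetical helper (not from the answer above): linear interpolation
# between the two closest ranks, using 0-based positions.
def percentile_interpolated(values, percentage)
  sorted = values.to_a.sort
  return nil if sorted.empty?

  # 0-based fractional position; this is the 1-based rank
  # p * (N - 1) + 1 minus one.
  pos = percentage.to_f * (sorted.size - 1)
  return sorted.first if pos <= 0
  return sorted.last if pos >= sorted.size - 1

  lower = pos.floor
  frac = pos - lower # fractional part of the position, not of the values
  sorted[lower] + frac * (sorted[lower + 1] - sorted[lower])
end

test_array = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.061,
              95.1591, 95.1195, 95.1065, 95.0925, 95.199, 95.1682]

[0.0, 0.1, 0.5, 1.0].each do |p|
  puts "#{p}: #{percentile_interpolated(test_array, p).round(4)}"
end
```

Rounded to four places, this gives 95.061, 95.0939, 95.1579 and 95.199 for 0.0, 0.1, 0.5 and 1.0, in line with the output above. The key point is that the weight is the fractional part of the rank, so the result always lies between the two neighbouring values and the percentiles cannot decrease as the percentage grows.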
You could also monkeypatch Enumerable:
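module Enumerable
  # Returns the 1-based tile (rank) that value falls into when the sorted
  # collection is split into n_tiles pieces.
  def rank(value, n_tiles)
    ordered_array = sort
    count = ordered_array.length # Enumerable itself does not define length
    raise "You cannot split an array of #{count} elements into #{n_tiles} tiles!" if n_tiles > count
    split_size = count / n_tiles
    # upper boundary value of every tile except the last
    boundaries = []
    (n_tiles - 1).times do |i|
      boundaries << ordered_array[(i + 1) * split_size - 1]
    end
    boundaries.each_with_index do |boundary, i|
      return i + 1 if value <= boundary
    end
    n_tiles # value is above every boundary, so it is in the last tile
  end
end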
After this you would be able to do something like this:
a = [1,4,2,5,3,6]
# Test which range (rank) the number 1 would be placed in if the array is ordered and split into 3 pieces:
a.rank(1,3)
#=> 1
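If reopening Enumerable feels too invasive, the same boundary logic can be sketched as a plain method. The name tile_rank is hypothetical and not part of the answer above; the sketch assumes the same tile boundaries as the rank method.

```ruby
# Hypothetical standalone variant (not from the answer above): returns the
# 1-based tile that value falls into when values are sorted and split
# into n_tiles pieces.
def tile_rank(values, value, n_tiles)
  ordered = values.sort
  count = ordered.size
  if n_tiles > count
    raise ArgumentError, "cannot split #{count} elements into #{n_tiles} tiles"
  end

  split_size = count / n_tiles
  # Upper boundary value of every tile except the last.
  boundaries = (1...n_tiles).map { |i| ordered[i * split_size - 1] }
  # First tile whose boundary is >= value; past every boundary means last tile.
  index = boundaries.index { |boundary| value <= boundary }
  index ? index + 1 : n_tiles
end

a = [1, 4, 2, 5, 3, 6]
puts tile_rank(a, 1, 3)
puts tile_rank(a, 3, 3)
puts tile_rank(a, 6, 3)
```

With the array above, the boundaries are 2 and 4, so the three calls print 1, 2 and 3.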