Ruby array interpolation?
I have an array that is rendered and displayed in some charts, but my dataset is going to be far too large. How can I take an array of, say, 20,000 items and either drop items at regular intervals until it has 1,000 items, or interpolate the array down to that size?
For example, say I have the following array (of hashes):
[
{"timestamp"=>2011-09-05 14:30:00 UTC, "count"=>4488.0},
{"timestamp"=>2011-09-05 14:45:00 UTC, "count"=>4622.0},
{"timestamp"=>2011-09-05 15:00:00 UTC, "count"=>4655.0},
{"timestamp"=>2011-09-05 15:15:00 UTC, "count"=>4533.0},
{"timestamp"=>2011-09-05 15:30:00 UTC, "count"=>4439.0},
{"timestamp"=>2011-09-05 15:45:00 UTC, "count"=>4468.0},
{"timestamp"=>2011-09-05 16:00:00 UTC, "count"=>4419.0},
{"timestamp"=>2011-09-05 16:15:00 UTC, "count"=>4430.0},
{"timestamp"=>2011-09-05 16:30:00 UTC, "count"=>4429.0},
{"timestamp"=>2011-09-05 16:45:00 UTC, "count"=>4502.0},
{"timestamp"=>2011-09-05 17:00:00 UTC, "count"=>4497.0},
{"timestamp"=>2011-09-05 17:15:00 UTC, "count"=>4468.0},
{"timestamp"=>2011-09-05 17:30:00 UTC, "count"=>4510.0},
{"timestamp"=>2011-09-05 17:45:00 UTC, "count"=>4547.0},
{"timestamp"=>2011-09-05 18:00:00 UTC, "count"=>4471.0},
{"timestamp"=>2011-09-05 18:15:00 UTC, "count"=>4501.0},
{"timestamp"=>2011-09-05 18:30:00 UTC, "count"=>4451.0},
{"timestamp"=>2011-09-05 18:45:00 UTC, "count"=>4453.0},
{"timestamp"=>2011-09-05 19:00:00 UTC, "count"=>4593.0},
{"timestamp"=>2011-09-05 19:15:00 UTC, "count"=>4540.0},
{"timestamp"=>2011-09-05 19:30:00 UTC, "count"=>4516.0},
{"timestamp"=>2011-09-05 19:45:00 UTC, "count"=>4494.0}
]
I want a smaller array in which the intermediate values are either simply dropped or somehow interpolated, like this:
[
{"timestamp"=>2011-09-05 14:45:00 UTC, "count"=>4622.0},
{"timestamp"=>2011-09-05 15:00:00 UTC, "count"=>4655.0},
{"timestamp"=>2011-09-05 15:30:00 UTC, "count"=>4439.0},
{"timestamp"=>2011-09-05 16:00:00 UTC, "count"=>4419.0},
{"timestamp"=>2011-09-05 16:30:00 UTC, "count"=>4429.0},
{"timestamp"=>2011-09-05 17:00:00 UTC, "count"=>4497.0},
{"timestamp"=>2011-09-05 17:30:00 UTC, "count"=>4510.0},
{"timestamp"=>2011-09-05 18:00:00 UTC, "count"=>4471.0},
{"timestamp"=>2011-09-05 18:30:00 UTC, "count"=>4451.0},
{"timestamp"=>2011-09-05 19:00:00 UTC, "count"=>4593.0},
{"timestamp"=>2011-09-05 19:15:00 UTC, "count"=>4540.0},
{"timestamp"=>2011-09-05 19:45:00 UTC, "count"=>4494.0}
]
Any thoughts or help on this would be greatly appreciated. I may just be missing the point here as well.
require 'pp'
# Interval in seconds (30 min)
INTERVAL = 1800
# generate the data
start = Time.mktime(2001, 9, 5, 14, 30)
data = Array.new
1000.times do |i|
data << {:timestamp => start + i*INTERVAL, :count => rand(4000)}
end
# Plain data
pp data
puts # blank
# Simply get the data from sample number 300 to 400
pp data[300..400]
puts # blank
# For example, the data starting at the second hour and spanning 3 hours
pp data[2*60*60/INTERVAL..(2+3)*60*60/INTERVAL]
puts # blank
# Make it smaller (50%)
# We need data.size * 0.5 elements
# Calculate the step we need to iterate with to get
# 50% of the elements. In this case, that means skipping every other one
step = (data.size/(data.size * 0.5)).to_i
# We use Range#step to get the array of indexes, and then
# transform it using Enumerable#collect to get the array
# of Hashes. The exclusive range (0...data.size) never indexes
# past the end of the array, so there are no nils to filter out
pp((0...data.size).step(step).collect { |index| data[index] })
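The same idea can be generalized to a fixed target size, such as the 20,000 to 1,000 reduction in the question. The following is only a minimal sketch: `downsample` is a hypothetical helper name (not a built-in method), and the generated data uses string keys like the hashes in the question. It picks evenly spaced indexes with integer arithmetic, so it also works when the size is not an exact multiple of the target.

```ruby
# Hypothetical helper: return `target` evenly spaced elements of `items`,
# preserving their original order.
def downsample(items, target)
  return [] if target <= 0
  return items.dup if target >= items.size

  # Integer arithmetic keeps every index inside 0...items.size
  Array.new(target) { |i| items[i * items.size / target] }
end

# Illustrative data: 20,000 samples, 15 minutes apart
start = Time.utc(2011, 9, 5, 14, 30)
data = Array.new(20_000) do |i|
  { "timestamp" => start + i * 15 * 60, "count" => i.to_f }
end

smaller = downsample(data, 1_000)
puts smaller.size            # 1000
puts smaller.last["count"]   # 19980.0
```

Every element that survives is an original sample, so nothing is interpolated here; the points are only thinned out.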
Also, you may want to look at Enumerable#each_slice(n):
(1..10).each_slice(3) {|a| p a}
# outputs below
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]
You can reduce the set by making slices of n elements and then creating a new element from each slice: the element in the middle, an average, etc.
data.each_slice(3).collect { |slice| make_one_out_of_a_slice(slice) }
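As one possible way to fill in make_one_out_of_a_slice, here is a sketch that averages each slice. The helper name `average_slice` is made up for illustration, and it assumes hashes with string keys "timestamp" and "count" as in the question. It keeps the timestamp of the middle element and replaces the count with the mean of the slice.

```ruby
# Hypothetical helper: collapse a slice of samples into a single sample.
# The count becomes the mean of the slice; the timestamp is taken from
# the middle element of the slice.
def average_slice(slice)
  mean = slice.sum { |sample| sample["count"] } / slice.size.to_f
  { "timestamp" => slice[slice.size / 2]["timestamp"], "count" => mean }
end

samples = [
  { "timestamp" => Time.utc(2011, 9, 5, 14, 30), "count" => 4488.0 },
  { "timestamp" => Time.utc(2011, 9, 5, 14, 45), "count" => 4622.0 },
  { "timestamp" => Time.utc(2011, 9, 5, 15, 0),  "count" => 4655.0 },
  { "timestamp" => Time.utc(2011, 9, 5, 15, 15), "count" => 4533.0 }
]

reduced = samples.each_slice(3).map { |slice| average_slice(slice) }
puts reduced.size   # 2
```

The last slice may be shorter than n (here it holds a single sample), so the helper divides by the actual slice size rather than by n.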
Use Array#sample:
a = [ 1, 2, 3, 4, 5, 6 ]
smaller = a.sample(3)
# e.g. [4, 2, 1] (the selection is random)
In your case you'd do something like this:
a = [
# 10 000 little hashes
]
smaller = a.sample(1000)
and then send smaller off to be displayed.
And if you want them in order you could just sort them again:
smaller.sort! { |a,b| a['timestamp'] <=> b['timestamp'] }
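If the original array is already in chronological order, an alternative is to sample the indexes instead of the elements and sort those, which keeps the original order without comparing timestamps. This is just a sketch of that variation; the array of integers stands in for the array of hashes.

```ruby
a = (1..20).to_a   # stand-in for the already ordered array of hashes

# Pick 5 distinct random positions, then put the positions back in order
indexes = (0...a.size).to_a.sample(5).sort

# Fetch the elements at those positions; they come out in original order
smaller = a.values_at(*indexes)
puts smaller.size   # 5
```

Note that random sampling leaves uneven gaps between the remaining points, which may or may not matter for a chart.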
To condense your array, you have to define a rule that decides which samples to drop. To make it easier to understand, I use a simple integer as the timestamp instead. If you want to use it with your data, you have to modify the reject block a little bit.
samples = 100.times.map do |i|
{"timestamp" => i, "count" => rand(100)}
end
i = samples.size
samples.reject! do |item| item["timestamp"]%2 == 0 end
The condition item["timestamp"]%2 == 0 is the rule by which a sample gets dropped from the sample set. For your data, you can define time ranges or some other rule instead.
$> samples.size # => 50
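With real Time objects as timestamps, the rule could look at the minute of each sample. The sketch below is one assumed adaptation, not the only possible rule: it generates 22 samples 15 minutes apart (mirroring the question's data, with random counts) and drops every sample that does not fall on a half-hour boundary.

```ruby
start = Time.utc(2011, 9, 5, 14, 30)

# 22 samples, 15 minutes apart, like the example in the question
samples = Array.new(22) do |i|
  { "timestamp" => start + i * 15 * 60, "count" => rand(100) }
end

# Drop every sample whose minute is not 0 or 30
samples.reject! { |item| item["timestamp"].min % 30 != 0 }

puts samples.size   # 11
```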