Ruby array interpolation?

2023-04-03 17:36 Q&A author:

I have an array to be rendered and displayed in some charts, but my dataset is going to be far too large. How can I take an array that is, say, 20,000 items long and either drop every other item until the array is down to 1,000 items, or interpolate the array down to that size?

For example, say I have the following array (of hashes):

[ 
  {"timestamp"=>2011-09-05 14:30:00 UTC, "count"=>4488.0},
  {"timestamp"=>2011-09-05 14:45:00 UTC, "count"=>4622.0},
  {"timestamp"=>2011-09-05 15:00:00 UTC, "count"=>4655.0},
  {"timestamp"=>2011-09-05 15:15:00 UTC, "count"=>4533.0},
  {"timestamp"=>2011-09-05 15:30:00 UTC, "count"=>4439.0},
  {"timestamp"=>2011-09-05 15:45:00 UTC, "count"=>4468.0},
  {"timestamp"=>2011-09-05 16:00:00 UTC, "count"=>4419.0},
  {"timestamp"=>2011-09-05 16:15:00 UTC, "count"=>4430.0},
  {"timestamp"=>2011-09-05 16:30:00 UTC, "count"=>4429.0},
  {"timestamp"=>2011-09-05 16:45:00 UTC, "count"=>4502.0},
  {"timestamp"=>2011-09-05 17:00:00 UTC, "count"=>4497.0},
  {"timestamp"=>2011-09-05 17:15:00 UTC, "count"=>4468.0},
  {"timestamp"=>2011-09-05 17:30:00 UTC, "count"=>4510.0},
  {"timestamp"=>2011-09-05 17:45:00 UTC, "count"=>4547.0},
  {"timestamp"=>2011-09-05 18:00:00 UTC, "count"=>4471.0},
  {"timestamp"=>2011-09-05 18:15:00 UTC, "count"=>4501.0},
  {"timestamp"=>2011-09-05 18:30:00 UTC, "count"=>4451.0},
  {"timestamp"=>2011-09-05 18:45:00 UTC, "count"=>4453.0},
  {"timestamp"=>2011-09-05 19:00:00 UTC, "count"=>4593.0},
  {"timestamp"=>2011-09-05 19:15:00 UTC, "count"=>4540.0},
  {"timestamp"=>2011-09-05 19:30:00 UTC, "count"=>4516.0},
  {"timestamp"=>2011-09-05 19:45:00 UTC, "count"=>4494.0}
]

I want a smaller array in which the intermediate values are either just dropped or somehow interpolated, like this:

[ 
  {"timestamp"=>2011-09-05 14:45:00 UTC, "count"=>4622.0},
  {"timestamp"=>2011-09-05 15:00:00 UTC, "count"=>4655.0},
  {"timestamp"=>2011-09-05 15:30:00 UTC, "count"=>4439.0},
  {"timestamp"=>2011-09-05 16:00:00 UTC, "count"=>4419.0},
  {"timestamp"=>2011-09-05 16:30:00 UTC, "count"=>4429.0},
  {"timestamp"=>2011-09-05 17:00:00 UTC, "count"=>4497.0},
  {"timestamp"=>2011-09-05 17:30:00 UTC, "count"=>4510.0},
  {"timestamp"=>2011-09-05 18:00:00 UTC, "count"=>4471.0},
  {"timestamp"=>2011-09-05 18:30:00 UTC, "count"=>4451.0},
  {"timestamp"=>2011-09-05 19:00:00 UTC, "count"=>4593.0},
  {"timestamp"=>2011-09-05 19:15:00 UTC, "count"=>4540.0},
  {"timestamp"=>2011-09-05 19:45:00 UTC, "count"=>4494.0}
]

Any thoughts or help on this would be greatly appreciated; I may just be missing the point here as well.

require 'pp'

# Interval in seconds (30 min)
INTERVAL = 1800

# generate the data
start = Time.mktime(2001, 9, 5, 14, 30)

data = Array.new
1000.times do |i|
  data << {:timestamp => start + i*INTERVAL, :count => rand(4000)}
end

# Plain data
pp data

puts # blank

# Simply get the data from sample number 300 to 400
pp data[300..400]

puts # blank

# For example, data starting at the second hour, 3 hours long
pp data[2*60*60/INTERVAL..(2+3)*60*60/INTERVAL]

puts # blank

# Make it smaller (50%)
# We need data.size * 0.5 elements.
# Calculate the step to iterate with in order to keep
# 50% of the elements; in this case, every second one.
step = (data.size / (data.size * 0.5)).to_i

# Use Range#step to get the array of indexes, then
# Enumerable#map to turn it into the array of hashes.
# The exclusive range (0...data.size) never yields an
# out-of-range index, so there are no nils to filter out.
#
# Probably there is a simpler way to do this.
pp((0...data.size).step(step).map { |index| data[index] })
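
For the sizes mentioned in the question (20,000 items down to 1,000), the same index-stepping idea can be wrapped in a small helper. This is only a sketch: `downsample` and its `target` argument are names made up here, not something from the answer above, and it simply keeps evenly spaced elements without any interpolation.

```ruby
# Hypothetical helper: keep `target` evenly spaced elements of `data`,
# preserving their original order. No values are interpolated.
def downsample(data, target)
  return data.dup if target >= data.size
  return [] if target <= 0

  step = data.size.to_f / target
  # (i * step).floor is always within 0...data.size for i < target
  Array.new(target) { |i| data[(i * step).floor] }
end

series = Array.new(20_000) { |i| { "timestamp" => i, "count" => rand(4000) } }
smaller = downsample(series, 1_000)

puts smaller.size               # 1000
puts smaller.first["timestamp"] # 0
```

Using a fractional step means the target size does not have to divide the original size evenly.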

You may also want to look at Enumerable#each_slice(n):

(1..10).each_slice(3) { |a| p a }
# outputs:
# [1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
# [10]

You can reduce the set by making slices of n elements and then creating a new element from each slice: the element in the middle, an average, etc.

data.each_slice(3).collect { |slice| make_one_out_of_a_slice(slice) }
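
The `make_one_out_of_a_slice` call above is left undefined. One possible way to fill it in, assuming hashes with "timestamp" and "count" keys as in the question, is to take the middle timestamp and the mean count of each slice. The names `average_slice` and `condense` are hypothetical, and `Array#sum` needs Ruby 2.4 or later.

```ruby
# Hypothetical stand-in for make_one_out_of_a_slice: collapse a slice of
# samples into one hash with the middle timestamp and the mean count.
def average_slice(slice)
  mean = slice.sum { |h| h["count"] } / slice.size.to_f
  { "timestamp" => slice[slice.size / 2]["timestamp"], "count" => mean }
end

# Reduce `data` by a factor of roughly n; the last slice may be shorter.
def condense(data, n)
  data.each_slice(n).map { |slice| average_slice(slice) }
end

series = Array.new(20_000) { |i| { "timestamp" => i, "count" => rand(4000).to_f } }
puts condense(series, 20).size # 1000
```

Averaging smooths out spikes, so picking the middle element instead may be preferable if individual peaks matter in the chart.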

Use Array#sample:

a = [ 1, 2, 3, 4, 5, 6 ]
smaller = a.sample(3)
# [4, 2, 1]

In your case you'd do something like this:

a = [
    # 10 000 little hashes
]
smaller = a.sample(1000)

and then send smaller off to be displayed.

And if you want them in order you could just sort them again:

smaller.sort_by! { |h| h['timestamp'] }
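
An alternative that avoids re-sorting by key is to sample the indexes rather than the elements and sort those. This is a sketch under the assumption that the original array is already in the order you want; `ordered_sample` is a name invented for this example.

```ruby
# Hypothetical helper: pick n random elements of `array` while keeping
# them in their original relative order.
def ordered_sample(array, n)
  indexes = (0...array.size).to_a.sample(n).sort
  indexes.map { |i| array[i] }
end

a = Array.new(10_000) { |i| { "timestamp" => i, "count" => rand(4000) } }
smaller = ordered_sample(a, 1_000)
puts smaller.size # 1000
```

Keep in mind that a random sample is not evenly spaced, so the resulting chart may have visible gaps in some regions.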

To condense your array, you have to define a rule for which samples to drop. To make it easier to understand, I use a simple integer as the timestamp instead. If you want to use it with your data, you have to modify the reject block a little bit.

 samples = 100.times.map do |i|
   {"timestamp" => i, "count" => rand(100)}
 end

 samples.size # => 100
 samples.reject! { |item| item["timestamp"] % 2 == 0 }

The expression item["timestamp"] % 2 == 0 is the rule that decides which samples get dropped from the sample set. For your data, you can base the rule on time ranges or something else.

 $> samples.size # => 50
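
With real Time objects as timestamps, as in the question, the rule could look at the minute instead. The sketch below assumes 15-minute samples and keeps only those on the hour or half hour; `keep_half_hours` is a hypothetical name chosen for this example.

```ruby
# Hypothetical rule: drop every sample whose timestamp is not on the
# hour or half hour. Assumes item["timestamp"] is a Time.
def keep_half_hours(samples)
  samples.reject { |item| item["timestamp"].min % 30 != 0 }
end

start = Time.utc(2011, 9, 5, 14, 30)
samples = Array.new(8) do |i|
  { "timestamp" => start + i * 15 * 60, "count" => rand(100) }
end

puts keep_half_hours(samples).size # 4
```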
