Sorted quantile mean via Rpy
The real goal here is to find the quantile means (or sums, or medians, etc.) in Python. Since I'm not a power user of Python but have used R for a while, my chosen route is via Rpy. However, I ran into the problem that the returned list of means does not follow the order of the quantiles. In particular, I have the following in R:
> a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> b = c(2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000)
> prob = seq(0,5)/5
> br = quantile(a,prob)
> rcut = cut(a, br, include.lowest = TRUE)
> quintile_means = tapply(b, rcut, mean)
> quintile_means
[1,2.8] (2.8,4.6] (4.6,6.4] (6.4,8.2] (8.2,10]
3 30 300 3000 30000
which is all fine. However, when I translate the code into Rpy, I get
>>> import rpy
>>> from rpy import r
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
>>> prob = [ x / 5.0 for x in range(6)]
>>> br = r.quantile(a, prob)
>>> rcut = r.cut(a, br, include_lowest=r.TRUE)
>>> quintile_means = r.tapply(b, rcut, r.mean)
>>> print quintile_means
[30.0, 300.0, 3000.0, 30000.0, 3.0]
Note that the final list is mis-ordered (we know this because a and b are both ordered in this case). In general, I have no way to recover the correct order, from the lowest to the highest quantile, in Rpy. Any suggestions?
In addition (not instead, as I'd still like to know the answer to the above question), if you can suggest a way to perform the analysis directly in Python, that would be great too. (I don't have numpy or scipy installed.) Thx!
EDIT: To clarify, a and b are paired but not necessarily ordered. For example, a is the size of the eyes and b is the size of the nose. I'm trying to find, for each quantile of a, the mean of the corresponding values of b. Thanks.
Try rpy2.
With rpy2 >= 2.1.0, this could be:
from rpy2.robjects.vectors import IntVector
from rpy2.robjects.packages import importr
base = importr('base')
stats = importr('stats')
a = IntVector((1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
b = IntVector((2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000))
prob = base.seq(0,5).ro / 5
br = stats.quantile(a,prob)
rcut = base.cut(a, br, include_lowest = True)
quintile_means = base.tapply(b, rcut, base.mean)
print(quintile_means)
If you don't need labels (e.g. (8.2,10]) then you could call cut with labels=FALSE (labels=False when calling from Python). This should keep the order (and speed up your code for free).
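"I just have no way to recover the correct order from the lowest to highest quantile in Rpy"
If sorting the list from the lowest to the highest solves your problem, try sorted(quintile_means). Note that this orders by the mean values themselves, which matches the quantile order only when the means increase with the quantile, as they do in this example.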
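For the pure-Python part of the question (no numpy or scipy), here is a minimal sketch using only the standard library. The function names quantile_breaks and binned_means are made up for illustration. It assumes a is non-empty, computes the break points by linear interpolation between order statistics (the rule R's quantile uses by default), and uses right-closed bins with the lowest value included in the first bin, as cut(..., include.lowest = TRUE) does. Unlike R's cut, it does not complain when two break points coincide.

```python
from bisect import bisect_left


def quantile_breaks(values, n_bins=5):
    """Break points at 0, 1/n_bins, ..., 1 using linear interpolation
    between order statistics (same rule as R's default quantile type)."""
    xs = sorted(values)
    n = len(xs)
    breaks = []
    for i in range(n_bins + 1):
        h = (n - 1) * i / float(n_bins)  # fractional position in xs
        lo = int(h)
        hi = min(lo + 1, n - 1)
        breaks.append(xs[lo] + (h - lo) * (xs[hi] - xs[lo]))
    return breaks


def binned_means(a, b, n_bins=5):
    """Mean of b within each quantile bin of a, lowest bin first.

    a and b are paired; they do not need to be sorted.
    Bins are right-closed, (lo, hi], and the smallest value of a
    goes into the first bin. Empty bins yield None.
    """
    breaks = quantile_breaks(a, n_bins)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(a, b):
        # Search only the interior break points, so k is in 0..n_bins-1.
        k = bisect_left(breaks, x, 1, n_bins) - 1
        sums[k] += y
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
print(binned_means(a, b))
```

With the data from the question this prints [3.0, 30.0, 300.0, 3000.0, 30000.0], the same values as the R session, and the list index is the quantile number, so no reordering is needed. Swapping the mean for a sum or median only requires collecting the b values per bin instead of accumulating sums and counts.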