Sorted quantile mean via Rpy

2023-01-12 15:26

The real goal here is to find the quantile means (or sums, medians, etc.) in Python. Since I'm not a power user of Python but have used R for a while, my chosen route is via Rpy. However, I ran into the problem that the returned list of means does not follow the order of the quantiles. In particular, I have the following in R:

> a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> b = c(2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000)
> prob = seq(0,5)/5
> br = quantile(a,prob)
> rcut = cut(a, br, include.lowest = TRUE)
> quintile_means = tapply(b, rcut, mean)
> quintile_means
  [1,2.8] (2.8,4.6] (4.6,6.4] (6.4,8.2]  (8.2,10]
        3        30       300      3000     30000

which is all very good. However, when I translate the code into Rpy, I get

>>> import rpy
>>> from rpy import r
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
>>> prob = [ x / 5.0 for x in range(6)]
>>> br = r.quantile(a, prob)
>>> rcut = r.cut(a, br, include_lowest=r.TRUE)
>>> quintile_means = r.tapply(b, rcut, r.mean)
>>> print quintile_means
[30.0, 300.0, 3000.0, 30000.0, 3.0]

Note that the final list is misordered (we know this because a and b are both sorted in this case). In general, I just have no way to recover the correct order from the lowest to highest quantile in Rpy. Any suggestions?

In addition (not in substitution, as I'd like to know the answer to the above question), if you can suggest a way to directly perform the analysis in python, that will be great too. (I don't have numpy or scipy installed.) Thx!

EDIT: To clarify, a and b are paired but not necessarily ordered. For example, a is the size of the eyes and b is the size of the nose. I'm trying to find, for each quantile of a, the mean of the corresponding values of b. Thanks.

Try rpy2.

With rpy2 >= 2.1.0, this could be:

from rpy2.robjects.vectors import IntVector
from rpy2.robjects.packages import importr
base = importr('base')
stats = importr('stats')

a = IntVector((1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
b = IntVector((2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000))
prob = base.seq(0,5).ro / 5
br = stats.quantile(a,prob)
rcut = base.cut(a, br, include_lowest = True)
quintile_means = base.tapply(b, rcut, base.mean)  # mean lives in R's base package, not stats
print(quintile_means)

If you don't need the labels (e.g. (8.2,10]), you could call cut with labels=FALSE (labels=False from Python), which returns integer bin codes instead of interval labels. This should keep the order (and speed up your code for free).

"I just have no way to recover the correct order from the lowest to highest quantile in Rpy"

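If sorting the list from the lowest to the highest value solves your problem, try sorted(quintile_means). Note that this orders the means by value, not by quantile, so it only recovers the quantile order when the means happen to increase with the quantile.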
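
For the second part of the question (doing the analysis directly in Python, without numpy or scipy), here is a minimal sketch using only the standard library. It is not taken from either answer above. The function names quantile and quantile_means are made up for illustration, and the sketch assumes R's default quantile definition (type 7, linear interpolation) and right-closed bins whose first bin also includes its lower edge, as with cut(..., include.lowest = TRUE). Because the bins are visited by index, the result always runs from the lowest to the highest quantile, even when a and b are paired but unsorted.

```python
from collections import defaultdict


def quantile(values, p):
    """Quantile by linear interpolation (assumed to match R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p          # fractional position in the sorted data
    lo = int(h)                    # floor, since h >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_means(a, b, n_bins=5):
    """Mean of b within each quantile bin of a, lowest bin first."""
    breaks = [quantile(a, i / n_bins) for i in range(n_bins + 1)]
    groups = defaultdict(list)
    for x, y in zip(a, b):
        # Right-closed bins (lo, hi]; the first bin also takes its lower edge.
        # The fallback to the last bin guards against float rounding at the top.
        idx = next((i for i in range(n_bins) if x <= breaks[i + 1]), n_bins - 1)
        groups[idx].append(y)
    # Walk the bins by index so the output order is lowest to highest quantile.
    # Empty bins (possible with heavily tied data) are reported as None.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else None
            for i in range(n_bins)]


a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
print(quantile_means(a, b))  # [3.0, 30.0, 300.0, 3000.0, 30000.0]
```

Swapping sum(...) / len(...) for another aggregate (a sum, a median) gives the other statistics mentioned in the question. One difference from the R version: R's cut stops with an error when the breaks are not unique, whereas this sketch simply leaves the affected bins empty.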
