Create a vector listing run length of original vector with same length as original vector

This problem seems trivial, but I'm at my wit's end after hours of reading.

I need to generate a vector of the same length as the input vector that gives, for each element, the total number of times that element's value occurs. For example, I want to generate the last column of this data frame:

> df
   customer.id transaction.count total.transactions
1            1                 1                  4
2            1                 2                  4
3            1                 3                  4
4            1                 4                  4
5            2                 1                  2
6            2                 2                  2
7            3                 1                  3
8            3                 2                  3
9            3                 3                  3
10           4                 1                  1

I realise this could be done in two ways: either by using the run lengths of the first column, or by grouping the second column by the first and taking the maximum.

I've tried both tapply:

> tapply(df$transaction.count, df$customer.id, max)

And rle:

> rle(df$customer.id)

But both give only one value per customer, i.e. a vector shorter than the original:

[1] 4  2  3  1
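The shorter result from tapply() holds one value per customer, labelled by customer id. A minimal sketch of one way to stretch it back to the original length, assuming the example data frame is rebuilt as below: index the per-customer result by each row's id, converted to character so the lookup matches the labels rather than positions.

```r
# Rebuild the example data from the question
df <- data.frame(
  customer.id       = rep(1:4, c(4, 2, 3, 1)),
  transaction.count = sequence(c(4, 2, 3, 1))
)

# One maximum per customer, labelled "1", "2", "3", "4"
per.customer <- tapply(df$transaction.count, df$customer.id, max)

# Look each row's id up by label to get one value per row;
# as.vector() drops the array/name attributes left over from tapply()
df$total.transactions <- as.vector(per.customer[as.character(df$customer.id)])

df$total.transactions   # values: 4 4 4 4 2 2 3 3 3 1
```

Indexing by character is the key design choice here: with ids that are not exactly 1, 2, 3, ..., numeric indexing would pick elements by position instead of by id.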

Any help gratefully accepted!


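You can do it with ave(), which returns one value per row, so no transaction counter is needed:

# seq_along() only supplies a vector of the right length; FUN = length counts the rows in each customer.id group
df$total.transactions <- with(df,
                     ave(seq_along(customer.id), customer.id, FUN = length))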
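ave() applies FUN within each group and writes the result back to every row of that group, which is why its output keeps the original length. A minimal sketch, rebuilding the example data: the asker's other idea, the per-customer maximum of transaction.count, produces the same column here, but only because transaction.count runs 1, 2, ... within each customer (an assumption about the data that ave() does not check).

```r
# Rebuild the example data from the question
df <- data.frame(
  customer.id       = rep(1:4, c(4, 2, 3, 1)),
  transaction.count = sequence(c(4, 2, 3, 1))
)

# Group size per row: does not look at transaction.count at all
by.length <- with(df, ave(seq_along(customer.id), customer.id, FUN = length))

# Group maximum per row: correct only if transaction.count counts 1, 2, ... per customer
by.max <- with(df, ave(transaction.count, customer.id, FUN = max))

identical(by.length, by.max)   # TRUE for this data
```

The FUN = length version is the more robust of the two, since it keeps working if transactions are missing or numbered differently.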


You can use rle with rep to get what you want:

> x <- rep(1:4, 4:1)
> x
 [1] 1 1 1 1 2 2 2 3 3 4
> rep(rle(x)$lengths, rle(x)$lengths)
 [1] 4 4 4 4 3 3 3 2 2 1

For performance purposes, you could store the rle object separately so it is only called once.
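A minimal sketch of that suggestion, computing rle() once and reusing its lengths. Note that rle() counts runs of consecutive equal values, so this only gives per-id totals when each id's rows are contiguous (as they are in the question's data); the second half of the sketch shows what happens otherwise.

```r
x <- rep(1:4, 4:1)          # 1 1 1 1 2 2 2 3 3 4

runs   <- rle(x)            # compute the run-length encoding once
totals <- rep(runs$lengths, runs$lengths)
totals                      # 4 4 4 4 3 3 3 2 2 1

# Caveat: runs are consecutive, so a repeated but non-adjacent id is split up
y <- c(1, 1, 2, 1)
r <- rle(y)
rep(r$lengths, r$lengths)   # 2 2 1 1, not the per-id totals 3 3 1 3
```

If the ids might not be grouped together, sorting first or using a grouping function such as ave() avoids this pitfall.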

Or, as Karsten suggested, with ddply from plyr:

library(plyr)

# ddply() expects a data frame; within each group of x, transform() adds total = group size
dat <- data.frame(x = rep(1:4, 4:1))
ddply(dat, "x", transform, total = length(x))


You are probably looking for a split-apply-combine approach; have a look at ddply in the plyr package or the split function in base R.
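In base R, split() and its inverse unsplit() can express the same split-apply-combine idea without plyr. A minimal sketch, assuming the grouping column is customer.id: split the ids by customer, replace each group with its size repeated, then let unsplit() put the values back in the original row order. The ids are deliberately unsorted to show that, unlike rle(), this does not need contiguous groups.

```r
df <- data.frame(customer.id = c(3, 1, 1, 2, 3, 1))   # deliberately unsorted

g      <- df$customer.id
pieces <- split(g, g)                                  # split: one vector per customer
sizes  <- lapply(pieces, function(v) rep(length(v), length(v)))  # apply: group size per element
df$total.transactions <- unsplit(sizes, g)             # combine: back in original row order

df$total.transactions   # 2 3 3 1 2 3
```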
