开发者

Calculating percentages in an apply statement (R)

I'm struggling with something quite simple, but I'm going around in circles, and just don't see where I make an error. I really hope that someone could lend me a handy suggestion, so that I'm no longer stuck!

My goal: I want to calculate the percentage of instances in an data.frame which have a result higher than 0. I've tried this with the for loop, but to no avail. So, after some more searching I used the apply function to calculate various metrics as mean, sd, and min/max. This works great, but for calculating the percentage the apply function doesn't work, even when I make a custom function, and insert this into the apply function.

This is the shortened version of my data.frame:

     tradesList[c(1:5,10:15),c(1,7)]
   Instrument TradeResult.Currency.
1         JPM                    -3
2         JPM                   264
3         JPM                   284
4         JPM                    69
5         JPM                   283
10        JPM                  -294
11        KFT                    -8
12        KFT                   -48
13        KFT                   125
14        KFT                  -150
15        KFT    开发者_运维百科              -206

I want to summarize this data.frame, for example by displaying the average TradeResult for each Instrument:

> tapply(tradesList$TradeResult.Currency., tradesList$Instrument, mean)
 JPM  KFT 
42.3 14.6 

However, I would also like to calculate the percentage of rows which have an TradeResult > 0 per instrument. With the 'which' function checking for instances which are > 0 does work, however, apply won't accept this function as an argument.

> length(which(tradesList$TradeResult.Currency. > 0)) / length(tradesList$TradeResult.Currency.) * 100
[1] 50
> tapply(tradesList$TradeResult.Currency., tradesList$Instrument, (length(which(tradesList$TradeResult.Currency. > 0)) / length(tradesList$TradeResult.Currency.) * 100))
Error in match.fun(FUN) : 
  c("'(length(which(tradesList$TradeResult.Currency. > 0))/length(tradesList$TradeResult.Currency.) * ' is not a function, character or symbol", "'    100)' is not a function, character or symbol")
> 

I searched in the help function for more information on this error, and tried various different ways of formulating the function (for example with brackets or quotes), but each way led to the same result.

Does someone know a whay to calculate the percentage of instances which are greater than zero? Perhaps I'm missing something?

Thanks alot in advance,

Regards,

Edit: Thanks alot for your quick comments G. Grothendieck, Gavin Simpson and DWin. Highly appreciated and quite helpful!

Solved: Here's what I have now:

> tmpData <- tradesList[c(1:5,10:15),c(1,7)]
> tmpData
   Instrument TradeResult.Currency.
1         JPM                    -3
2         JPM                   264
3         JPM                   284
4         JPM                    69
5         JPM                   283
10        JPM                  -294
11        KFT                    -8
12        KFT                   -48
13        KFT                   125
14        KFT                  -150
15        KFT                  -206
> 100*    # to get percentages
+ with( tmpData, 
+ tapply( (TradeResult.Currency. > 0) , Instrument, sum)/   # number GT 0
+        tapply( TradeResult.Currency., Instrument, length) ) # total number
     JPM      KFT 
66.66667 20.00000 
> 100 * tapply(tmpData$TradeResult.Currency. > 0, tmpData$Instrument, mean)
     JPM      KFT 
66.66667 20.00000 
> pcentFun <- function(x) {
+     res <- x > 0
+     100 * (sum(res) / length(res))
+ }
> 
> with(tmpData, tapply(TradeResult.Currency., Instrument, pcentFun))
     JPM      KFT 
66.66667 20.00000

Thanks again!

Regards,


Write a simple function to do the computation:

pcentFun <- function(x) {
    res <- x > 0
    100 * (sum(res) / length(res))
}

Then we can apply that to groups of Instruments, via tapply()

> with(tradeList, tapply(TradeResult.Currency, Instrument, pcentFun))
     JPM      KFT 
66.66667 20.00000 

but aggregate() would be more useful if you want the summary with instrument names:

> with(tradesList, aggregate(TradeResult.Currency, 
+                            by = list(Instrument = Instrument), pcentFun))
  Instrument        x
1        JPM 66.66667
2        KFT 20.00000


Try this:

100 * tapply(tradesList$TradeResult.Currency. > 0, tradesList$Instrument, mean)

With the sample data in the post it gives:

  JPM   KFT 
66.67 20.00 

and here it is using sqldf (note that the RSQLite driver translates dots to underscores since dots are also an SQL operator so we use underscores where dots were):

> library(sqldf)
> sqldf("select Instrument, 
+     100 * avg(TradeResult_Currency_ > 0) as '%>0',
+     avg(TradeResult_Currency_) as 'Avg Currency'
+     from tradesList group by Instrument")
  Instrument   %>0 Avg Currency
1        JPM 66.67        100.5
2        KFT 20.00        -57.4

These two could also be translated to aggregate by suitable modification of the aggregate solution already posted.


You can work with logical results using sum or mean to get meaningful summary results:

100*    # to get percentages
with( tradesList, 
tapply( (TradeResult.Currency. > 0) , Instrument, sum)/   # number GT 0
       tapply( TradeResult.Currency., Instrument, length) ) # total number

Edit: I noticed that Gavin gave you an answer that returned a dataframe, a generally well understood class. The class of the values from both Gabor's and my replies were arrays of one dimension. They could be turned into named-vectors by surrounding the object by c() which doubles as a concatenation and coercion to vector function. And as they stand they are perfectly acceptable for rbinding or accessing with "[" in the expected ways and retrun expected results from names().

The tapply function returns arrays with the number of dimensions in the INDEX argument(s) and can be very effectively combined for matrix operations with table objects. I do a lot of dividing sums by counts, or counts by sums to get meaningful category statistics in 2, 3 or 4 dimensions.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜