Cross tabulate true values only in R

2023-03-24 03:11 问答作者：

I have a dataframe "dataAll" and some vars (A,B) on it, I want a table like

   A   >10    >20     >30

B

>1

>10

>100

To do that, I coded like

with(dataAll, table(A=A>10,B=B>1)) which gives me

         A
               FALSE    TRUE
B

        FALSE  220357   4798

        TRUE  596开发者_高级运维618 210080

So, here the value 210080 should go to first cell in the upper table

I tried something like this with no success

with(dataAll, table(A=A>c(10,20,30),B=B>c(1,10,100)))

and even

with(dataAll, table(A=c(A>10,A>20,A>30),B=c(B>1,B>10,B>100)))

but no success,

I know there should be a way to do this...

A.categ <- cut(A, breaks = c(-Inf, 10, 20, 30, Inf), right=FALSE)
B.categ <- cut(B, breaks = c(-Inf, 1, 10, 100, Inf), right=FALSE)
table(A.categ, B.categ)

The trick with cut is remembering to set right=FALSE, since that is the way most people expect it to work. In fact when Frank Harrell made his version of cut2 for Hmisc, he set that as the default option.

When you do that with the example cosntructed by Tommy you get

> A.categ <- cut(d$A, breaks = c(-Inf, 10, 20, 30, Inf), right=FALSE)
>     B.categ <- cut(d$B, breaks = c(-Inf, 1, 10, 100, Inf), right=FALSE)
>     table(A.categ, B.categ)
           B.categ
A.categ     [-Inf,1) [1,10) [10,100) [100, Inf)
  [-Inf,10)        0      1        1          9
  [10,20)          0      2        3          2
  [20,30)          0      5        4          1
  [30, Inf)        0     17       11         44

Not every understands the open/closed convention so sometimes you need to go in and rework the labels of a factored variable that you constructed with cut so the less mathematically inclined client can map it to his conventions. You use the factor function and specify the labels argument (and do NOT specify the levels argument or you will "break the variable")

> A.categ <- factor(A.categ, labels=c(" Less than 1", "1-9.9", "10-99.9" , "100+") )
>     table(A.categ, B.categ)
              B.categ
A.categ        [-Inf,1) [1,10) [10,100) [100, Inf)
   Less than 1        0      1        1          9
  1-9.9               0      2        3          2
  10-99.9             0      5        4          1
  100+                0     17       11         44

try the cut function.

?cut

it works with breaks like you want.

cut(x,breaks,labels,...)


 table(cut(A[which(B<1)],breaks=c(0,10,20,30)))


 table(cut(A[which(B>1)],breaks=c(0,10,20,30)))

Here's a little vapply - based solution. Is assumes you want the total count for each cell (for example, A > 20 & B > 100) - not the count for (A > 20 & A < 30) & (B > 100 & B < 1000).

# Create some data
set.seed(42)
n <- 100;
dataAll <- data.frame(A=runif(n, 1,100), B=10^runif(n, 0, 4))

# And some break points    
a <- 1:10*10 # 10, 20 etc...
b <- 10^(0:4) # 1, 10, 100, 1000


f <- function(A, a, B, b) {
    structure(t(vapply(b, function(bb) {
    vapply(a, function(aa, A) sum(A > aa), 1, A[B > bb])    
    }, a)), dimnames=list(B=b, A=a))
}

f(dataAll$A, a, dataAll$B, b)

Which gives the following table:

       A
B       10 20 30 40 50 60 70 80 90 100
  1     89 82 72 63 55 46 34 23 16   0
  10    65 60 55 47 41 34 26 18 12   0
  100   47 45 44 39 34 28 21 14 10   0
  1000  20 19 18 17 16 12  8  5  5   0
  10000  0  0  0  0  0  0  0  0  0   0

The meat of it all is to count the TRUE values for one a condition sum(A > aa), and then do that for all a conditions with a call to vapply.

vapply(a, function(aa, A) sum(A > aa)

Then do it again for each b condition, and add some dimnames to the (transposed) result.

Cross tabulate true values only in R

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？