„persuading” tabulate function to count NAs in a data frame in [R]

2023-02-18 22:00 问答作者：

I’d like to ask you a question again. It is basically about data frames, NAs and tabulate function in [R].

I have this data frame. I already used this in a previous question of mine. It intentionally looks this simple, my real ’df’ dataframe is much bigger actually and again, I am not willing to annoy anyone with huge databases… So, my database:

id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,4)
df <-data.frame(id,a,b,c,d,e)
df

I have managed to calculate the distributions of the numbers occurring in columns ’b’ to ’e’ but considering the fact at the very same time that these distributions should be ’groupped by’ the id numbers in column ’id’. It works fine, check it ->

matrix(matrix(unlist(lapply(df[,(-(1))], 
       function(x) tapply(x,df$id,tabulate,
                          nbins=nlevels(factor(df[,2])))) [[1]])), 
              ncol=3,nrow=3,byrow=TRUE)

matrix(matrix(unlist(lapply(df[,(-(1))],function(x) tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,3])))) [[2]])),ncol=3,nrow=3,byrow=TRUE)

matrix(matrix(unlist(lapply(df[,(-(1))],function(x)开发者_JAVA百科 tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,4])))) [[3]])),ncol=3,nrow=3,byrow=TRUE)

matrix(matrix(unlist(lapply(df[,(-(1))],function(x) tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,5])))) [[4]])),ncol=3,nrow=3,byrow=TRUE)

matrix(matrix(unlist(lapply(df[,(-(1))],function(x) tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,6])))) [[5]])),ncol=4,nrow=3,byrow=TRUE)

Now my problem is: what if my data frame contains NA values here and there and what if I want my in-built tabulate function to collect these NAs as well? So what if I want it to count how many occurrences I have from these NAs?

Here’s my modified data frame with the NAs:

id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <-c(NA,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <-c(1,3,2,3,2,1,2,3,3,2,2,3,NA,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,NA,1,4)
df <-data.frame(id,a,b,c,d,e)
df

At first I tried something like this:

unlist(lapply(df[,(-(1))],function(x) tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,2],exclude=NULL)))) [[1]])

You see, the only thing I did was that I tried to apply this exclude=NULL thing.

At least my code realizes the fact that I have 4 different levels in column a (1,2,3,NA) and not only three (1,2,3). Check it here:

nlevels(factor(df[,2], exclude=NULL))

But you see in the result that somehow it could not calculate the NAs. It says

3  0  6  0  4  3  3  0  4  1  5  0

Instead of the correct:

3  0  6  1  4  3  3  0  4  1  5  0

Or in case of:

unlist(lapply(df[,(-(1))],function(x) tapply(x,df$id,tabulate,nbins=nlevels(factor(df[,4],exclude=NULL)))) [[3]])

It says

2  4  4  0  2  3  4  0  1  5  4  0

Instead of the correct

2  4  4  0  2  3  4  1  1  5  4  0

etc.

Does someone have any ideas how to "persuade" the function tabulate to count NAs? Is it possible at all?

Thanks very much and have a pleasant weekend,

Laszlo

You can simplify your repeated calls to:

tabs <-lapply(df[,2:6], function(x, id){ t(table(x, id)) }, df$id)

which gives almost the same as your repeated matrix calls, e.g. for your first (non-NA) one:

> tabs[[1]]
   x
id  1 2 3
  1 3 0 7
  2 4 3 3
  3 4 1 5

So can we now modify this to deal with NA? Yes, using the useNA argument of the table() function. Using your df with NA, we have:

tabs <-lapply(df[,2:6], 
              function(x, id){ t(table(x, id, useNA = "ifany")) }, df$id)

> tabs[[1]]
   x
id  1 2 3 <NA>
  1 3 0 6    1
  2 4 3 3    0
  3 4 1 5    0

Because we ask for NA in the table only if an NA exists, not all the tables in tabs have the same number of columns. If that is important, we can change useNA = "ifany" to be useNA = "always" and all the result tables will have the same number of columns, however it adds another id row:

> tabs[[1]]
      x
id     1 2 3 <NA>
  1    3 0 6    1
  2    4 3 3    0
  3    4 1 5    0
  <NA> 0 0 0    0

One final addition gets what we want - we use addNA() to add an NA level to each id's set of numbers, even if there are no NAs recorded:

tabs <-lapply(df[,2:6], 
              function(x, id){ t(table(addNA(x), id, useNA = "ifany")) }, df$id)

Which gives:

> tabs
$a

id  1 2 3 <NA>
  1 3 0 6    1
  2 4 3 3    0
  3 4 1 5    0

$b

id  1 2 3 <NA>
  1 8 1 1    0
  2 6 3 1    0
  3 2 4 4    0

$c

id  1 2 3 <NA>
  1 2 4 4    0
  2 2 3 4    1
  3 1 5 4    0

$d

id  1 2 3 <NA>
  1 2 3 5    0
  2 2 6 2    0
  3 5 3 2    0

$e

id  1 2 3 4 <NA>
  1 4 3 3 0    0
  2 4 2 4 0    0
  3 1 3 4 1    1

Can't you just use is.na? If you want to count up the number of entries that are NA or non-zero, you could sum(is.na(my.var)|my.var>0).

继续阅读：dataframe r

„persuading” tabulate function to count NAs in a data frame in [R]

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？