summarising multiple non-exclusive dummy variables in R into one variable
I was sent a dataset with multiple dummy variables and other variables as well. Basically, what I'd like to do is create a summary table with summary.formula from rms. However, I do not know how to create a single variable from the multiple dummy variables, and they are not mutually exclusive. Is this at all possible? Of course I could do it by creating a table etc., but then I cannot use summary.formula, and I'd like the summary.formula output to include just the individual levels of the dummy variables.
Edit, to clarify: a & b need to be summarized, but they are not mutually exclusive. Since age is recorded for every row, I need to summarize a & b into one variable so it can be used in summary.formula. I've edited the code below so that 0 and 1 are changed into NA or "A"/"B", respectively.
Example data, followed by the summary.formula output I'd like to get:
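The recoding described in the edit can be sketched as follows, assuming the dummies arrive as numeric 0/1 columns. The data frame `d` and its values are made up for illustration.

```r
# Made-up 0/1 dummies, standing in for the dataset as originally received
d <- data.frame(a = c(1, 0, 1, 0),
                b = c(1, 1, 0, 0))

# Recode each dummy: 1 becomes the dummy's label, 0 becomes NA
d$a <- ifelse(d$a == 1, "A", NA)
d$b <- ifelse(d$b == 1, "B", NA)

print(d)
```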
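h <- data.frame(a = sample(c("A", NA), 100, replace = TRUE),
                b = sample(c("B", NA), 100, replace = TRUE),
                age = rnorm(100, 50, 25),
                epo = sample(c("Y", "N"), 100, replace = TRUE))
library(rms)  # attaches Hmisc, which provides summary.formula()
summary.formula(epo ~ age + sab, method = "reverse", data = h)  # sab: the single variable summarising a & b, which I don't know how to build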
#-----------------
Descriptive Statistics by epo
+---------+--------------------------+--------------------------+
| |N |Y |
| |(N=56) |(N=44) |
+---------+--------------------------+--------------------------+
|age |31.53434/48.90788/67.69096|28.63689/43.93502/57.81834|
+---------+--------------------------+--------------------------+
|sab : A | 25% (14) | 16% ( 7) |
+---------+--------------------------+--------------------------+
| B | 27% (15) | 32% (14) |
+---------+--------------------------+--------------------------+
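The numbers in that table are, per epo group, the share of rows where a is present and, separately, the share where b is present; since a row can have both, the percentages need not sum to 100. The sketch below computes them in base R on small made-up data (the data frame `toy` is hypothetical and uses the edited "A"/NA coding). It does not go through summary.formula; it only shows what such a table would contain.

```r
# Made-up data in the edited coding: "A"/NA, "B"/NA, plus the grouping variable
toy <- data.frame(a   = c("A", NA, "A", NA, "A", NA),
                  b   = c("B", "B", NA, NA, "B", NA),
                  epo = c("N", "N", "N", "Y", "Y", "Y"))

# Logical indicator matrix: TRUE where the dummy is present (not NA)
present <- !is.na(toy[c("a", "b")])
colnames(present) <- c("A", "B")

# Count A and B separately within each epo group; a row may count for both
counts  <- rowsum(present * 1L, toy$epo)  # rows: epo groups (sorted), cols: A, B
group_n <- as.vector(table(toy$epo))      # group sizes, same sorted order

# Percent of each group having A (or B); divides each row by its group size
pct <- round(100 * counts / group_n)

# Arrange like the desired table: dummies as rows, epo groups as columns
tab <- t(matrix(sprintf("%d%% (%d)", as.integer(pct), as.integer(counts)),
                nrow = nrow(counts), dimnames = dimnames(counts)))
print(tab)
```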
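Using paste() to combine a and b into one variable seems to work acceptably. Note that it yields one mutually exclusive level per combination rather than one row per dummy. The output below was produced with the original 0/1 coding; with the edited "A"/NA coding, the levels would be A_B, A_NA, NA_B and NA_NA, because paste() turns NA into the string "NA".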
h$sab <- paste(h$a, h$b, sep="_")
summary.formula(epo~age+sab,method="reverse",data=h)
#-----------------
Descriptive Statistics by epo
+---------+--------------------------+--------------------------+
| |N |Y |
| |(N=56) |(N=44) |
+---------+--------------------------+--------------------------+
|age |31.53434/48.90788/67.69096|28.63689/43.93502/57.81834|
+---------+--------------------------+--------------------------+
|sab : 0_0| 25% (14) | 16% ( 7) |
+---------+--------------------------+--------------------------+
| 0_1 | 27% (15) | 32% (14) |
+---------+--------------------------+--------------------------+
| 1_0 | 25% (14) | 34% (15) |
+---------+--------------------------+--------------------------+
| 1_1 | 23% (13) | 18% ( 8) |
+---------+--------------------------+--------------------------+
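If labels such as NA_B are awkward, the same cross-classification can be built with hand-chosen level names. A minimal sketch on made-up vectors; the labels "A and B", "A only", "B only" and "neither" are arbitrary choices, not anything rms requires.

```r
a <- c("A", NA, "A", NA)
b <- c("B", "B", NA, NA)

# paste() turns NA into the string "NA", so every row gets a combination label
sab_paste <- paste(a, b, sep = "_")

# Same cross-classification with explicit, mutually exclusive labels
has_a <- !is.na(a)
has_b <- !is.na(b)
sab <- factor(ifelse(has_a & has_b, "A and B",
              ifelse(has_a, "A only",
              ifelse(has_b, "B only", "neither"))),
              levels = c("A and B", "A only", "B only", "neither"))

print(table(sab_paste, sab))
```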
Another option might be interaction():
summary.formula(epo~age+interaction(a,b),method="reverse",data=h)
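With the original 0/1 coding, interaction() gives the same combinations as paste(). With the edited "A"/NA coding, however, interaction() propagates NA, so every row where a or b is missing drops out. One way around this, sketched below on made-up vectors, is to make NA an explicit factor level with addNA() first.

```r
a <- c("A", NA, "A", NA)
b <- c("B", "B", NA, NA)

# interaction() propagates NA: any row where a or b is NA becomes NA
plain <- interaction(a, b)
print(plain)

# addNA() turns NA into an explicit level, so every combination is kept
with_na <- interaction(addNA(factor(a)), addNA(factor(b)))
print(table(with_na))
```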
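If instead you want a logical OR applied to the combination of variables (a row counts when a, b, or both are present), use:
# original 0/1 coding: h$a_or_b <- with(h, a | b)
# edited "A"/NA coding (| is not defined for character vectors):
h$a_or_b <- with(h, !is.na(a) | !is.na(b))
summary.formula(epo ~ age + a_or_b, method = "reverse", data = h)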