R `summary` when not all cells have data

Is there an argument to summary (or another function) that forces R to calculate values when not every cell contains data?

In my questionnaire, subjects did not provide all information, and for those cells I entered -nodata-. For cells where the answer is not applicable (based on the previous question in the questionnaire), I entered -1. The summary looks like this:

> summary(qs$ESC) 
      -1 -nodata-      0.5        1       12       15        3 
      49        3        1        1        1        1        1 

What I want is a calculated (numeric) summary. Is there a way to tell R to disregard -nodata- and -1?


I don't really understand what kind of summary you want to compute.

If you use NA instead of your "-nodata-" and "-1" codes, summary will handle them automatically: NAs are left out of the statistics and counted separately.

For example:

R> v <- c(NA, NA, 0.5, 1, 12, 15, 3)
R> summary(v)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
    0.5     1.0     3.0     6.3    12.0    15.0     2.0 
R> table(v)
v
0.5   1   3  12  15 
  1   1   1   1   1 

Here v is numeric because it contains no strings. As soon as you introduce the "-nodata-" value, the vector is treated either as character or as a factor.
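A minimal sketch of that recoding, assuming the answers are already in a character vector (the name `answers` is hypothetical): both codes are replaced with NA before converting, so summary() returns numeric statistics plus an NA count.

```r
# Hypothetical questionnaire answers stored as text, including both codes
answers <- c("-1", "-nodata-", "0.5", "1", "12", "15", "3")

# Treat "-nodata-" (no answer) and "-1" (not applicable) as missing
answers[answers %in% c("-nodata-", "-1")] <- NA

# Every remaining value is a valid number, so as.numeric() raises no warning
esc <- as.numeric(answers)

# Numeric summary; the two recoded cells are reported under NA's
summary(esc)
```

Using `%in%` with a vector of codes keeps all the missing-value markers in one place if more codes turn up later.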

You can also use the exclude argument of the table function to ignore some values automatically:

R> v <- c(-1, "-nodata-", 0.5, 1, 12, 15, 3)
R> table(v)
v
     0.5        1       -1       12       15        3 -nodata- 
       1        1        1        1        1        1        1 
R> table(v, exclude=c(-1, "-nodata-"))
v
0.5   1  12  15   3 
  1   1   1   1   1 


It's very likely that the ESC column is a factor: when read.table guesses a column's type and finds non-numeric text, it stores the column as a factor (the default before R 4.0.0). You should probably add the argument stringsAsFactors=FALSE to the original call to read.table. That gives you the column as text; you can then convert the "-nodata-" values to NA and convert the column to numeric.
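A sketch of the stringsAsFactors=FALSE route, assuming a single ESC column; the inline `text =` data is a hypothetical stand-in for the real file. As in the steps described above, -1 survives here as an ordinary number.

```r
# Hypothetical file contents standing in for the real questionnaire data
raw <- "ESC
-1
-nodata-
0.5
1
12
15
3"

# Keep the text column as character instead of converting it to a factor
qs <- read.table(text = raw, header = TRUE, stringsAsFactors = FALSE)

# Replace the no-data code with NA, then convert the column to numeric
qs$ESC[qs$ESC == "-nodata-"] <- NA
qs$ESC <- as.numeric(qs$ESC)

summary(qs$ESC)
```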

There's also an na.strings argument to read.table, which could be set to na.strings = "-nodata-" to replace these values with NA automatically.
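na.strings also accepts a character vector, so a sketch that treats both codes as missing at read time might look like this (again with hypothetical inline data in place of the real file):

```r
# Hypothetical file contents standing in for the real questionnaire data
raw <- "ESC
-1
-nodata-
0.5
1
12
15
3"

# Both codes are read as NA, so the column's type is guessed as numeric
qs <- read.table(text = raw, header = TRUE, na.strings = c("-nodata-", "-1"))

str(qs$ESC)
summary(qs$ESC)
```

The trade-off is that reading -1 as NA discards the distinction between "not applicable" and "no answer". If that distinction matters later, keep -1 in the data and drop it only when summarising.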

Finally, working from your existing data.frame (this is a guess at its structure), replace the nodata value with NA and convert to numeric:

qs$ESC[qs$ESC == "-nodata-"] <- NA
summary(as.numeric(levels(qs$ESC))[qs$ESC])

That indexing on the factor's levels is recommended by ?factor, but you should step back to where the data were read or otherwise generated and fix the coding there.
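A sketch of the factor-level approach that also drops the -1 code, assuming qs$ESC is a factor (the standalone `esc` factor below is a hypothetical stand-in). Calling as.numeric() directly on a factor returns its internal level codes, which is why the levels are converted first and then indexed by the factor.

```r
# Hypothetical factor standing in for qs$ESC read with stringsAsFactors = TRUE
esc <- factor(c("-1", "-nodata-", "0.5", "1", "12", "15", "3"))

# Pitfall: this returns the internal level codes, not the answers
as.numeric(esc)

# Mark both codes as missing; the factor keeps its levels, only values change
esc[esc %in% c("-nodata-", "-1")] <- NA

# Convert the levels to numbers, then index them by the factor (per ?factor).
# suppressWarnings() hides the coercion warning caused by the
# now-unused "-nodata-" level itself.
values <- suppressWarnings(as.numeric(levels(esc)))[esc]

summary(values)
```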

