R `summary` when not all cells have data
Is there an argument in summary (or another command) to force R to calculate values when some cells contain no data?
In my questionnaire, subjects did not provide all information, and for those cells I entered -nodata-. For cells where the answer is not applicable (based on the previous question in the questionnaire) I entered -1. The summary looks like this:
> summary(qs$ESC)
      -1 -nodata-      0.5        1       12       15        3
      49        3        1        1        1        1        1
What I want is a calculated summary. Is there a way to tell R to disregard -nodata- and -1?
I don't really understand what kind of summary you want to compute.
If you use NA instead of your "-nodata-" and "-1" codes, they are handled automatically by the summary function. For example:
R> v <- c(NA, NA, 0.5, 1, 12, 15, 3)
R> summary(v)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    0.5     1.0     3.0     6.3    12.0    15.0     2.0
R> table(v)
v
0.5 1 3 12 15
1 1 1 1 1
You can see that here v is treated as numeric, as there is no string value in it. When you introduce the "-nodata-" value, it will be treated either as a character or as a factor variable.
You can also use the exclude argument of the table function to automatically ignore some values:
R> v <- c(-1, "-nodata-", 0.5, 1, 12, 15, 3)
R> table(v)
v
0.5 1 -1 12 15 3 -nodata-
1 1 1 1 1 1 1
R> table(v, exclude=c(-1, "-nodata-"))
v
0.5 1 12 15 3
1 1 1 1 1
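Combining the two points above, here is a minimal sketch that recodes both placeholder values to NA in the character vector and then converts it to numeric, so that summary computes statistics on the remaining values. The name v_num is introduced only for illustration.

```r
# Character vector with the two placeholder codes from the question
v <- c("-1", "-nodata-", "0.5", "1", "12", "15", "3")

# Recode both placeholders as missing values
v[v %in% c("-1", "-nodata-")] <- NA

# Convert the remaining strings to numbers; NA stays NA
v_num <- as.numeric(v)

summary(v_num)              # numeric summary, with the NA count reported
mean(v_num, na.rm = TRUE)   # 6.3
```

Note that mean and similar functions need na.rm = TRUE to skip the missing values, whereas summary reports them separately.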
It's very likely that the ESC column is a factor. That was the default for a data.frame from read.table (before R 4.0.0) when the column's data type is guessed and character is required. You should probably add the argument stringsAsFactors=FALSE to the original call to read.table, which will give the column as text; you can then convert the "-nodata-" value to NA and convert the column to numeric.
There's also an na.strings argument to read.table, which could be set as na.strings = "-nodata-" to automatically replace these with NA.
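As a rough illustration of the na.strings approach, the sketch below reads a small inline sample in place of the real file. The sample text and the column layout are assumptions; the question does not show the actual file. Both placeholder codes are listed in na.strings so that "-1" is also treated as missing.

```r
# Hypothetical stand-in for the questionnaire file (one column named ESC)
raw <- "ESC
-1
-nodata-
0.5
1
12
15
3"

# Treat both placeholder codes as NA while reading
qs <- read.table(text = raw, header = TRUE,
                 na.strings = c("-nodata-", "-1"))

is.numeric(qs$ESC)   # TRUE: no stray strings left in the column
summary(qs$ESC)      # numeric summary, with the NA count reported
```

With a real file, the text = raw argument would be replaced by the file path. Listing "-1" in na.strings only makes sense if -1 is never a genuine answer anywhere in the file.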
Finally, a guess at how to go from your existing data.frame: replace the nodata value and convert to numeric:
qs$ESC[qs$ESC == "-nodata-"] <- NA
summary(as.numeric(levels(qs$ESC))[qs$ESC])
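A slightly fuller sketch of the same idea, which also drops the -1 code, is given below. The data frame is a hypothetical reconstruction from the counts shown in the question, and lev and esc_num are illustrative names. Recoding the levels before conversion avoids the coercion warning that as.numeric gives for "-nodata-".

```r
# Hypothetical reconstruction of the factor column from the question's counts
qs <- data.frame(ESC = factor(c(rep("-1", 49), rep("-nodata-", 3),
                                "0.5", "1", "12", "15", "3")))

# Recode the placeholder levels as NA, then convert the levels to numeric
lev <- levels(qs$ESC)
lev[lev %in% c("-1", "-nodata-")] <- NA

# Index the numeric levels by the factor to recover one value per row
esc_num <- as.numeric(lev)[qs$ESC]

summary(esc_num)   # numeric summary; the 52 placeholder rows are counted as NA
```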
That indexing on the factor's levels is recommended by ?factor, but you should step back to where the data were read or otherwise generated and