Numeric Column in data.frame returning "num" with str() but not is.numeric()
I have a data.frame, d1, with 7 columns; the 5th through 7th columns are supposed to be numeric:
str(d1[5])
'data.frame': 871 obs. of 1 variable:
$ Latest.Assets..Mns.: num 14008 1483 11524 1081 2742 ...
is.numeric(d1[5])
[1] FALSE
as.numeric(d1[5])
Error: (list) object cannot be coerced to type 'double'
How can this be? If str identifies it as numeric, how can it not be numeric? I'm importing from CSV.
Here is why: a data.frame such as d1 is a list of columns, so d1[5] is a list of length 1, in this case a one-column data.frame. To get the column itself as a vector, use d1[[5]].
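A minimal sketch of the difference between single and double brackets. The data frame df, its column names, and its values are made up here as a stand-in for d1:

```r
# Hypothetical stand-in for d1; names and values are invented for illustration.
df <- data.frame(id = 1:3, assets = c(14008, 1483, 11524))

single <- df[2]    # single bracket: still a data.frame (a list of length 1)
double <- df[[2]]  # double bracket: the column itself, a numeric vector

class(single)       # "data.frame"
is.numeric(single)  # FALSE
class(double)       # "numeric"
is.numeric(double)  # TRUE
```

This mirrors the question: str() reports the type of the column inside the one-column data frame, while is.numeric() is asked about the data frame wrapper.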
Even if a data frame contains numeric data, it isn't numeric itself:
> x = data.frame(1:5,6:10)
> is.numeric(x)
[1] FALSE
Individual columns in a data frame are either numeric or not numeric. For instance:
> z <- data.frame(1:5,letters[1:5])
> is.numeric(z[[1]])
[1] TRUE
> is.numeric(z[[2]])
[1] FALSE
If you want to know if ALL columns in a data frame are numeric, you can use all and sapply:
> sapply(z,is.numeric)
X1.5 letters.1.5.
TRUE FALSE
> all(sapply(z,is.numeric))
[1] FALSE
> all(sapply(x,is.numeric))
[1] TRUE
You can wrap this all up in a convenient function:
> is.numeric_data.frame=function(x)all(sapply(x,is.numeric))
> is.numeric_data.frame(d1[5])
[1] TRUE
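If a stricter check is wanted, one possible variant is sketched below. It is an assumption on my part, not part of the original answer: the name all_numeric_cols is made up, it refuses anything that is not a data frame, and it uses vapply so that each column must yield exactly one logical value.

```r
# Hypothetical helper: TRUE only if x is a data frame whose columns are all numeric.
all_numeric_cols <- function(x) {
  stopifnot(is.data.frame(x))             # reject vectors, matrices, plain lists
  all(vapply(x, is.numeric, logical(1)))  # one TRUE/FALSE per column
}

x <- data.frame(a = 1:5, b = 6:10)
z <- data.frame(a = 1:5, b = letters[1:5])

all_numeric_cols(x)     # TRUE
all_numeric_cols(z)     # FALSE
all_numeric_cols(z[1])  # TRUE: z[1] is a one-column data frame
```

Passing a bare column such as z[[1]] raises an error here instead of quietly looping over its elements, which makes the data frame versus column distinction explicit.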
d1[5] is not the column's values themselves. It's a list (a one-column data frame) wrapping those values. If you extract the column from it, I bet it is numeric. For example:
is.numeric(d1[5][[1]])
as.numeric(d1[5][[1]])
So I think the confusion is between the column object and the elements in the column. R makes a distinction between those two ideas while other languages, like SQL, functionally assume that when discussing the column you're usually referring to the elements of the column.
This discussion of indexing from the R Language Definition doc really helped me wrap my head around how to reference items in R.
It may be a list (based on the error message). Have you tried class(d1[5])? If it's a list, then you would expect either d1[[5]] or d1[5][[1]] to be numeric.
Edit:
Given that d1[5] is itself a data frame, you need to treat it as such. Something like this should work:
is.numeric(d1[5][,1])
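To tie the suggestions in this thread together, here is a small sketch showing that the different extraction forms end up at the same vector. The data frame df is a made-up stand-in for d1, and this assumes a base data.frame rather than some other table class:

```r
# Hypothetical stand-in for d1; names and values are invented for illustration.
df <- data.frame(id = 1:3, assets = c(14008, 1483, 11524))

a  <- df[[2]]      # double bracket on the data frame
b  <- df[2][[1]]   # one-column data frame, then its first element
c2 <- df[2][, 1]   # one-column data frame, then matrix-style indexing
d  <- df[, 2]      # matrix-style indexing directly
e  <- df$assets    # by column name

identical(a, b) && identical(a, c2) && identical(a, d) && identical(a, e)  # TRUE
as.numeric(a)      # works: a is an atomic vector, not a list
```

Any of these can be handed to is.numeric() or as.numeric(); only the single-bracket form df[2] on its own keeps the data frame wrapper that triggers the coercion error.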