Data imported as class "NULL": unable to perform statistics, unable to change class
I have been struggling with my data set for some time, and I believe my troubles trace back to importing the data. I have looked at many other questions and answers and as many help sites as I could find, but I can't seem to make anything work. I am attempting to run some t-tests on my data and have thus far been unable to do so. I believe the root cause is that the data is imported as class NULL. I've tried to include as much information here as I can to show what I am working with and the types of issues I am having (in case the issue is in some other area).
An overview of my data and what I've been doing so far:
Example File data (as displayed in R after reading data from .csv file):
Part Q001 Q002 LA003 Q004 SA005 D106
1 5 3 text 99 text 3
2 3 text 2 text 2
3 2 4 3 text 5
4 99 5 text 2 2
5 4 2 1 text 3
So in my data, the "answers" are 1 through 5; 99 represents a question that was answered N/A, and blanks represent unanswered questions. The 'text' questions are long and short answers/comments from a survey. All of them are stored in a large data set of over 150 participants (Part) and over 300 questions (labeled Q, LA, SA, or D, for a question with a 1-5 answer, a long answer, a short answer, or a demographic question (also numeric answers, 0 through 6 or so)).
When I import the data, I need to have it disregard any blank or 99 answers so they do not interfere with statistics. I also don't care about the comments, so I filter all of them out.
EDIT: data file looks like:
Part,Q001,Q002,LA003,Q004,SA005,D006
1,5,3,text,99,text,3
2,3,,text,2,text,2
etc...
I am using the following lines to read the data:
data.all <- read.table("data.csv", header=TRUE, sep=",", na.strings = c("","99"))
data <- data.all[, !(colnames(data.all) %in% c("LA003", "SA005")
Now, when I type
class(data$Q001)
I get NULL
I need these to be numeric. I can use summary(data) to get the means and such, but when I try to run t-tests, I get errors mentioning NULL.
I tried to turn this column into numerics by using
data<-sapply(data,as.numeric)
and I tried
data[,1]<-as.numeric(as.character(data[,1]))
(and with 2 instead of 1; I don't really understand the sapply syntax, but I saw it in several other answers and was trying to make it work). When I then type
class(data$Q001)
I get "Error: $ operator is invalid for atomic vectors"
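A likely explanation for this particular error, sketched under the assumption that data was still a data frame before the sapply call: sapply() simplifies its result to a matrix, and a matrix is an atomic vector, so $ no longer works on it. The data frame d below is a made-up stand-in, not the real survey data.

```r
# Hypothetical two-column stand-in for the survey data (values stored as text).
d <- data.frame(Q001 = c("5", "3"), Q002 = c("3", NA), stringsAsFactors = FALSE)

# sapply() simplifies the per-column results into a matrix,
# which is why `$` then fails with "invalid for atomic vectors".
m <- sapply(d, as.numeric)
is.matrix(m)        # TRUE
is.data.frame(m)    # FALSE

# Assigning into d[] converts every column but keeps the data frame structure.
d[] <- lapply(d, as.numeric)
is.data.frame(d)    # TRUE
class(d$Q001)       # "numeric"
```

Note that this conversion does not help when the column name itself is wrong, since $ would still return NULL for a name that does not exist.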
If I skip sapply and try to run a t-test, I first create subsets such as
data.2<-subset(data, D106 == "2")
data.3<-subset(data, D106 == "3")
and I use
t.test(data.2$Q001~data.3$Q001, na.rm=TRUE)
and I get "invalid type (NULL) for variable 'data.2$Q001'"
I tried using the different syntax, trying to see if I can get anything to work, and
t.test(data.2$Q001, data.3$Q001, na.rm=TRUE)
gives "In is.na(d) : is.na() applied to non-(list or vector) of type 'NULL'" and "In mean.default(x) : argument is not numeric or logical: returning NA"
So, now that I think I've been clear about what I'm trying to do and some of the things I've tried...
How can I import my data so that numbers (specifically any number in a column with a header starting with Q) are accurately read as numbers and do not get a NULL class applied to them? What do I need to do in order to get my data properly imported to run t-tests on it? I've used t-tests on plenty of data in the past, but it has always been data I recorded manually in Excel (and thus had only one column of numbers with no blanks or NAs) and I've never had an issue. I just do not understand what it is about this data set that keeps it from working. Any assistance in the right direction is much appreciated!
This works for me:
> z <- read.table(textConnection("Part,Q001,Q002,LA003,Q004,SA005,D006
+ 1,5,3,text,99,text,3
+ 2,3,,text,2,text,2
+ "),header=TRUE,sep=",",na.strings=c("","99"))
> str(z)
'data.frame': 2 obs. of 7 variables:
$ Part : int 1 2
$ Q001 : int 5 3
$ Q002 : int 3 NA
$ LA003: Factor w/ 1 level "text": 1 1
$ Q004 : int NA 2
$ SA005: Factor w/ 1 level "text": 1 1
$ D006 : int 3 2
> z2 <- z[,!(colnames(z) %in% c("LA003","SA005"))]
> str(z2)
'data.frame': 2 obs. of 5 variables:
$ Part: int 1 2
$ Q001: int 5 3
$ Q002: int 3 NA
$ Q004: int NA 2
$ D006: int 3 2
> z2$Q001
[1] 5 3
> class(z2$Q001)
[1] "integer"
The only thing I can think of is that your second command (which was missing some terminating parentheses and brackets) didn't work at all, you missed seeing the error message, and you are referring to some previously defined data object that doesn't have the same columns defined. For example, class(z$QQQ) is NULL following the above example.
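A minimal sketch of that check, reusing the two sample rows from the question (read.table's text= argument is used here in place of textConnection; QQQ is a deliberately nonexistent column):

```r
z <- read.table(text = "Part,Q001,Q002,LA003,Q004,SA005,D006
1,5,3,text,99,text,3
2,3,,text,2,text,2
", header = TRUE, sep = ",", na.strings = c("", "99"))

# `$` silently returns NULL for a column that does not exist,
# so class() reports "NULL" instead of raising an error.
class(z$QQQ)

# Check the exact names R stored before assuming a column is there.
names(z)
"Q001" %in% names(z)   # TRUE for this sample; FALSE would point to a header problem
```

If the last line comes back FALSE for the real file, the issue is the column name rather than the column's type.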
Edit: it appears that the original problem was some weird/garbage characters in the header that messed up the name of the first column. Manually renaming the column (names(data)[1] <- "Q001") seems to have fixed the problem.
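Roughly, the fix looks like the sketch below. The data frame d and the damaged name X.garbage.Q001 are invented for illustration, since the actual garbage characters are not shown in the post; the D106 grouping column follows the subsets in the question.

```r
# Hypothetical stand-in: the first header is damaged by stray characters.
d <- data.frame(
  X.garbage.Q001 = c(5, 3, 2, NA, 4, 1),
  D106           = c(2, 2, 2, 3, 3, 3)
)

names(d)           # shows the header as R actually stored it
is.null(d$Q001)    # TRUE: no column is named "Q001", hence class NULL

# Manual rename of the first column, as described above.
names(d)[1] <- "Q001"
class(d$Q001)      # "numeric"

# With the name fixed, the formula interface (response ~ two-level group)
# compares the D106 == 2 and D106 == 3 groups; rows with NA are dropped
# by the default na.action.
res <- t.test(Q001 ~ D106, data = d)
res
```

If the stray characters turn out to be a byte-order mark at the start of the file (an assumption, not something the post confirms), passing fileEncoding = "UTF-8-BOM" to read.table is a commonly used way to avoid the manual rename.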