data imported as class "null" - unable to perform statistics, unable to change class

2023-03-18 22:08

I am having rather lengthy problems with my data set, and I believe my trouble traces back to importing the data. I have looked at many other questions and answers, as well as every help site I can find, but I can't seem to make anything work. I am attempting to run some t-tests on my data and have thus far been unable to do so. I believe the root cause is that the data is imported as class NULL. I've tried to include as much information here as I can to show what I am working with and the types of issues I am having (in case the issue is in some other area).

An overview of my data and what I've been doing so far:

Example File data (as displayed in R after reading data from .csv file):

Part   Q001    Q002   LA003    Q004   SA005       D106
1       5       3     text      99     text        3
2       3             text      2      text        2 
3       2      4                3      text        5
4      99      5      text      2                  2
5       4      2                1      text        3

So in my data, the "answers" are 1 through 5. 99 represents a question that was answered N/A. Blanks represent unanswered questions. The 'text' entries are long and short answers/comments from a survey. All of them are stored in a large data set of over 150 participants (Part) and over 300 questions (labeled Q, LA, SA, or D for a question with a 1-5 answer, a long answer, a short answer, or a demographic question (also numeric answers, 0 through 6 or so)).

When I import the data, I need to have it disregard any blank or 99 answers so they do not interfere with statistics. I also don't care about the comments, so I filter all of them out.

EDIT: data file looks like:

Part,Q001,Q002,LA003,Q004,SA005,D006
1,5,3,text,99,text,3
2,3,,text,2,text,2
etc...

I am using the following lines to read the data:

data.all <- read.table("data.csv", header=TRUE, sep=",", na.strings = c("","99"))
data <- data.all[, !(colnames(data.all) %in% c("LA003", "SA005")

Now, when I type

class(data$Q001)

I get NULL

I need these to be numeric. I can use summary(data) to get the means and such, but when I try to run t-tests, I get errors mentioning NULL.

I tried to turn this column into numerics by using

data<-sapply(data,as.numeric)

and I tried

data[,1]<-as.numeric(as.character(data[,1]))

(and with 2 instead of 1). I don't really understand the sapply syntax; I saw it in several other answers and was trying to make it work. When I then type

class(data$Q001)

I get "Error: $ operator is invalid for atomic vectors"
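A note on this error: `sapply()` applied to a data frame simplifies its result to a matrix, and `$` does not work on a matrix because it is an atomic vector. A minimal sketch of converting columns while keeping the data frame, assuming a small made-up data frame (`toy` and its values are invented, not the real survey data):

```r
# Hypothetical toy data; the values are invented for illustration.
toy <- data.frame(Q001 = c("5", "3", NA), Q002 = c("3", NA, "4"),
                  stringsAsFactors = FALSE)

# sapply() simplifies the result to a matrix, so `$` no longer applies.
m <- sapply(toy, as.numeric)
is.matrix(m)

# Assigning into toy[] replaces the columns but keeps the data frame.
toy[] <- lapply(toy, as.numeric)
class(toy)
class(toy$Q001)
```

This only explains the `$` error after `sapply()`; it does not by itself explain why the column was NULL before the conversion.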

If I do not use sapply and instead try to run a t-test, I first create subsets such as

data.2<-subset(data, D106 == "2")
data.3<-subset(data, D106 == "3")

and I use

t.test(data.2$Q001~data.3$Q001, na.rm=TRUE)

and I get "invalid type (NULL) for variable 'data.2$Q001'"

I tried a different syntax to see if I could get anything to work, and

t.test(data.2$Q001, data.3$Q001, na.rm=TRUE)

gives "In is.na(d) : is.na() applied to non-(list or vector) of type 'NULL'" and "In mean.default(x) : argument is not numeric or logical: returning NA"
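For comparison, a sketch of the two call forms `t.test()` accepts, assuming invented toy data (`toy` and its values are hypothetical). The formula form takes a response and a two-level grouping column from one data frame rather than two separate vectors, and both forms drop NA values on their own:

```r
# Hypothetical toy data; NA stands in for blank or 99 answers.
toy <- data.frame(
  Q001 = c(5, 3, NA, 4, 2, 4, 1, NA, 3, 5),
  D106 = c(2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
)

# Two-sample form: one numeric vector per group.
res_xy <- t.test(toy$Q001[toy$D106 == 2], toy$Q001[toy$D106 == 3])

# Formula form: response ~ grouping column (exactly two groups).
res_formula <- t.test(Q001 ~ D106, data = toy)

res_xy$p.value
res_formula$p.value
```

Both calls compare the same two groups, so they should report the same p-value.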

So, now that I think I've been clear about what I'm trying to do and some of the things I've tried...

How can I import my data so that numbers (specifically any number in a column with a header starting with Q) are read as numbers and do not get a NULL class applied to them? What do I need to do to get my data properly imported so I can run t-tests on it? I've used t-tests on plenty of data in the past, but it has always been data I recorded manually in Excel (and thus had only one column of numbers with no blanks or NAs), and I've never had an issue. I just do not understand what it is about this data set that keeps it from working. Any assistance in the right direction is much appreciated!

This works for me:

> z <- read.table(textConnection("Part,Q001,Q002,LA003,Q004,SA005,D006
+ 1,5,3,text,99,text,3
+ 2,3,,text,2,text,2
+ "),header=TRUE,sep=",",na.strings=c("","99"))
> str(z)
'data.frame':   2 obs. of  7 variables:
 $ Part : int  1 2
 $ Q001 : int  5 3
 $ Q002 : int  3 NA
 $ LA003: Factor w/ 1 level "text": 1 1
 $ Q004 : int  NA 2
 $ SA005: Factor w/ 1 level "text": 1 1
 $ D006 : int  3 2
> z2 <- z[,!(colnames(z) %in% c("LA003","SA005"))]
> str(z2)
'data.frame':   2 obs. of  5 variables:
 $ Part: int  1 2
 $ Q001: int  5 3
 $ Q002: int  3 NA
 $ Q004: int  NA 2
 $ D006: int  3 2
> z2$Q001
[1] 5 3
> class(z2$Q001)
[1] "integer"

The only thing I can think of is that your second command (which was missing some terminating parentheses and brackets) didn't work at all, you missed seeing the error message, and you are referring to some previously defined data object that doesn't have the same columns defined. For example, class(z$QQQ) is "NULL" following the above example.
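A minimal sketch of how to check for this, assuming a small stand-in data frame (`d` is hypothetical): `$` returns NULL without any error for a column that does not exist, so it is worth looking at what the columns are actually called.

```r
# Hypothetical stand-in for the imported data.
d <- data.frame(Part = 1:2, Q001 = c(5L, 3L))

# `$` returns NULL, with no error, for a column that is not there.
is.null(d$QQQ)
class(d$QQQ)

# Inspect the real column names.
names(d)
"Q001" %in% names(d)

# Matrix-style indexing fails loudly on a missing column instead.
has_error <- inherits(try(d[, "QQQ"], silent = TRUE), "try-error")
has_error
```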

edit: it appears that the original problem was some weird/garbage characters in the header that messed up the name of the first column. Manually renaming the column (names(data)[1] <- "Q001") seems to have fixed the problem.
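One common source of stray characters in the first header name is a byte-order mark (BOM) at the start of the file; whether that was the cause here is an assumption. A sketch, using a temporary file written just for the illustration, of reading with `fileEncoding = "UTF-8-BOM"` and then checking the names:

```r
# Write a small CSV that starts with a UTF-8 byte-order mark (illustrative file).
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("Part,Q001,Q002\n1,5,3\n2,3,\n")), tmp)

# "UTF-8-BOM" tells R to skip the mark when reading.
d <- read.csv(tmp, na.strings = c("", "99"), fileEncoding = "UTF-8-BOM")
names(d)
class(d$Q001)

unlink(tmp)
```

If the names still look wrong after reading, renaming them by hand as described above remains the fallback.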
