Perform a statistical test on specific data in ggplot2
I wrote a script that generates plots using ggplot2. Each plot has multiple x-axis values, and each of them has multiple y-axis values for several variables.
To put the question another way: I have multiple subsets of a data frame, generated inside a for loop. How can I control the loop so that it builds another data frame that contains, in each row, the values of the first column of one of those subsets?
for (x in phy) {
  print(x)
  test <- subset(t, Phylum == x)
  dat <- melt(test, measure = c("A", "C", "G", "T", "(A-T)/(A+T)", "(G-C)/(G+T)",
                                "(A+T)/(G+C)"))
  unitest <- unique(c(test$Class))
  #print(nrow(test))
  i <- 1
  for (y in unitest) {
    towork <- subset(test, Class == y)
    # here i want to create a data frame that will contain (in each row, the
    # value of the first column of the towork subset for each y)
    # atest=wilcox.test(towork$A,towork$A, correct=FALSE)
    # print(paste(paste(y,towork$A),towork$A))
  }
}
Input, e.g.:
class1:
0.268912 0.158921 0.214082 0.358085
1.680946 0.314681 0.210526 0.166895
0.286945 0.322006 0.147361 0.243688
class2:
0.293873 0.327516 0.156235 0.222376
0.327430 0.308667 0.135710 0.227695
0.301488 0.326511 0.125865 0.246022
0.310980 0.308730 0.148861 0.231429
I want the new data frame to contain, in each row, the first column of each class.
Output, e.g.:
1st row: 0.268912 1.680946 0.286945
2nd row: 0.293873 0.327430 0.301488 0.310980
and so on. Then I want another data frame that contains, in each row, the 2nd column of each class, etc.
Then I want to run a statistical test (e.g. the Wilcoxon rank sum test) on each pair of rows of the new data frame and get the result.
Any help would be appreciated.
Hello, I came up with an idea, but I need your help to implement it.
First, the data is in a large text file, which I can upload if you want. My idea is to create a function that takes 2 arguments:
1. the name of the column used for grouping the data (e.g. phylum or class)
2. the name of the column containing the data to test (e.g. A, C, G, T)
I will test the data for each phylum first and, if needed, for each class within each phylum.
That means I will take the A column for the first phylum and the A column for the 2nd phylum and run wilcox.test on them, and repeat this for each common column in each phylum. Then I will use subset to test the classes inside each phylum.
What do you think of this approach?
Thanks in advance.
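The two-argument function described above could be sketched roughly as follows. This is only a sketch under assumptions: the function name testByGroup, its argument names, and the toy data frame are made up for illustration, and the real data may need extra handling (ties, missing values, groups with very few rows).

```r
# Hypothetical helper: pairwise Wilcoxon rank sum tests between groups.
# `data` is a data frame; `group_col` and `value_col` are column names (strings).
testByGroup <- function(data, group_col, value_col, correct = FALSE) {
  # One numeric vector per group (e.g. the A values of each Phylum)
  groups <- split(data[[value_col]], data[[group_col]], drop = TRUE)
  if (length(groups) < 2) stop("Need at least two groups to compare.")
  # All unordered pairs of group names
  pairs <- combn(names(groups), 2, simplify = FALSE)
  # One result row per pair of groups
  do.call(rbind, lapply(pairs, function(p) {
    res <- wilcox.test(groups[[p[1]]], groups[[p[2]]], correct = correct)
    data.frame(group1 = p[1], group2 = p[2],
               W = unname(res$statistic), p.value = res$p.value,
               stringsAsFactors = FALSE)
  }))
}

# Made-up example data, only to show the call
set.seed(42)
toy <- data.frame(
  Phylum = rep(c("P1", "P2", "P3"), times = c(3, 4, 5)),
  A = runif(12)
)
res <- testByGroup(toy, "Phylum", "A")
print(res)
```

To test the classes inside one phylum, the same function could be called on subset(toy, Phylum == "P1") with "Class" as the grouping column, provided that subset has at least two classes. The stats package also ships pairwise.wilcox.test, which covers a similar use case.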
I think this will do what you are after. We don't necessarily need to create new data.frames for the four variables of interest; we can extract the columns of interest directly from class1 and class2. The code has been updated to find the common columns between class1 and class2, and it only computes the Wilcoxon test for those common columns.
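# Example data; the column names are illustrative, chosen so that only
# "c" and "d" are shared between the two matrices
class1 <- matrix(rnorm(12), ncol = 4, dimnames = list(NULL, c("a", "b", "c", "d")))
class2 <- matrix(rnorm(16), ncol = 4, dimnames = list(NULL, c("c", "d", "e", "f")))
computeWilcox <- function(x, y, correct = FALSE, ...) {
  if (!is.numeric(x)) stop("x must be numeric.")
  if (!is.numeric(y)) stop("y must be numeric.")
  # Columns present in both inputs (both need column names)
  commonCols <- intersect(colnames(x), colnames(y))
  ret <- vector("list", length(commonCols))
  names(ret) <- commonCols
  # Index by column name, so columns are matched even if their positions differ
  for (col in commonCols) {
    ret[[col]] <- wilcox.test(x[, col], y[, col], correct = correct, ...)
  }
  return(ret)
}
zz <- computeWilcox(class1, class2)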
Where zz has a structure like:
> str(zz)
List of 2
$ c:List of 7
..$ statistic : Named num 0
.. ..- attr(*, "names")= chr "W"
..$ parameter : NULL
..$ p.value : num 0.0571
..$ null.value : Named num 0
.. ..- attr(*, "names")= chr "location shift"
..$ alternative: chr "two.sided"
..$ method : chr "Wilcoxon rank sum test"
..$ data.name : chr "x[, col] and y[, col]"
..- attr(*, "class")= chr "htest"
$ d:List of 7
..$ statistic : Named num 2
.. ..- attr(*, "names")= chr "W"
..$ parameter : NULL
..$ p.value : num 0.229
..$ null.value : Named num 0
.. ..- attr(*, "names")= chr "location shift"
..$ alternative: chr "two.sided"
..$ method : chr "Wilcoxon rank sum test"
..$ data.name : chr "x[, col] and y[, col]"
..- attr(*, "class")= chr "htest"
You can extract the parameter or p-value out of the returned list object like this:
> zz$c$p.value
[1] 0.05714286
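To collect the p-values for all tested columns at once instead of one at a time, something like the following should work. It is a minimal sketch: it rebuilds a list shaped like zz so that it runs on its own, and the column names and the variables tests and pvals are made up for illustration.

```r
# Example matrices with named columns; the names are made up for illustration
set.seed(1)
class1 <- matrix(rnorm(12), ncol = 4, dimnames = list(NULL, c("A", "C", "G", "T")))
class2 <- matrix(rnorm(16), ncol = 4, dimnames = list(NULL, c("A", "C", "G", "T")))

# Test each shared column by name, giving a named list of htest objects
common <- intersect(colnames(class1), colnames(class2))
tests <- lapply(common, function(col) {
  wilcox.test(class1[, col], class2[, col], correct = FALSE)
})
names(tests) <- common

# Pull one p-value per column into a named numeric vector
pvals <- sapply(tests, function(res) res$p.value)
print(pvals)
```

The same sapply call can be applied to zz directly, since it is also a named list of htest objects.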