perform a statistical test on specific data in ggplot2

2023-02-20 11:08 Q&A author:

I wrote a script that generates plots using ggplot2. Each plot has multiple x-axis values, and each x value has multiple y-axis values for several variables.

To put the question another way: I have multiple subsets of a data frame, generated inside a for loop. How can I control the loop so that it builds another data frame in which each row contains the values of the first column of one of those subsets?

for (x in phy) {
    print(x)

    test<-subset(t, Phylum==x)
    dat <- melt(test, measure=c("A","C","G","T","(A-T)/(A+T)","(G-C)/(G+T)",
                                "(A+T)/(G+C)"))
    unitest <- unique(c(test$Class))
    #print(nrow(test))
    i <- 1
    for(y in unitest) {
        towork <- subset(test, Class==y)

        # here i want to create a data frame that will contain (in each row, the
        # value of the first column of the towork subset for each y)

        # atest=wilcox.test(towork$A,towork$A, correct=FALSE)
        # print(paste(paste(y,towork$A),towork$A))
    }
}



input:

    e.g.
    class1:
    0.268912    0.158921    0.214082    0.358085
    1.680946    0.314681    0.210526    0.166895
    0.286945    0.322006    0.147361    0.243688
    class2:
    0.293873    0.327516    0.156235    0.222376
    0.327430    0.308667    0.135710    0.227695
    0.301488    0.326511    0.125865    0.246022
    0.310980    0.308730    0.148861    0.231429

I want the new data frame to contain, in each row, the first column of each class.

output:

    e.g.
    1st row: 0.268912 1.680946 0.286945
    2nd row: 0.293873 0.327430 0.301488 0.310980

And so on; then another data frame that contains, in each row, the 2nd column of each class, etc.

Then I want to run a statistical test (e.g. the Wilcoxon rank sum test) on each pair of rows of the new data frame and get the result.

Any help would be appreciated.
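
Because the classes have different numbers of rows, the "rows" described above have unequal lengths, so a named list may be a more natural container than a data frame. The following is only a minimal sketch of that idea using the first-column values from the example input; the data frame `firstCol` and its column names `Class` and `A` are assumptions chosen to mirror the loop in the question.

```r
# Long-format toy data built from the first column of the example input.
# The names firstCol, Class and A are illustrative assumptions.
firstCol <- data.frame(
    Class = rep(c("class1", "class2"), times = c(3, 4)),
    A = c(0.268912, 1.680946, 0.286945,
          0.293873, 0.327430, 0.301488, 0.310980)
)

# One list element per class; unlike data frame rows,
# the elements are allowed to differ in length.
byClass <- split(firstCol$A, firstCol$Class)

# Wilcoxon rank sum test between the two classes
res <- wilcox.test(byClass$class1, byClass$class2, correct = FALSE)
res$statistic
res$p.value
```

The same `split()` call applied to another column (C, G, T, ...) would give the corresponding list for that column, without any explicit loop over classes.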

Hello, I came up with an idea, but I need your help to implement it.
First, the data is in a large text file, which I can upload if you want. My idea is to create a function that takes 2 arguments:
1. the name of the column used for grouping the data (e.g. phylum or class)
2. the name of the column containing the data to test (e.g. A, C, G, T)
I will test the data for each phylum first and, if needed, for each class within each phylum.
That means I will take the A column for the first phylum and the A column for the 2nd phylum and run wilcox.test on them, and repeat the process for each common column in each phylum. Then I will use a subset function to test the classes inside each phylum.
What do you think of this approach?

Thanks in advance.
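
One possible shape for the two-argument function described above is sketched here. It is not taken from the original post: the name `testByGroup`, the toy data frame `toy`, and the choice to compare every pair of groups with `combn()` are all assumptions made for illustration.

```r
# Hypothetical helper: run wilcox.test on every pair of groups.
#   data     - a data frame
#   groupCol - name of the grouping column (e.g. "Phylum" or "Class")
#   valueCol - name of the column holding the values to test (e.g. "A")
testByGroup <- function(data, groupCol, valueCol, correct = FALSE, ...) {
    stopifnot(groupCol %in% names(data), valueCol %in% names(data))

    # One numeric vector per group
    groups <- split(data[[valueCol]], data[[groupCol]], drop = TRUE)
    if (length(groups) < 2) stop("Need at least two groups to compare.")

    # All unordered pairs of group names
    pairs <- combn(names(groups), 2, simplify = FALSE)

    res <- lapply(pairs, function(p) {
        wilcox.test(groups[[p[1]]], groups[[p[2]]], correct = correct, ...)
    })
    names(res) <- vapply(pairs, paste, character(1), collapse = " vs ")
    res
}

# Toy data, invented purely to exercise the function
set.seed(1)
toy <- data.frame(
    Phylum = rep(c("P1", "P2", "P3"), times = c(5, 6, 4)),
    A = runif(15)
)

byPhylum <- testByGroup(toy, "Phylum", "A")
names(byPhylum)
```

To test classes inside one phylum, the same function could be called on a subset, for example `testByGroup(subset(t, Phylum == x), "Class", "A")` with the `t` and `x` from the loop in the question. Base R's `pairwise.wilcox.test()` covers similar ground and also adjusts the p-values for multiple comparisons.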

I think this will do what you are after. We don't necessarily need to go through the process of making new data.frames for the four variables of interest - we can extract the columns of interest from their respective locations within class1 and class2. Code has been updated to find the common columns between class1 and class2. It will only compute the wilcox test for those common columns.

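# Example data. The column names are an assumption added here so that
# intersect() has something to match; "c" and "d" are the shared columns.
class1 <- matrix(rnorm(12), ncol = 4, dimnames = list(NULL, c("a", "b", "c", "d")))
class2 <- matrix(rnorm(16), ncol = 4, dimnames = list(NULL, c("c", "d", "e", "f")))

computeWilcox <- function(x, y, correct = FALSE, ...) {

    if (!is.numeric(x)) stop("x must be numeric.")
    if (!is.numeric(y)) stop("y must be numeric.")

    # Columns present in both inputs, matched by name
    commonCols <- intersect(colnames(x), colnames(y))

    ret <- vector("list", length(commonCols))
    names(ret) <- commonCols

    # Index by column name, not position, so the shared columns are compared
    # (looping over 1:length(commonCols) would pick the first columns instead)
    for (col in commonCols) {
        ret[[col]] <- wilcox.test(x[, col], y[, col], correct = correct, ...)
    }

    return(ret)
}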


zz <- computeWilcox(class1, class2)

Where zz has a structure like:

> str(zz)
List of 2
 $ c:List of 7
  ..$ statistic  : Named num 0
  .. ..- attr(*, "names")= chr "W"
  ..$ parameter  : NULL
  ..$ p.value    : num 0.0571
  ..$ null.value : Named num 0
  .. ..- attr(*, "names")= chr "location shift"
  ..$ alternative: chr "two.sided"
  ..$ method     : chr "Wilcoxon rank sum test"
  ..$ data.name  : chr "x[, col] and y[, col]"
  ..- attr(*, "class")= chr "htest"
 $ d:List of 7
  ..$ statistic  : Named num 2
  .. ..- attr(*, "names")= chr "W"
  ..$ parameter  : NULL
  ..$ p.value    : num 0.229
  ..$ null.value : Named num 0
  .. ..- attr(*, "names")= chr "location shift"
  ..$ alternative: chr "two.sided"
  ..$ method     : chr "Wilcoxon rank sum test"
  ..$ data.name  : chr "x[, col] and y[, col]"
  ..- attr(*, "class")= chr "htest"

You can extract the test statistic or the p-value from the returned list object like this:

> zz$c$p.value
[1] 0.05714286
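
When there are many columns, it may be handier to gather the results into one table instead of reading them off one by one. The sketch below assumes only a named list of `htest` objects, which is what `wilcox.test` returns; the list `tests`, the table `summaryTable`, and the Holm adjustment are illustrative choices, not part of the original answer.

```r
# A small named list of htest objects, built from random data
# purely so the example runs on its own.
set.seed(42)
tests <- list(
    c = wilcox.test(rnorm(3), rnorm(4), correct = FALSE),
    d = wilcox.test(rnorm(3), rnorm(4), correct = FALSE)
)

# Pull one number out of each result
pvals <- vapply(tests, function(res) res$p.value, numeric(1))
stats <- vapply(tests, function(res) unname(res$statistic), numeric(1))

# One row per tested column, with Holm-adjusted p-values
summaryTable <- data.frame(
    column     = names(tests),
    W          = stats,
    p.value    = pvals,
    p.adjusted = p.adjust(pvals, method = "holm"),
    row.names  = NULL
)
summaryTable
```

The same `vapply()` calls should work on the `zz` list returned by `computeWilcox`, since each of its elements is also an `htest` object.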
