With R, loop over two files at a time
Hello my favourite coding experts,
I am trying to loop through two files at a time in R: i.e. take one 'case' file and another 'control' file, create a graph and dump it into a pdf, then take another set of 2 files and do the same and so on. I have a list indicating which file is a case and which is a control, like this:
case control
A01 G01
A02 G02
A06 G03
and so on, which can be reproduced like this:
mylist <- data.frame(rbind(c("A01","G01"), c("A02","G02"), c("A06","G03")))
colnames(mylist) <- c('case', 'control')
I cannot find a way to specify which 2 files to loop through each time. The files (each with many variables) are: "/Users/francy/Desktop/cc_files_A01", "/Users/francy/Desktop/cc_files_A02", "/Users/francy/Desktop/cc_files_A06", "/Users/francy/Desktop/cc_files_G01", "/Users/francy/Desktop/cc_files_G02", "/Users/francy/Desktop/cc_files_G03"
For each set of case and control, I would like to do this:
case<- read.table(file="/Users/francy/Desktop/case_files_A01.txt", sep = '\t', header = F)
case <- case[,c(1,2,19,20)]
colnames(case)<- c("ID", "fname", "lname", "Position")
control<- read.table(file="/Users/francy/Desktop/case_files_G01.txt", sep = '\t', header = F)
control <- control[,c(1,2,19,20)]
colnames(control)<- c("ID", "fname", "lname", "Position")
#t-test Position:
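# after the subsetting above only 4 columns remain, so refer to Position by name
test <- t.test(case$Position, control$Position)
p.value <- round(test$p.value, digits=3)
mean_case <- round(mean(case$Position, na.rm=TRUE), digits=2)
mean_control <- round(mean(control$Position, na.rm=TRUE), digits=2)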
boxplot(list(case$Position, control$Position), names=c(paste("case", "mean", mean_case, sep=":"),paste("control", "mean", mean_control, sep=":")))
I want to create a single PDF file containing all the boxplots.
This is what I have for now:
myFiles <- list.files(path= "/mypath/", pattern=".txt")
pdf('/home/graph.pdf')
for (x in myFiles) {
control <- read.table(file = myFiles[x], sep = '\t', header = F)
## How do I specify that is the other file here, and which file it is?
case <- read.table(file = myFiles[x], sep = '\t', header = F)
}
Any help is much appreciated. Thank you!
Why not just pass the pairs of files to the loops via a list?
files <- list(
c("fileA","fileB"),
c("fileC","fileD")
)
for( f in files ) {
cat("~~~~~~~~\n")
cat("f[1] is",f[1],"~ f[2] is",f[2],"\n")
}
The first time the loop runs, f contains the first element of the list files. Since the first element is a character vector of length two, f[1] contains the first file name of the pair, and f[2] contains the second. The printed output of the above code should make this clear.
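If the pairs already live in a two-column table like the one in the question, the list does not have to be typed by hand. The following is only a sketch, assuming a data frame named mylist with character columns case and control (names taken from the question); Map() pairs the two columns element by element.

```r
# Assumed layout: one row per case/control pair, as in the question
mylist <- data.frame(case    = c("A01", "A02", "A06"),
                     control = c("G01", "G02", "G03"),
                     stringsAsFactors = FALSE)

# Build one length-2 character vector per row: c(case, control)
pairs <- Map(c, mylist$case, mylist$control)

for (f in pairs) {
  cat("case:", f[1], "~ control:", f[2], "\n")
}
```

Inside the loop, f[1] and f[2] can then be turned into file names and passed to read.table().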
What probably makes more sense in this case is building up the two file names from your "list" (a data.frame?) of cases and controls.
If this "list" is stored in a data.frame lcc, you could do something like:
for (i in seq_len(nrow(lcc)))
{
  currentcase <- lcc$case[i]
  currentcontrol <- lcc$control[i]
  # paste0() concatenates without the space that paste() inserts by default
  currentcasefilename <- paste0("someprefix_", currentcase, "_somepostfix.txt")
  currentcontrolfilename <- paste0("someprefix_", currentcontrol, "_somepostfix.txt")
  # now open and process both files...
}
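One detail worth checking when building file names this way: paste() separates its arguments with a single space unless sep is given, which would put spaces into the path. A small illustration, using the same placeholder prefix and suffix as above (they are not real file names):

```r
id <- "A01"

# Default separator is " ", so the result contains spaces
with_space <- paste("someprefix_", id, "_somepostfix.txt")

# paste0() (or paste(..., sep = "")) concatenates directly
no_space <- paste0("someprefix_", id, "_somepostfix.txt")

print(with_space)
print(no_space)
```

file.path() is another option when a directory and a file name need to be joined.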
Assuming your list of cases and controls is in an R object (data frame or matrix) called mylist:
# seq_len(nrow(mylist)) visits every row; seq_along(nrow(mylist)) would only give 1
for (x in seq_len(nrow(mylist))) {
  case <- read.table(file = paste("/my/path/", mylist[x, "case"], ".txt", sep = ""),
                     sep = "\t", header = FALSE)
  control <- read.table(file = paste("/my/path/", mylist[x, "control"], ".txt", sep = ""),
                        sep = "\t", header = FALSE)
  ## your code here ##
}
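Putting the pieces together, here is a rough end-to-end sketch rather than a definitive solution. It assumes files named cc_files_<ID>.txt (a guess based on the question) and, so that it runs anywhere, first writes small synthetic 4-column files into a temporary directory. With the real files you would skip the setup step and pick column 20 instead of column 4. The helper read_positions() is hypothetical.

```r
# --- Demo setup only: synthetic files in a temp directory (not real data) ---
set.seed(1)
dir <- tempdir()
mylist <- data.frame(case = c("A01", "A02"), control = c("G01", "G02"),
                     stringsAsFactors = FALSE)
for (id in unlist(mylist)) {
  demo <- data.frame(ID = 1:10, fname = "x", lname = "y", Position = rnorm(10))
  write.table(demo, file.path(dir, paste0("cc_files_", id, ".txt")),
              sep = "\t", col.names = FALSE, row.names = FALSE, quote = FALSE)
}

# Hypothetical helper: read one file and return its Position column
read_positions <- function(id) {
  dat <- read.table(file.path(dir, paste0("cc_files_", id, ".txt")),
                    sep = "\t", header = FALSE)
  dat[[4]]  # column 4 in the demo files; column 20 in the files from the question
}

out <- file.path(dir, "graph.pdf")
pdf(out)                              # open the device once, before the loop
pvals <- numeric(nrow(mylist))
for (i in seq_len(nrow(mylist))) {
  case    <- read_positions(mylist$case[i])
  control <- read_positions(mylist$control[i])
  pvals[i] <- round(t.test(case, control)$p.value, digits = 3)
  boxplot(list(case, control),
          names = c(paste("case", "mean", round(mean(case, na.rm = TRUE), 2), sep = ":"),
                    paste("control", "mean", round(mean(control, na.rm = TRUE), 2), sep = ":")),
          main = paste(mylist$case[i], "vs", mylist$control[i], "p =", pvals[i]))
}
dev.off()                             # close the device so the PDF is written
```

Opening the PDF device before the loop and closing it afterwards puts each pair's boxplot on its own page of the same file.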