Combining columns from multiple data.frames with a loop
I have 600 tab-delimited .txt files that look like this:
barcode gene.symbol value
1 TCGA-61-2610-02A-01R-1141-07 15E1.2 -0.78175
2 TCGA-61-2610-02A-01R-1141-07 2'-PDE -1.0155
3 TCGA-61-2610-02A-01R-1141-07 7A5 0.029
4 TCGA-61-2610-02A-01R-1141-07 A1BG 0.96575
5 TCGA-61-2610-02A-01R-1141-07 A2BP1 -0.301
6 TCGA-61-2610-02A-01R-1141-07 A2M -2.21575
I want to put together all the 600 files in one data frame such that gene.symbol will be the row names and values will be combined with first 12 characters of the barcode being the column name. Searching through SO I think I've got a loop that does this with one caveat. Here's what I have (I'm still learning R so the code might look very crude):
n = 600
df <- read.delim(file = "agilent1.txt")
df.tmp <- data.frame()
colnames(df) = c("barcode", "gene.symbol", levels(df$barcode))
df = df[2 :3]
Once I have df with the first file's values, the loop starts adding the value columns from the other files (the files are named agilent1.txt, agilent2.txt, etc.):
for (i in 2:n) {
df.tmp <- read.delim(file = paste("agilent", i, ".txt", sep = ""))
a <- as.character(levels(df.tmp$barcode))
a <- substr(a, 1, 12)
df <- cbind(df, a = df.tmp$value)
}
Everything works, BUT in the cbind command, a = df.tmp$value makes the column name the literal a (which makes sense), whereas I want the value stored in a to be the column name:
gene.symbol TCGA-61-2614 a a a a
1 15E1.2 0.80475 -0.47375 -0.26825 -0.13425 -0.78175
2 2'-PDE -0.1348125 -0.1565625 0.19475 -0.3819375 -1.0155
3 7A5 2.2735 2.4405 0.902 1.248 0.029
4 A1BG 0.817166666666667 -0.0471666666666667 -0.1005 -0.283333333333333 0.96575
5 A2BP1 -0.811333333333333 -1.02566666666667 -0.494833333333333 -0.948 -0.301
6 A2M -0.719 -1.00575 -1.07275 0.517 -2.21575
It sounds so easy in my mind but I can't seem to find the answer. Any help would be greatly appreciated.
Cheers,
Ahmet
You don't need an explicit loop if you use the reshape package. Here is a short snippet that should do what you are after (if I understand correctly):
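library(plyr); library(reshape)
files <- paste("agilent", 1:600, ".txt", sep = "")       # vector of file names
dfs <- ldply(files, read.delim)                          # read all files and stack them into one long data frame
dfs$barcode <- substr(as.character(dfs$barcode), 1, 12)  # keep the first 12 characters as the column label
cast(dfs, gene.symbol ~ barcode)                         # reshape to wide: one row per gene, one column per barcode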
I suggest reading the 600 data files and stacking them together first:
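myfiles <- list.files(pattern = "^agilent[0-9]+\\.txt$")  # only the agilent*.txt files
mydat <- c()
for (i in seq_along(myfiles)) {
  temp <- read.delim(myfiles[i])  # tab-delimited; unlike read.table, ' (as in 2'-PDE) is not treated as a quote
  mydat <- rbind(mydat, temp)     # stack the files in long format
}
library(reshape2)
# reshape2 provides dcast() rather than cast(); value.var names the column to spread
newdat <- dcast(mydat, gene.symbol ~ barcode, value.var = "value")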
If you want the column names to have only 12 characters, you could follow joran's answer.
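Another option is to shorten the barcodes while the data are still in long format and then spread them into columns. The sketch below uses base R's tapply instead of reshape2, so it is a swapped-in technique rather than the code from this answer. The data frame is made-up demo data (the second barcode's suffix is hypothetical), and the result is a matrix with gene symbols as row names.

```r
# Hypothetical long-format data standing in for the stacked files (mydat).
mydat <- data.frame(
  barcode = rep(c("TCGA-61-2610-02A-01R-1141-07",
                  "TCGA-61-2614-01A-01R-1141-07"), each = 3),
  gene.symbol = rep(c("A1BG", "A2BP1", "A2M"), times = 2),
  value = c(0.96575, -0.301, -2.21575, 0.817, -0.811, -0.719),
  stringsAsFactors = FALSE
)

# Keep only the first 12 characters of each barcode.
mydat$sample <- substr(as.character(mydat$barcode), 1, 12)

# Spread to wide form: rows are genes, columns are shortened barcodes.
# mean() only matters if a gene appears more than once per sample.
wide <- with(mydat, tapply(value, list(gene.symbol, sample), mean))
print(wide)
```

Averaging duplicates is an assumption made here for illustration; if a gene can occur several times per file, choose the aggregation that suits your data.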
You could always just set the column name in a separate step at the end of the loop:
df <- cbind(df, a = df.tmp$value)
colnames(df)[i+1] <- a
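A variation on the same idea is to assign the new column with df[[a]], which uses the value stored in a as the column name directly. The following is a minimal sketch, not the asker's exact setup: it writes three small made-up files to a temporary directory (the barcodes other than the first are hypothetical), and the read_one helper and the gene-order check are additions for illustration.

```r
# Hypothetical demo files standing in for agilent1.txt ... agilent600.txt.
dir <- tempfile("agilent_demo")
dir.create(dir)
genes <- c("15E1.2", "2'-PDE", "7A5")
barcodes <- c("TCGA-61-2610-02A-01R-1141-07",
              "TCGA-61-2614-01A-01R-1141-07",
              "TCGA-13-0890-01A-01R-1141-07")
for (i in seq_along(barcodes)) {
  demo <- data.frame(barcode = barcodes[i], gene.symbol = genes,
                     value = i + c(0.1, 0.2, 0.3))
  write.table(demo, file.path(dir, paste0("agilent", i, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

n <- length(barcodes)

# Hypothetical helper: read file number i as a data frame with character columns.
read_one <- function(i) {
  read.delim(file.path(dir, paste0("agilent", i, ".txt")),
             stringsAsFactors = FALSE)
}

# Start from the first file: gene symbols plus its value column.
first <- read_one(1)
df <- data.frame(gene.symbol = first$gene.symbol, stringsAsFactors = FALSE)
df[[substr(first$barcode[1], 1, 12)]] <- first$value

for (i in 2:n) {
  df.tmp <- read_one(i)
  a <- substr(df.tmp$barcode[1], 1, 12)  # first 12 characters of the barcode
  # Columns are aligned by position, so the gene order must match.
  stopifnot(identical(df.tmp$gene.symbol, df$gene.symbol))
  df[[a]] <- df.tmp$value                # the value of a becomes the column name
}
print(df)
```

Because the columns are matched by position, this only works if every file lists the genes in the same order; otherwise merging on gene.symbol or reshaping from long format is safer.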