Combining columns from multiple data.frames with a loop

I have 600 tab-delimited .txt files that look like this:

                       barcode gene.symbol    value
1 TCGA-61-2610-02A-01R-1141-07      15E1.2 -0.78175
2 TCGA-61-2610-02A-01R-1141-07      2'-PDE  -1.0155
3 TCGA-61-2610-02A-01R-1141-07         7A5    0.029
4 TCGA-61-2610-02A-01R-1141-07        A1BG  0.96575
5 TCGA-61-2610-02A-01R-1141-07       A2BP1   -0.301
6 TCGA-61-2610-02A-01R-1141-07         A2M -2.21575

I want to combine all 600 files into one data frame in which gene.symbol gives the row names and each file's values form a column named after the first 12 characters of its barcode. Searching through SO, I think I've got a loop that does this, with one caveat. Here's what I have (I'm still learning R, so the code might look very crude):

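n <- 600
df <- read.delim(file = "agilent1.txt")  # first file: barcode, gene.symbol, value
# name the value column after the first 12 characters of the barcode
colnames(df) <- c("barcode", "gene.symbol", substr(as.character(df$barcode[1]), 1, 12))
df <- df[2:3]  # keep gene.symbol and the renamed value column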

Once I have df with the first file's values, the loop adds the other files' value columns (the files are named agilent1.txt, agilent2.txt, etc.):

for (i in 2:n) {
  df.tmp <- read.delim(file = paste0("agilent", i, ".txt"))
  a <- substr(as.character(df.tmp$barcode[1]), 1, 12)  # 12-character sample id
  df <- cbind(df, a = df.tmp$value)  # the new column ends up literally named "a"
}

Everything works, BUT in the cbind command, a = df.tmp$value names the column "a" (which makes sense), whereas I want the value stored in a to be the column name.

  gene.symbol                 TCGA-61-2614                   a                  a                  a        a
1      15E1.2                      0.80475            -0.47375           -0.26825           -0.13425 -0.78175
2      2'-PDE                   -0.1348125          -0.1565625            0.19475         -0.3819375  -1.0155
3         7A5                       2.2735              2.4405              0.902              1.248    0.029
4        A1BG            0.817166666666667 -0.0471666666666667            -0.1005 -0.283333333333333  0.96575
5       A2BP1           -0.811333333333333   -1.02566666666667 -0.494833333333333             -0.948   -0.301
6         A2M                       -0.719            -1.00575           -1.07275              0.517 -2.21575

It sounds so easy in my mind but I can't seem to find the answer. Any help would be greatly appreciated.

Cheers,

Ahmet


You don't need an explicit loop if you use the plyr and reshape packages. Here is a short solution that does what you are looking for (if I understand correctly):

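require(plyr); require(reshape)
files <- paste("agilent", 1:600, ".txt", sep = "")  # build the list of file names
dfs   <- ldply(files, read.delim)                    # read all files into one long data frame
dfs$barcode <- substr(dfs$barcode, 1, 12)           # keep the first 12 characters of each barcode
cast(dfs, gene.symbol ~ barcode)                     # one row per gene, one column per barcode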
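If plyr is not installed, the reading step above has a base-R counterpart: `lapply()` with `read.delim()`, then `do.call(rbind, ...)` to stack the pieces. A minimal sketch that first writes two tiny hypothetical files to `tempdir()` so it runs without the real data (the second barcode's suffix and the toy values are made up for illustration):

```r
# Hypothetical setup: two small tab-delimited files in a temporary directory
dir <- tempdir()
toy <- list(
  data.frame(barcode = "TCGA-61-2610-02A-01R-1141-07",
             gene.symbol = c("A1BG", "2'-PDE"), value = c(0.96575, -1.0155)),
  data.frame(barcode = "TCGA-61-2614-01A-01R-1141-07",  # suffix invented
             gene.symbol = c("A1BG", "2'-PDE"), value = c(0.817166666666667, -0.1348125))
)
for (i in seq_along(toy)) {
  write.table(toy[[i]], file.path(dir, paste0("agilent", i, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Base-R counterpart of ldply(files, read.delim): read each file, then stack.
# read.delim() only treats double quotes as quoting, so "2'-PDE" is read intact.
files <- file.path(dir, paste0("agilent", seq_along(toy), ".txt"))
dfs <- do.call(rbind, lapply(files, read.delim))
dfs
```

The stacked `dfs` has the same long layout that `ldply()` produces, so it can be passed to `cast()` (or `reshape2::dcast()`) unchanged.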


I suggest you read all 600 data files, stack them together, and then reshape:

myfiles <- list.files(pattern = "^agilent[0-9]+\\.txt$")  # only the data files
mydat <- NULL
for (i in seq_along(myfiles)) {
    temp <- read.delim(myfiles[i])  # tab-delimited; tolerates the ' in 2'-PDE
    mydat <- rbind(mydat, temp)
}

library(reshape2)
newdat <- dcast(mydat, gene.symbol ~ barcode, value.var = "value")

If you want the column names to have only 12 characters, you could follow joran's answer.
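Truncating the barcodes can also be done on the stacked long data before reshaping, so the new columns come out with 12-character names. A sketch using base R's `stats::reshape()` in place of `cast()`/`dcast()` (a swapped-in tool, not what either answer uses), on a small hypothetical long data frame:

```r
# Hypothetical long-format data, as produced by stacking the files
long <- data.frame(
  barcode = rep(c("TCGA-61-2610-02A-01R-1141-07",
                  "TCGA-61-2614-01A-01R-1141-07"), each = 2),  # second suffix invented
  gene.symbol = rep(c("A1BG", "A2M"), times = 2),
  value = c(0.96575, -2.21575, 0.817166666666667, -0.719)
)

# Shorten the barcodes first so the new columns get 12-character names
long$barcode <- substr(long$barcode, 1, 12)

# Long -> wide: one row per gene, one "value.<barcode>" column per sample
wide <- reshape(long, idvar = "gene.symbol", timevar = "barcode",
                direction = "wide")

# Drop the "value." prefix and use the gene symbols as row names
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- wide$gene.symbol
wide$gene.symbol <- NULL
wide
```

Base `reshape()` prefixes each new column with `value.`, which is why the names are cleaned up with `sub()` at the end.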


You could always just set the column name in a separate step at the end of the loop:

df <- cbind(df, a = df.tmp$value)
colnames(df)[i+1] <- a
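A variant of the same loop (not from the answers above) assigns with `df[[a]] <- ...`, which names the column by the value of `a` in one step, and uses `match()` to line the values up by gene.symbol, so it does not assume, as `cbind()` does, that every file lists the genes in the same order. A minimal sketch on two hypothetical in-memory tables standing in for `read.delim()` results:

```r
# Hypothetical stand-ins for two files read with read.delim();
# the second lists the same genes in a different order
tables <- list(
  data.frame(barcode = "TCGA-61-2610-02A-01R-1141-07",
             gene.symbol = c("A1BG", "A2M"), value = c(0.96575, -2.21575)),
  data.frame(barcode = "TCGA-61-2614-01A-01R-1141-07",  # suffix invented
             gene.symbol = c("A2M", "A1BG"), value = c(-0.719, 0.817166666666667))
)

# Start from the gene list of the first table
df <- data.frame(gene.symbol = tables[[1]]$gene.symbol)

for (df.tmp in tables) {
  a <- substr(df.tmp$barcode[1], 1, 12)  # 12-character sample id
  # df[[a]] names the column by the *value* of a; match() aligns the
  # values by gene instead of relying on row order
  df[[a]] <- df.tmp$value[match(df$gene.symbol, df.tmp$gene.symbol)]
}

# Gene symbols become the row names, as asked in the question
rownames(df) <- df$gene.symbol
df$gene.symbol <- NULL
df
```

Genes missing from a file get `NA` in that file's column rather than silently shifting the values of later rows.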