combining columns from multiple data.frames with a loop

2023-04-05 15:00

I have 600 tab-delimited .txt files that look like this:

                       barcode gene.symbol    value
1 TCGA-61-2610-02A-01R-1141-07      15E1.2 -0.78175
2 TCGA-61-2610-02A-01R-1141-07      2'-PDE  -1.0155
3 TCGA-61-2610-02A-01R-1141-07         7A5    0.029
4 TCGA-61-2610-02A-01R-1141-07        A1BG  0.96575
5 TCGA-61-2610-02A-01R-1141-07       A2BP1   -0.301
6 TCGA-61-2610-02A-01R-1141-07         A2M -2.21575

I want to combine all 600 files into one data frame in which gene.symbol supplies the row names and each file's values form one column, named after the first 12 characters of that file's barcode. Searching through SO, I think I've got a loop that does this, with one caveat. Here's what I have (I'm still learning R, so the code might look very crude):

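n <- 600
df <- read.delim(file = "agilent1.txt")
df.tmp <- data.frame()
# unique() works whether barcode was read as a factor or as character
colnames(df) <- c("barcode", "gene.symbol", as.character(unique(df$barcode)))
df <- df[2:3]  # keep gene.symbol and the value column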

Once I have df with the first file's values, the loop adds the value column from each of the other files (the files are named agilent1.txt, agilent2.txt, etc.):

for (i in 2:n) {
  df.tmp <- read.delim(file = paste("agilent", i, ".txt", sep = ""))
  a <- as.character(unique(df.tmp$barcode))
  a <- substr(a, 1, 12)
  df <- cbind(df, a = df.tmp$value)
}

Everything works, BUT in the cbind call, a = df.tmp$value names the new column a (which makes sense), whereas I want the value of a to be the column name.

  gene.symbol                 TCGA-61-2614                   a                  a                  a        a
1      15E1.2                      0.80475            -0.47375           -0.26825           -0.13425 -0.78175
2      2'-PDE                   -0.1348125          -0.1565625            0.19475         -0.3819375  -1.0155
3         7A5                       2.2735              2.4405              0.902              1.248    0.029
4        A1BG            0.817166666666667 -0.0471666666666667            -0.1005 -0.283333333333333  0.96575
5       A2BP1           -0.811333333333333   -1.02566666666667 -0.494833333333333             -0.948   -0.301
6         A2M                       -0.719            -1.00575           -1.07275              0.517 -2.21575

It sounds so easy in my mind but I can't seem to find the answer. Any help would be greatly appreciated.

Cheers,

Ahmet

You don't need an explicit loop if you use the plyr and reshape packages. Here are a few lines that should do what you are after (if I understand correctly):

require(plyr); require(reshape)
files = paste('agilent', 1:600, '.txt', sep = "")  # vector of file names
dfs   = ldply(files, read.delim)                   # read and stack all files into one data frame
cast(dfs, gene.symbol ~ barcode)                   # one row per gene, one column per barcode

I suggest reading the 600 data files and stacking them together:

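myfiles <- list.files(pattern = "^agilent[0-9]+\\.txt$")
mydat <- c()
for (i in seq_along(myfiles)) {
    # read.delim assumes tab-separated input and does not treat ' as a quote (e.g. 2'-PDE)
    temp <- read.delim(myfiles[i], header = TRUE)
    mydat <- rbind(mydat, temp)
}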

library(reshape2)
newdat <- dcast(mydat, gene.symbol ~ barcode, value.var = "value")

If you want the column names to have only 12 characters, you could follow joran's answer.
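
As a rough sketch of how these steps fit together, the following writes three tiny made-up files to a temporary directory, reads them back, shortens each barcode to 12 characters, and reshapes to one row per gene. The file contents, the demo_dir directory, and the sample column name are invented for illustration; dcast() and its value.var argument come from reshape2.

```r
library(reshape2)

# Write three small tab-delimited files to a temp directory (toy data only)
demo_dir <- tempfile("agilent_demo")
dir.create(demo_dir)
barcodes <- c("TCGA-61-2610-02A-01R-1141-07",
              "TCGA-61-2614-01A-01R-1141-07",
              "TCGA-13-0890-01A-01R-1141-07")
for (i in seq_along(barcodes)) {
  toy <- data.frame(barcode     = barcodes[i],
                    gene.symbol = c("2'-PDE", "A1BG", "A2M"),
                    value       = c(0.1, 0.2, 0.3) * i)
  write.table(toy, file.path(demo_dir, paste0("agilent", i, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "^agilent[0-9]+\\.txt$", full.names = TRUE)

# Read every file, then stack them in one call instead of growing with rbind
long <- do.call(rbind, lapply(files, read.delim, stringsAsFactors = FALSE))

# Shorten the barcode before reshaping so the wide columns get 12-character names
long$sample <- substr(long$barcode, 1, 12)

wide <- dcast(long, gene.symbol ~ sample, value.var = "value")
rownames(wide) <- wide$gene.symbol  # gene symbols become the row names
wide$gene.symbol <- NULL
```

Unlike cbind, the reshape matches values by gene symbol, so the files would not need to list genes in the same order. If a gene symbol appeared more than once for the same sample, dcast would fall back to counting rows, so duplicates would have to be handled first.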

You could always just set the column name in a separate step at the end of the loop:

df <- cbind(df, a = df.tmp$value)
colnames(df)[i+1] <- a
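
The column ends up called a because cbind uses the argument name exactly as written; it never looks up the variable. As a minimal sketch of an alternative, using made-up toy data rather than the real files, assigning through [[ does use the value stored in a:

```r
# Toy stand-ins for df and one df.tmp from the question (values are illustrative)
df <- data.frame(gene.symbol = c("A1BG", "A2M"))
df.tmp <- data.frame(
  barcode     = rep("TCGA-61-2610-02A-01R-1141-07", 2),
  gene.symbol = c("A1BG", "A2M"),
  value       = c(0.96575, -2.21575)
)

a <- substr(as.character(df.tmp$barcode[1]), 1, 12)  # first 12 characters of the barcode

# The name left of `=` is taken literally, so this column is called "a"
named_a <- cbind(df, a = df.tmp$value)

# `[[<-` evaluates `a`, so the new column is named after the barcode prefix
df[[a]] <- df.tmp$value

colnames(named_a)
colnames(df)
```

Like the original loop, this assumes every file lists the genes in the same order, since the values are attached by position.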
