How to repeat a function in R multiple times

I have a dataframe that looks like this

DF:

V1  V2          V3  V4  V5  V6  V7  V8      
0   ss66369915  0   0   G   A   A   A
0   ss66112992  0   0   A   A   A   A
0   ss66369329  0   0   A   A   A   A
0   ss66368644  0   0   A   A   A   A
0   ss66368284  0   0   A   A   G   A
0   ss66126380  0   0   A   G   A   G
0   ss66407282  0   0   A   A   A   A
0   ss66405035  0   0   A   A   A   A
0   ss66405148  0   0   G   G   A   G
0   ss66405271  0   0   G   G   G   G

The data in columns V5 through V8 are biallelic genotypes, so I would like to merge every two columns into one.

For example, it would look like:

V1  V2          V3  V4 V5_V6 V7 V8     
0   ss66369915  0   0   GA  A   A
0   ss66112992  0   0   AA  A   A
0   ss66369329  0   0   AA  A   A
0   ss66368644  0   0   AA  A   A
0   ss66368284  0   0   AA  G   A
0   ss66126380  0   0   AG  A   G
0   ss66407282  0   0   AA  A   A
0   ss66405035  0   0   AA  A   A
0   ss66405148  0   0   GG  A   G
0   ss66405271  0   0   GG  G   G

I was able to do this using:

DF$V5_V6=paste(DF$V5, DF$V6, sep="")

or

within(DF, V5_V6 <- paste(V5, V6, sep=''))

However, my actual dataframe consists of 4776 columns, and I would have to merge every two columns from column 5 through column 4776.

I was wondering how I could achieve this without doing it manually. I tried to use a for loop with no success. I am very new to using R.

Thank you!


Maybe you can show the for loop you tried?

Here's one approach using a for loop, assuming I understand the goal correctly. The loop pastes together the values of columns 5 and 6, 7 and 8, 9 and 10, and so on. names() extracts the relevant column names, which are pasted together to name each new column, and [ indexes into the newly created object newdat.

#read in data
txt <- "V1  V2          V3  V4  V5  V6  V7  V8      
0   ss66369915  0   0   G   A   A   A
0   ss66112992  0   0   A   A   A   A
0   ss66369329  0   0   A   A   A   A
0   ss66368644  0   0   A   A   A   A
0   ss66368284  0   0   A   A   G   A
0   ss66126380  0   0   A   G   A   G
0   ss66407282  0   0   A   A   A   A
0   ss66405035  0   0   A   A   A   A
0   ss66405148  0   0   G   G   A   G
0   ss66405271  0   0   G   G   G   G"

dat <- read.table(textConnection(txt), header = TRUE)

#Create a new object so as to not interfere with the original
newdat <- dat[, 1:4]

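#Step through the first column of each pair: 5, 7, 9, ...
for (colInd in seq(5, (ncol(dat) - 1), by = 2)) {
  #Name the merged column after both source columns, e.g. "V5_V6"
  colNames <- paste(names(dat)[colInd], names(dat)[colInd + 1], sep = "_")
  #Paste the two alleles of each row together, e.g. "G" and "A" give "GA"
  newdat[, colNames] <- paste(dat[, colInd], dat[, colInd + 1], sep = "")
}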

Results in:

> newdat
   V1         V2 V3 V4 V5_V6 V7_V8
1   0 ss66369915  0  0    GA    AA
2   0 ss66112992  0  0    AA    AA
3   0 ss66369329  0  0    AA    AA
4   0 ss66368644  0  0    AA    AA
5   0 ss66368284  0  0    AA    GA
6   0 ss66126380  0  0    AG    AG
7   0 ss66407282  0  0    AA    AA
8   0 ss66405035  0  0    AA    AA
9   0 ss66405148  0  0    GG    AG
10  0 ss66405271  0  0    GG    GG
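
Because paste0() is vectorised over whole columns, the same pairing can also be written without an explicit loop. What follows is only a minimal sketch, not a replacement for the loop above. It assumes the genotype columns start at column 5 and come in an even number, it uses a three-row subset of the sample data so that it runs on its own, and the names first, second, and merged are made up for illustration.

```r
# Three rows taken from the sample data in the question.
# colClasses = "character" keeps every allele as text; without it,
# a column made up only of "T" would be read as logical TRUE.
txt <- "V1 V2 V3 V4 V5 V6 V7 V8
0 ss66369915 0 0 G A A A
0 ss66126380 0 0 A G A G
0 ss66405271 0 0 G G G G"
dat <- read.table(text = txt, header = TRUE, colClasses = "character")

# Positions of the first and second column of each pair (assumed layout)
first  <- seq(5, ncol(dat) - 1, by = 2)   # 5, 7, ...
second <- first + 1                       # 6, 8, ...

# Map() walks both sets of columns in parallel; paste0() merges each pair
merged <- Map(paste0, dat[first], dat[second])
names(merged) <- paste(names(dat)[first], names(dat)[second], sep = "_")

# Keep the four leading columns and append the merged genotype columns
newdat <- cbind(dat[1:4], as.data.frame(merged, stringsAsFactors = FALSE))
print(newdat)
```

With several thousand columns this builds all merged columns in one pass and binds them once, rather than growing newdat one column at a time.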


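You could also use str_c from the stringr package, applied row-wise to each pair of columns (newdat is the data frame created above):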

library(stringr)
newdat$V5V6 <-  apply(dat[,5:6], 1,  str_c, collapse="")
newdat$V7V8 <-  apply(dat[,7:8], 1,  str_c, collapse="")