How to repeat performing a function in R multiple times
I have a dataframe that looks like this
DF:
V1 V2 V3 V4 V5 V6 V7 V8
0 ss66369915 0 0 G A A A
0 ss66112992 0 0 A A A A
0 ss66369329 0 0 A A A A
0 ss66368644 0 0 A A A A
0 ss66368284 0 0 A A G A
0 ss66126380 0 0 A G A G
0 ss66407282 0 0 A A A A
0 ss66405035 0 0 A A A A
0 ss66405148 0 0 G G A G
0 ss66405271 0 0 G G G G
The data in columns V5 through V8 are biallelic genotypes, so I would like to merge every two columns together into one.
For example, it would look like:
V1 V2 V3 V4 V5_V6 V7 V8
0 ss66369915 0 0 GA A A
0 ss66112992 0 0 AA A A
0 ss66369329 0 0 AA A A
0 ss66368644 0 0 AA A A
0 ss66368284 0 0 AA G A
0 ss66126380 0 0 AG A G
0 ss66407282 0 0 AA A A
0 ss66405035 0 0 AA A A
0 ss66405148 0 0 GG A G
0 ss66405271 0 0 GG G G
I was able to do this using:
DF$V5_V6=paste(DF$V5, DF$V6, sep="")
or
within(DF, V5_V6 <- paste(V5, V6, sep=''))
However, my actual dataframe has 4776 columns, and I would have to merge every two columns from column 5 through column 4776.
I was wondering how I could achieve this without doing it manually. I tried to use a for loop with no success. I am very new to using R.
Thank you!
Maybe you can show the for loop you tried?
Here's one approach using a loop that should do what you want, if I understand the question correctly. Specifically, this for loop pastes together the values of columns 5 & 6, 7 & 8, 9 & 10, and so on. We use the names() function to extract the relevant column names and paste them together, and [ to index into the newly created object newdat.
#read in data
txt <- "V1 V2 V3 V4 V5 V6 V7 V8
0 ss66369915 0 0 G A A A
0 ss66112992 0 0 A A A A
0 ss66369329 0 0 A A A A
0 ss66368644 0 0 A A A A
0 ss66368284 0 0 A A G A
0 ss66126380 0 0 A G A G
0 ss66407282 0 0 A A A A
0 ss66405035 0 0 A A A A
0 ss66405148 0 0 G G A G
0 ss66405271 0 0 G G G G"
dat <- read.table(textConnection(txt), header = TRUE)
#Create a new object so as to not interfere with the original
newdat <- dat[, 1:4]
for (colInd in seq(5, (ncol(dat) - 1), by = 2)) {
  colNames <- paste(names(dat)[colInd], names(dat)[colInd + 1], sep = "_")
  newdat[, colNames] <- paste(dat[, colInd], dat[, colInd + 1], sep = "")
}
Results in:
> newdat
V1 V2 V3 V4 V5_V6 V7_V8
1 0 ss66369915 0 0 GA AA
2 0 ss66112992 0 0 AA AA
3 0 ss66369329 0 0 AA AA
4 0 ss66368644 0 0 AA AA
5 0 ss66368284 0 0 AA GA
6 0 ss66126380 0 0 AG AG
7 0 ss66407282 0 0 AA AA
8 0 ss66405035 0 0 AA AA
9 0 ss66405148 0 0 GG AG
10 0 ss66405271 0 0 GG GG
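The same pairing can also be done without an explicit loop. The following is only a sketch, assuming the genotype columns start at column 5 and there are at least two of them; the names first, second and merged are made up for illustration. It also reads the data with colClasses = "character", because read.table would otherwise turn a column containing only the allele "T" into logical TRUE.

```r
# Small example in the same layout as the question's data.
# colClasses = "character" keeps allele letters as text; without it,
# a column holding only "T" would be read as logical TRUE.
txt <- "V1 V2 V3 V4 V5 V6 V7 V8
0 ss66369915 0 0 G A A A
0 ss66126380 0 0 A G A G
0 ss66405148 0 0 G G A G"
dat <- read.table(text = txt, header = TRUE, colClasses = "character")

# Positions of the first and second allele column of each pair
first  <- seq(5, ncol(dat) - 1, by = 2)
second <- first + 1

# Paste each pair of columns element-wise; Map returns one vector per pair
merged <- Map(paste0, dat[first], dat[second])
names(merged) <- paste(names(dat)[first], names(dat)[second], sep = "_")

# Put the four leading columns back in front of the merged genotypes
newdat <- cbind(dat[1:4], as.data.frame(merged, stringsAsFactors = FALSE))
```

If the number of genotype columns is odd, the last unpaired column is silently left out, just as in the loop version.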
You could also do:
library(stringr)
newdat$V5V6 <- apply(dat[,5:6], 1, str_c, collapse="")
newdat$V7V8 <- apply(dat[,7:8], 1, str_c, collapse="")
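Those two lines hard-code the column pairs, so for thousands of columns they would still need to be generated somehow. One possible way, sketched here under the assumption that the pairs start at column 5, is to loop over the starting index of each pair. Base paste with collapse = "" is swapped in for str_c so the sketch has no package dependency; starts and newName are illustrative names.

```r
# Small example in the same layout as the question's data
txt <- "V1 V2 V3 V4 V5 V6 V7 V8
0 ss66369915 0 0 G A A A
0 ss66126380 0 0 A G A G
0 ss66405148 0 0 G G A G"
dat <- read.table(text = txt, header = TRUE, colClasses = "character")

newdat <- dat[, 1:4]

# Starting column of each pair: 5, 7, 9, ...
starts <- seq(5, ncol(dat) - 1, by = 2)

for (i in starts) {
  # Name the new column after both source columns, e.g. "V5V6"
  newName <- paste0(names(dat)[i], names(dat)[i + 1])
  # Collapse the two values of each row into one string;
  # unname() drops the row names that apply() attaches to its result
  newdat[[newName]] <- unname(apply(dat[, i:(i + 1)], 1, paste, collapse = ""))
}
```

Row-wise apply converts each pair of columns to a character matrix first, so on a wide data set this is likely to be slower than pasting whole columns together.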