R: Calculate the relative distance
I have a data frame like the one read in below from the string x.
x<-"start.x stop.x strand.x start.y stop.y strand.y
1 16954189 16963562 - 16954189 16963562 -
2 16954189 16963562 - 150045170 150065177 -
3 150045170 150065177 - 16954189 16963562 -
4 150045170 150065177 - 150045170 150065177 -
5 97061519 97190927 - 97061519 97190927 -
6 97061519 97190927 - 135190856 135202610 +
7 135190856 135202610 + 97061519 97190927 -
8 135190856 135202610 + 135190856 135202610 +"
dat <- read.table(textConnection(x), header=TRUE)
Normally I calculate for each row the relative distance between start.x and start.y with the following code:
zz <- transform(dat,
distance_startsite = abs(as.numeric(start.x) - as.numeric(start.y)))
This time, however, we first need to look at strand.x and strand.y before calculating:
- If strand.x is "-", the official start site is stop.x
- If strand.x is "+", the official start site is start.x
- If strand.y is "-", the official start site is stop.y
- If strand.y is "+", the official start site is start.y
For example, row 1 in table dat must calculate abs(as.numeric(stop.x) - as.numeric(stop.y)) instead of abs(as.numeric(start.x) - as.numeric(start.y)).
My question is, is there a way to calculate this for each row like zz?
Thanks
EDIT: my first thought was something like this:
for (i in 1:nrow(dd)){
if (dat$strand.x[i,] == "-" & dat$stand.y[i,] == "-") {
result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(stop.x[i,]) - as.numeric(stop.y[i,]))} else
if (dat$strand.x[i,] == "+" & dat$stand.y[i,] == "-") {
result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(start.x[i,]) - as.numeric(stop.y[i,]))} else
if (dat$strand.x[i,] == "-" & dat$stand.y[i,] == "+") {
result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(stop.x[i,]) - as.numeric(start.y[i,]))} else
if (dat$strand.x[i,] == "+" & dat$stand.y[i,] == "+") {
result[i]<-transform(dat,distance_startsite[i] = abs(as.numeric(start.x[i,]) - as.numeric(start.y[i,]))}
}
But that doesn't work yet.
If you do this step by step and use some interim variables, you will save yourself a lot of trouble and your code will become much clearer.
Here is what I suggest:
- Add a column with the start and stop values (using your conditions)
- Calculate the absolute difference
Two further observations:
- Your start and stop values are integer values, so you don't need to use as.numeric all the time
- In your original question you have conflicting conditions for the start site, but no conditions for the stop site, so I took a guess as to what you really meant.
The code:
dat$start <- with(dat, ifelse(strand.x=="+", start.x, stop.x))
dat$stop <- with(dat, ifelse(strand.y=="+", start.y, stop.y))
dat$dist <- with(dat, abs(stop-start))
The results:
dat
start.x stop.x strand.x start.y stop.y strand.y dist
1 16954189 16963562 - 16954189 16963562 - 0
2 16954189 16963562 - 150045170 150065177 - 133101615
3 150045170 150065177 - 16954189 16963562 - 133101615
4 150045170 150065177 - 150045170 150065177 - 0
5 97061519 97190927 - 97061519 97190927 - 0
6 97061519 97190927 - 135190856 135202610 + 37999929
7 135190856 135202610 + 97061519 97190927 - 37999929
8 135190856 135202610 + 135190856 135202610 + 0
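The two steps above can also be wrapped in a small helper so that the strand rule lives in one place. This is only a sketch and not part of the original answer: the function name start_site is made up for illustration, and it assumes the strand columns contain nothing but "+" and "-".

```r
# Hypothetical helper (name is an assumption, not from the answer):
# returns the strand-aware start site, i.e. start for "+" and stop otherwise.
start_site <- function(start, stop, strand) {
  ifelse(strand == "+", start, stop)
}

# Same data as in the question
dat <- read.table(text = "start.x stop.x strand.x start.y stop.y strand.y
1 16954189 16963562 - 16954189 16963562 -
2 16954189 16963562 - 150045170 150065177 -
3 150045170 150065177 - 16954189 16963562 -
4 150045170 150065177 - 150045170 150065177 -
5 97061519 97190927 - 97061519 97190927 -
6 97061519 97190927 - 135190856 135202610 +
7 135190856 135202610 + 97061519 97190927 -
8 135190856 135202610 + 135190856 135202610 +", header = TRUE)

# Absolute distance between the two strand-aware start sites
dat$dist <- with(dat, abs(start_site(start.x, stop.x, strand.x) -
                          start_site(start.y, stop.y, strand.y)))
```

Note that anything other than "+" is treated as "-" here, so unexpected strand values would need to be checked separately.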
I tend to agree with @Andrie, but if you really really want a 'single line solution' (well, kind of):
zz <- transform(dat, distance_startsite = abs(ifelse(strand.x=="+", start.x, stop.x)-ifelse(strand.y=="+", start.y, stop.y)))
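For comparison, a row-by-row loop in the spirit of the question's EDIT can be made to run once each column is indexed with [i] rather than [i,] and the result is stored in a plain vector. The sketch below is an illustration, not the original poster's code; it uses rows 1, 6 and 7 of the data, and the names result, sx, sy and vectorised are made up. It also checks that the loop agrees with the vectorised ifelse approach.

```r
# Rows 1, 6 and 7 of the question's data
dat <- data.frame(
  start.x  = c(16954189L, 97061519L, 135190856L),
  stop.x   = c(16963562L, 97190927L, 135202610L),
  strand.x = c("-", "-", "+"),
  start.y  = c(16954189L, 135190856L, 97061519L),
  stop.y   = c(16963562L, 135202610L, 97190927L),
  strand.y = c("-", "+", "-")
)

# Loop version: pick the start site per row, then take the absolute difference
result <- numeric(nrow(dat))
for (i in seq_len(nrow(dat))) {
  sx <- if (dat$strand.x[i] == "+") dat$start.x[i] else dat$stop.x[i]
  sy <- if (dat$strand.y[i] == "+") dat$start.y[i] else dat$stop.y[i]
  result[i] <- abs(sx - sy)
}

# Vectorised version, same logic without the loop
vectorised <- with(dat, abs(ifelse(strand.x == "+", start.x, stop.x) -
                            ifelse(strand.y == "+", start.y, stop.y)))
```

Both give the same distances; the vectorised form is shorter and avoids the explicit loop.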