Binning different lengths in R
input1
dput(a1 100 200 +
a1 250 270 +
a1 333 340 -
a2 450 460 +)
input2
dput(a1 101 106 +
a1 112 117 +
a1 258 259 +
a1 258 259 +
a1 258 259 +
a1 258 259 +
a1 258 259 +
a1 258 259 +
a1 258 259 +
a1 258 259 +
a1 258 259 +
a1 260 262 +
a1 260 262 +
a1 260 262 +
a1 260 262 +
a1 260 262 +
a1 332 333 -
a1 332 333 -
a1 332 333 -
a1 332 333 -
a1 332 333 -
a1 332 333 -
a1 332 333 -
a1 331 333 -
a1 331 333 -
a1 331 333 -
a1 331 333 -
a1 331 333 -
a1 331 333 -)
output
c s e st 1 2 3 4 5 6 7 8 9 10
a1 100 200 + 1 2 0 0 0 0 0 0 0 0
a1 250 270 + 0 0 0 9 5 0 0 0 0 0
a1 330 340 - 0 0 0 0 0 0 0 6 7 0
a2 450 460 + 0 0 0 0 0 0 0 0 0 0
I want to count the density of points (input2) inside the ranges given in input1. For example, how many points fall in the a1 range from 100 to 200? (i.e. 3.) I want to do the same for every range in input1 and then compare the ranges with each other. The problem is that the ranges have different lengths (200-100=100, 270-250=20). To compare them I need to scale them, so I came up with a window of 10 bins per range (see output): I count the input2 points that fall in each of the 10 bins of every input1 range. Finally I need to plot the bins on the x-axis and the counts on the y-axis, something like xyplot(x(bins), y1(a1:100:200:+) + y2(a1:250:270:+) + y3... + y4)
"+" means that 100 is the start point and 200 the end point when the bins are calculated (100-110 is the 1st bin, ...). "-" means exactly the opposite (190-200 is the 1st bin).
1-10 are the bin numbers, 1 to 10.
Use columns 1 and 2, matched on the column 1 key, to assign the bins. Values that are not in range are removed.
c = character, s = start, e = end, st = strand, and 1-10 are the bins of input1. Yes, you are right about the binning. For example, 250-270 should have bins 2 units wide (270-250=20, so with 10 bins the width is 20/10=2).
The question is still not very well formed, so I'm not sure I've quite understood what you want, but you probably want to use a combination of table and cut.
Your sample data
# Columns are named start/end so that the code below can refer to them.
input1 <- data.frame(
  type = paste("a", rep(1:2, times = c(3, 1)), sep = ""),
  start = c(100, 250, 333, 450),
  end = c(200, 270, 340, 460)
)
# 29 rows, matching the rows listed in the question's input2.
input2 <- data.frame(
  type = rep.int("a1", 29),
  start = rep(c(101, 112, 258, 260, 332, 331), times = c(1, 1, 9, 5, 7, 6)),
  end = rep(c(106, 117, 259, 262, 333), times = c(1, 1, 9, 5, 13))
)
First you define bins based upon the values in input1.
cut_points <- with(input1, sort(c(start, end)))
Then split input2$start by type, cut it up by bins and find the count in each.
with(input2, tapply(start, type, function(x) table(cut(x, cut_points))))
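Two details of cut are worth checking before relying on these counts. By default its intervals are open on the left and closed on the right, so a point equal to the smallest cut point gets NA. And because the cut points pool every start and end, the gaps between ranges (for example 200 to 250) become bins as well. A small sketch, using made-up points (the vector x is illustrative and not taken from the question):

```r
# Illustrative points only; x is not taken from the question's data.
cut_points <- c(100, 200, 250, 270)
x <- c(100, 101, 200, 225)

# Default: intervals are (lower, upper], so x == 100 falls outside every bin.
default_bins <- cut(x, cut_points)

# include.lowest = TRUE closes the first interval on the left as well.
closed_bins <- cut(x, cut_points, include.lowest = TRUE)

# 225 lies between two ranges but is still counted, in the gap bin (200,250].
table(closed_bins)
```

Whether include.lowest = TRUE is appropriate depends on whether a point sitting exactly on a range's start should count as inside that range.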
Possibly repeat with the end column.
with(input2, tapply(end, type, function(x) table(cut(x, cut_points))))
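If what is wanted is instead ten equal-width bins inside each input1 range, numbered from the other end on the "-" strand, the same cut idea can be applied range by range. The following is only a sketch under several assumptions: bin_counts is a hypothetical helper, only the start column of input2 is counted, bins are closed on the right as in cut's default, and the third range starts at 330 as in the question's expected output rather than 333 as in its input1.

```r
# Data re-typed from the question. ASSUMPTION: the third range starts at 330,
# as in the question's expected output (its input1 says 333).
input1 <- data.frame(
  type   = c("a1", "a1", "a1", "a2"),
  start  = c(100, 250, 330, 450),
  end    = c(200, 270, 340, 460),
  strand = c("+", "+", "-", "+")
)
input2 <- data.frame(
  type  = rep("a1", 29),
  start = rep(c(101, 112, 258, 260, 332, 331), times = c(1, 1, 9, 5, 7, 6))
)

n_bins <- 10

# Hypothetical helper: counts of one range's points in n_bins equal-width bins.
bin_counts <- function(type, start, end, strand, points, n_bins = 10) {
  x <- points$start[points$type == type]
  x <- x[x >= start & x <= end]                      # drop out-of-range points
  edges <- seq(start, end, length.out = n_bins + 1)  # n_bins + 1 bin edges
  bins <- cut(x, edges, labels = FALSE, include.lowest = TRUE)
  counts <- tabulate(bins, nbins = n_bins)           # zero-filled counts
  if (strand == "-") rev(counts) else counts         # bin 1 at the end for "-"
}

# One row per input1 range, one column per bin.
counts <- t(sapply(seq_len(nrow(input1)), function(i) {
  with(input1[i, ], bin_counts(type, start, end, strand, input2, n_bins))
}))
dimnames(counts) <- list(
  with(input1, paste(type, start, end, strand, sep = ":")),
  seq_len(n_bins)
)
counts
```

For the 250-270 range this puts 9 points in bin 4 and 5 points in bin 5, as in the question's expected output. The other rows depend on the boundary convention and on whether the end column is counted too, so they need not match the question exactly. Since every row has the same ten columns, the transposed matrix can be passed to matplot() to draw one line per range with the bins on the x-axis.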