Binning different lengths in R

2023-03-25 00:19

input1

input1 <- read.table(text = "
a1  100 200 +
a1  250 270 +
a1  333 340 -
a2  450 460 +
", col.names = c("c", "s", "e", "st"))

input2

input2 <- read.table(text = "
a1  101 106 +
a1  112 117 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  258 259 +
a1  260 262 +
a1  260 262 +
a1  260 262 +
a1  260 262 +
a1  260 262 +
a1  332 333 -
a1  332 333 -
a1  332 333 -
a1  332 333 -
a1  332 333 -
a1  332 333 -
a1  332 333 -
a1  331 333 -
a1  331 333 -
a1  331 333 -
a1  331 333 -
a1  331 333 -
a1  331 333 -
", col.names = c("c", "s", "e", "st"))

output

c   s   e   st  1   2   3   4   5   6   7   8   9   10
a1  100 200 +   1   2   0   0   0   0   0   0   0   0
a1  250 270 +   0   0   0   9   5   0   0   0   0   0
a1  330 340 -   0   0   0   0   0   0   0   6   7   0
a2  450 460 +   0   0   0   0   0   0   0   0   0   0

I want to count the density of points (input2) using the input1 ranges; that is, how many points fall in the 100 to 200 range for a1-100-200 (i.e. 3). I want to do the same for every input1 row and compare the rows with each other. The problem is that the ranges have different lengths (200 - 100 = 100, but 270 - 250 = 20), so I need to scale them before I can compare them. So I came up with a window of 10 bins per range (output) and count the input2 points in each bin of each input1 range. Finally, I need to plot the bins on the x-axis and the counts on the y-axis, one series per input1 row, something like xyplot(y1 + y2 + y3 + y4 ~ bins), where y1 is a1:100:200:+, y2 is a1:250:270:+, and so on.
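A minimal sketch of the first step, the raw count per input1 range, in base R. It assumes the column names type, start, end and strand (they are not from the question) and represents each input2 row by its start coordinate only. Dividing the count by the range length gives a density that can be compared across ranges of different lengths.

```r
# Question's data rebuilt by hand; the column names type/start/end/strand
# are assumptions, not part of the question
input1 <- data.frame(
  type   = c("a1", "a1", "a1", "a2"),
  start  = c(100, 250, 333, 450),
  end    = c(200, 270, 340, 460),
  strand = c("+", "+", "-", "+")
)
# Each input2 row is represented by its start coordinate only (assumption)
input2 <- data.frame(
  type  = "a1",
  start = rep(c(101, 112, 258, 260, 332, 331), times = c(1, 1, 9, 5, 7, 6))
)

# Count input2 positions with the same key that lie inside [start, end]
count_in_range <- function(i) {
  pos <- input2$start[input2$type == input1$type[i]]
  sum(pos >= input1$start[i] & pos <= input1$end[i])
}

input1$n_points <- vapply(seq_len(nrow(input1)), count_in_range, integer(1))
# Points per unit length, so ranges of different lengths can be compared
input1$density <- input1$n_points / (input1$end - input1$start)
input1
```

With input2 exactly as posted, this gives 2 points for a1 100 200 (101 and 112), 14 for a1 250 270, and 0 for the last two rows.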

"+" means that 100 is taken as the start point and 200 as the end point when calculating the bins (100-110 is the 1st bin, and so on); "-" means exactly the opposite (190-200 is the first bin).
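One way to express this strand rule, sketched under the assumption that a position is binned by its distance from the strand-specific origin (start on "+", end on "-"), is to reuse cut with labels = FALSE. The helper name bin_index is hypothetical. Bins are right-closed, with the first bin also closed at its lower edge, so a position exactly on an inner boundary goes to the lower-numbered bin; positions outside the range come back as NA and can simply be dropped.

```r
# Hypothetical helper: map positions x to relative bins 1..n_bins of the
# range [start, end]. On "+" bins are counted from start; on "-" they are
# counted from end, so the bin next to `end` is bin 1.
bin_index <- function(x, start, end, strand, n_bins = 10) {
  # Distance of each position from the strand-specific origin
  d <- if (strand == "+") x - start else end - x
  # n_bins equal-width breaks covering the range length
  breaks <- seq(0, end - start, length.out = n_bins + 1)
  # Integer bin codes; positions outside [start, end] become NA
  cut(d, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

plus_bins  <- bin_index(c(101, 112, 150, 200, 250), 100, 200, "+")
plus_bins   # 1 2 5 10 NA
minus_bins <- bin_index(c(191, 199, 105), 100, 200, "-")
minus_bins  # 1 1 10
```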

Columns 1-10 are bins 1 to 10.

Use the start and end columns of input1, matched on the key in column 1, to build the bins. Values that are not in any range are removed.

Column key: c = character, s = start, e = end, st = strand; 1-10 are the bins of each input1 range. Yes, you are right about the binning. For example, 250-270 should have bins that are 2 units wide, because 270 - 250 = 20, and with 10 bins that gives 20 / 10 = 2.
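The bin width is the range length divided by the number of bins. A small sketch of that arithmetic for the question's four input1 ranges (the variable names are illustrative only):

```r
n_bins <- 10
starts <- c(100, 250, 333, 450)
ends   <- c(200, 270, 340, 460)

# Width of one bin for each input1 range: (end - start) / n_bins
widths <- (ends - starts) / n_bins   # 10, 2, 0.7, 1

# The 11 edges of the 10 bins for a1 250 270, counted from start ("+" strand)
edges_250_270 <- seq(250, 270, by = widths[2])
edges_250_270
# [1] 250 252 254 256 258 260 262 264 266 268 270
```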

The question is still not very well formed, so I'm not sure I've understood exactly what you want, but you probably want to use a combination of table and cut.

Your sample data

input1 <- data.frame(
  type  = paste("a", rep(1:2, times = c(3, 1)), sep = ""),
  start = c(100, 250, 333, 450),
  end   = c(200, 270, 340, 460)
)

input2 <- data.frame(
  type  = rep.int("a1", 29),
  start = rep(c(101, 112, 258, 260, 332, 331), times = c(1, 1, 9, 5, 7, 6)),
  end   = rep(c(106, 117, 259, 262, 333), times = c(1, 1, 9, 5, 13))
)

First you define bins based upon the values in input1.

cut_points <- with(input1, sort(c(start, end)))

Then split input2$start by type, cut it into the bins, and count the values in each.

with(input2, tapply(start, type, function(x) table(cut(x, cut_points))))
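To see what this produces, here is a self-contained run on the question's data (input2 rebuilt from all 29 posted rows; only the type and start columns are needed). Because the break points are all range boundaries sorted together, cut also creates intervals for the gaps between input1 ranges, such as (200,250] and (270,333], and the counts are per whole range rather than per relative bin.

```r
input1 <- data.frame(
  type  = c("a1", "a1", "a1", "a2"),
  start = c(100, 250, 333, 450),
  end   = c(200, 270, 340, 460)
)
input2 <- data.frame(
  type  = "a1",
  start = rep(c(101, 112, 258, 260, 332, 331), times = c(1, 1, 9, 5, 7, 6))
)

# All range boundaries, sorted, used as break points
cut_points <- with(input1, sort(c(start, end)))

# One table of counts per type; intervals are right-closed, e.g. (100,200]
counts <- with(input2, tapply(start, type, function(x) table(cut(x, cut_points))))
counts[["a1"]]
```

For a1 this gives 2 in (100,200], 14 in (250,270] and 13 in the gap interval (270,333], with the other intervals empty.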

Possibly repeat with the end column.

with(input2, tapply(end, type, function(x) table(cut(x, cut_points))))
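The same table/cut idea can also be applied per input1 row to get the 10 relative bins the question asks for. The sketch below is one interpretation, not the answer's code: it assumes input2 positions are matched on type, each position is its start coordinate, "+" bins are counted from start and "-" bins from end, and bins are right-closed. The strand column and the helper count_bins are hypothetical additions.

```r
input1 <- data.frame(
  type   = c("a1", "a1", "a1", "a2"),
  start  = c(100, 250, 333, 450),
  end    = c(200, 270, 340, 460),
  strand = c("+", "+", "-", "+")
)
input2 <- data.frame(
  type  = "a1",
  start = rep(c(101, 112, 258, 260, 332, 331), times = c(1, 1, 9, 5, 7, 6))
)
n_bins <- 10

# Counts of input2 positions in each of the n_bins relative bins of row i
count_bins <- function(i) {
  rng <- input1[i, ]
  pos <- input2$start[input2$type == rng$type]
  # Distance from the strand-specific origin: start on "+", end on "-"
  d <- if (rng$strand == "+") pos - rng$start else rng$end - pos
  breaks <- seq(0, rng$end - rng$start, length.out = n_bins + 1)
  # Bin codes 1..n_bins; positions outside the range become NA
  b <- cut(d, breaks = breaks, labels = FALSE, include.lowest = TRUE)
  # Fixed levels keep empty bins as 0; table() drops the NAs
  table(factor(b, levels = seq_len(n_bins)))
}

# One row per input1 range, one column per bin
bin_counts <- t(vapply(seq_len(nrow(input1)),
                       function(i) as.vector(count_bins(i)),
                       integer(n_bins)))
colnames(bin_counts) <- seq_len(n_bins)
output <- cbind(input1, bin_counts)
output
```

With the posted data, a1 250 270 gets 9 points in bin 4 and 5 in bin 5, a1 100 200 gets one point each in bins 1 and 2, and a1 333 340 gets nothing because 331 and 332 lie below 333. For the plot, base R's matplot(t(bin_counts), type = "b", xlab = "bin", ylab = "count") draws one series per input1 row.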
