R - Probability of date differences

2023-04-04 20:57

Given df below, I want to get the time between requests, and then produce a textual histogram of the probabilities that requests arrive 1 second apart, 2 seconds apart, 3 seconds apart, and so on, up to 10 seconds. I want to use all of the data when calculating the probabilities, but I only want to see the first 10 seconds of data.

I've tried to get help with this on the mailing list, but could not. I've received great help on here, so I hope I'm not abusing the help. This should be my last question. Thanks a lot.

df <- read.csv(textConnection('
"SOURCE","REQUEST_DATE"
"A","09/11/2011 09:28:48"
"A","09/11/2011 09:28:47"
"A","09/11/2011 09:15:42"
"A","09/11/2011 09:15:41"
"D","09/13/2011 09:06:53"
"D","09/13/2011 09:06:52"
"D","09/13/2011 08:56:55"
"D","09/13/2011 08:56:52"
"D","09/13/2011 08:55:43"
"D","09/13/2011 08:39:07"
'), stringsAsFactors=FALSE)

And here's how I'm getting the diff, with the excellent help of Andrie:

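library(plyr)  # provides ddply() and summarize(); REQUEST_DATE is parsed from text before diff()
df_diff <- ddply(df, .(SOURCE), summarize, TIME_DIFF=-as.numeric(diff(as.POSIXct(REQUEST_DATE, format="%m/%d/%Y %H:%M:%S", tz="UTC")), units="secs"))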
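For readers without plyr, the same step can be sketched in base R. This is only an illustration: it assumes the timestamps are month/day/year, uses UTC purely to avoid daylight-saving surprises, and `to_diffs` is a hypothetical helper name that does not appear in the original post.

```r
# Sample data from the question (rows are newest-first within each source).
df <- read.csv(text = '
"SOURCE","REQUEST_DATE"
"A","09/11/2011 09:28:48"
"A","09/11/2011 09:28:47"
"A","09/11/2011 09:15:42"
"A","09/11/2011 09:15:41"
"D","09/13/2011 09:06:53"
"D","09/13/2011 09:06:52"
"D","09/13/2011 08:56:55"
"D","09/13/2011 08:56:52"
"D","09/13/2011 08:55:43"
"D","09/13/2011 08:39:07"
', stringsAsFactors = FALSE)

# Parse the text timestamps; the format string and time zone are assumptions.
df$REQUEST_DATE <- as.POSIXct(df$REQUEST_DATE,
                              format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Hypothetical helper: seconds between consecutive requests of one source.
to_diffs <- function(times) {
  times <- sort(times)                       # oldest first, so gaps are positive
  as.numeric(diff(times), units = "secs")    # force seconds regardless of gap size
}

# One numeric vector of gaps per source.
diffs_by_source <- lapply(split(df$REQUEST_DATE, df$SOURCE), to_diffs)
print(diffs_by_source)
```

Sorting first means the gaps come out positive without negating them, and asking for `units = "secs"` explicitly avoids relying on the unit that `diff` picks automatically.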

So, I want something like the following (with made-up results):

A 1 55%
A 2 15%
A 3 10%
...
A 10 5%
D 1 10%
D 2 12%
D 3 15%
...
D 10 1%

D 5013 2%, for example, would get cut off, because I only want the first 10 seconds for each source.

The "histogram as text" part is confusing me, but I am guessing you actually want to tabulate within one-second breaks:

 df_diff$tdiff_grp <- cut(df_diff$TIME_DIFF, 0:10, right=FALSE)
 with(df_diff, tapply(tdiff_grp, SOURCE, table))
$A
 [0,1)  [1,2)  [2,3)  [3,4)  [4,5)  [5,6)  [6,7)  [7,8)  [8,9) [9,10) 
     0      2      0      0      0      0      0      0      0      0 

$D
 [0,1)  [1,2)  [2,3)  [3,4)  [4,5)  [5,6)  [6,7)  [7,8)  [8,9) [9,10) 
     0      1      0      1      0      0      0      0      0      0
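To see what `cut` is doing in the tabulation above, here is a minimal sketch on three hand-picked gaps (the values in `gaps` are illustrative, chosen to match differences that occur in the sample data). With `right = FALSE` each bin includes its left endpoint, and any gap of 10 seconds or more becomes NA, which `table` then leaves out of the counts.

```r
gaps <- c(1, 3, 785)                    # seconds between requests (illustrative)
bins <- cut(gaps, 0:10, right = FALSE)  # one-second bins [0,1), [1,2), ..., [9,10)

print(as.character(bins))   # bin label for each gap; NA for the 785-second gap
print(sum(is.na(bins)))     # number of gaps that fall outside the bins
print(sum(table(bins)))     # number of gaps that table() actually counts
```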

After you clarify what is actually desired, it would be a simple matter to either use prop.table or divide these counts by their sums (and then multiply by 100) to produce percentages.

EDIT: A simple function can return percentages:

> tbls <- with(df_diff, tapply(tdiff_grp, SOURCE,table))
> lapply(tbls, function(x) 100*x/sum(x) )
$A
 [0,1)  [1,2)  [2,3)  [3,4)  [4,5)  [5,6)  [6,7)  [7,8)  [8,9) [9,10) 
     0    100      0      0      0      0      0      0      0      0   

$D    
 [0,1)  [1,2)  [2,3)  [3,4)  [4,5)  [5,6)  [6,7)  [7,8)  [8,9) [9,10) 
     0     50      0     50      0      0      0      0      0      0
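Note that these percentages are taken only over the gaps that land in the 0 to 10 second bins, because `table` leaves out the NA values that `cut` assigns to larger gaps. If the probabilities should be based on all of the data while only the first 10 seconds are displayed, as the question asks, one possible sketch divides by the full number of gaps per source instead. The gaps are re-entered by hand so the block runs on its own, and `pct_first_10` is a hypothetical helper name.

```r
# Gaps in seconds per source, as they come out of the sample data in the question.
df_diff <- data.frame(
  SOURCE    = c("A", "A", "A", "D", "D", "D", "D", "D"),
  TIME_DIFF = c(1, 785, 1, 1, 597, 3, 69, 996)
)

# Hypothetical helper: percentage of ALL gaps that fall in each one-second bin,
# reported only for the bins below 10 seconds.
pct_first_10 <- function(x) {
  counts <- table(cut(x, 0:10, right = FALSE))
  100 * counts / length(x)   # denominator also counts gaps of 10 s or more
}

pcts <- lapply(split(df_diff$TIME_DIFF, df_diff$SOURCE), pct_first_10)

# Long "source, bin, percent" layout, similar to the one sketched in the question.
out <- do.call(rbind, lapply(names(pcts), function(s) {
  data.frame(SOURCE = s,
             BIN    = names(pcts[[s]]),
             PCT    = round(as.numeric(pcts[[s]]), 1))
}))
print(out, row.names = FALSE)
```

With this denominator the percentages for a source no longer have to add up to 100; the remainder is the share of gaps longer than 10 seconds.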
