Performance problems when converting timestamped row data

2023-04-06 16:01 Q&A author:

I've written a function that takes a data.frame in which each row represents one 1-minute interval of data. The purpose of the function is to convert these 1-minute intervals into longer ones: for example, 1 minute becomes 5 minutes, 60 minutes, and so on. The data set may contain gaps, i.e. jumps in time, so the function has to cope with these bad-data occurrences. The code below appears to work, but its performance is absolutely terrible on large data sets.

I'm hoping that someone could provide some suggestions on how I might be able to speed this up. See below.

compressMinute = function(interval, DAT) {
    #Grab all data which begins at the same interval length
    retSet = NULL
    intervalFilter = which(DAT$time$min %% interval == 0)
    barSet = NULL
    for (x in intervalFilter) {
        barEndTime = DAT$time[x] + 60*interval
        barIntervals = DAT[x,]
        x = x+1
        while (x <= nrow(DAT) && DAT[x, "time"] < barEndTime) {
            barIntervals = rbind(barIntervals,DAT[x,])
            x = x + 1
        }
        bar = data.frame(date=barIntervals[1,"date"],time=barIntervals[1,"time"],open=barIntervals[1,"open"],high=max(barIntervals[1:nrow(barIntervals),"high"]),
                        low=min(barIntervals[1:nrow(barIntervals),"low"]),close=tail(barIntervals,1)$close,volume=sum(barIntervals[1:nrow(barIntervals),"volume"]))
        if (is.null(barSet)) {
            barSet = bar
        } else {
            barSet = rbind(barSet, bar)
        }

    }
    return(barSet)
}

EDIT:

Below is a row of my data. Each row represents a 1-minute interval. I am trying to convert these rows into arbitrary buckets that aggregate the 1-minute intervals, i.e. 5 minutes, 15 minutes, 60 minutes, 240 minutes, etc.

      date                time    open    high     low   close volume
2005-09-06 2005-09-06 16:33:00 1297.25 1297.50 1297.25 1297.25     98

You probably want to re-use existing facilities, specifically the POSIXct time types, as well as existing packages.

For example, look at the xts package: it already has a generic function to.period() as well as convenience wrappers such as to.minutes(), to.minutes3(), to.minutes10(), ....

Here is an example from the help page:

R> example(to.minutes)

t.mn10R> data(sample_matrix)

t.mn10R> samplexts <- as.xts(sample_matrix)

t.mn10R> to.monthly(samplexts)
         samplexts.Open samplexts.High samplexts.Low samplexts.Close
Jan 2007        50.0398        50.7734       49.7631         50.2258
Feb 2007        50.2245        51.3234       50.1910         50.7709
Mar 2007        50.8162        50.8162       48.2365         48.9749
Apr 2007        48.9441        50.3378       48.8096         49.3397
May 2007        49.3457        49.6910       47.5180         47.7378
Jun 2007        47.7443        47.9413       47.0914         47.7672

t.mn10R> to.monthly(sample_matrix)
         sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
Jan 2007            50.0398            50.7734           49.7631             50.2258
Feb 2007            50.2245            51.3234           50.1910             50.7709
Mar 2007            50.8162            50.8162           48.2365             48.9749
Apr 2007            48.9441            50.3378           48.8096             49.3397
May 2007            49.3457            49.6910           47.5180             47.7378
Jun 2007            47.7443            47.9413           47.0914             47.7672

t.mn10R> str(to.monthly(samplexts))
An ‘xts’ object from Jan 2007 to Jun 2007 containing:
  Data: num [1:6, 1:4] 50 50.2 50.8 48.9 49.3 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "samplexts.Open" "samplexts.High" "samplexts.Low" "samplexts.Close"
  Indexed by objects of class: [yearmon] TZ: 
  xts Attributes:  
 NULL

t.mn10R> str(to.monthly(sample_matrix))
 num [1:6, 1:4] 50 50.2 50.8 48.9 49.3 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:6] "Jan 2007" "Feb 2007" "Mar 2007" "Apr 2007" ...
  ..$ : chr [1:4] "sample_matrix.Open" "sample_matrix.High" "sample_matrix.Low" "sample_matrix.Close"
R>
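
Applied to one-minute bars like the row shown in the question, the same idea might look like the sketch below. The sample data, the object names (DAT, ohlcv, bars5) and the "bar" column prefix are made up for illustration; only xts(), to.period() and the column layout come from the discussion above. Note that to.period() groups rows into clock-aligned buckets and, by default, stamps each bar with the time of its last observation, whereas the function in the question stamps the first one, so treat this as a starting point rather than a drop-in replacement.

```r
library(xts)

# Hypothetical one-minute bars in the layout shown in the question.
# Minute 16:35 is deliberately left out to mimic a gap in the data.
times <- as.POSIXct("2005-09-06 16:30:00", tz = "UTC") + 60 * c(0:4, 6:9)
n <- length(times)
DAT <- data.frame(
  time   = times,
  open   = 1297 + seq_len(n) * 0.25,
  high   = 1297 + seq_len(n) * 0.25 + 0.50,
  low    = 1297 + seq_len(n) * 0.25 - 0.25,
  close  = 1297 + seq_len(n) * 0.25 + 0.25,
  volume = rep(10, n)
)

# Build an xts object indexed by the POSIXct timestamps. The columns are
# renamed to Open/High/Low/Close/Volume so that xts can identify them.
ohlcv <- xts(as.matrix(DAT[, c("open", "high", "low", "close", "volume")]),
             order.by = DAT$time)
colnames(ohlcv) <- c("Open", "High", "Low", "Close", "Volume")

# Aggregate to 5-minute bars: first open, max high, min low, last close,
# summed volume. The gap simply yields a bar built from fewer rows.
bars5 <- to.period(ohlcv, period = "minutes", k = 5, name = "bar")
print(bars5)
```

Changing k (or the period argument) gives other bucket sizes, for example k = 15 for 15-minute bars or period = "hours" for hourly ones.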
