Performance problems when converting timestamped row data
I've written a function that takes a data.frame in which each row represents one minute of data. The purpose of the function is to aggregate these 1-minute intervals into longer intervals: for example, 1-minute bars become 5-minute or 60-minute bars. The data set may contain gaps (jumps in time), so the function must handle them. The code below appears to work, but its performance is absolutely terrible on large data sets.
I'm hoping that someone could provide some suggestions on how I might be able to speed this up. See below.
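compressMinute = function(interval, DAT) {
  # Bars start at rows whose minute-of-hour is a multiple of `interval`.
  # This only lines up for intervals that divide 60: for 240, min %% 240 == 0
  # holds only at :00, so a new (overlapping) 240-minute bar starts every hour.
  # as.POSIXlt() exposes the $min field even when DAT$time is stored as POSIXct.
  intervalFilter = which(as.POSIXlt(DAT$time)$min %% interval == 0)
  barSet = NULL
  for (start in intervalFilter) {
    barEndTime = DAT$time[start] + 60*interval
    barIntervals = DAT[start,]
    x = start + 1
    # Collect the following rows until the bar ends; missing minutes are simply absent
    while (x <= nrow(DAT) && DAT$time[x] < barEndTime) {
      barIntervals = rbind(barIntervals, DAT[x,])  # copies the growing data.frame every time
      x = x + 1
    }
    # Collapse the collected 1-minute rows into one OHLCV bar
    bar = data.frame(date=barIntervals$date[1], time=barIntervals$time[1],
                     open=barIntervals$open[1], high=max(barIntervals$high),
                     low=min(barIntervals$low), close=barIntervals$close[nrow(barIntervals)],
                     volume=sum(barIntervals$volume))
    if (is.null(barSet)) {
      barSet = bar
    } else {
      barSet = rbind(barSet, bar)  # the result is also grown one row at a time
    }
  }
  return(barSet)
}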
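Most of the time goes into growing data frames one row at a time with rbind() inside loops: each call copies everything accumulated so far, so the total cost grows roughly quadratically with the number of rows. A vectorized sketch in base R, assuming `DAT$time` is (or converts to) POSIXct and that bars should start on multiples of `interval` minutes counted from the Unix epoch in UTC, assigns every row a bucket number and aggregates each column once. The name `compressMinuteFast` is hypothetical. Gaps need no special handling: an empty bucket simply yields no bar, and, unlike the loop above, rows following a missing aligned minute are still assigned to their bar instead of being dropped. Bars are labelled with the bucket start time; for a 240-minute interval the buckets start at 00:00, 04:00, ... UTC.

```r
# Hypothetical vectorized alternative to compressMinute() (a sketch, not the
# original author's code). Assumes DAT has columns date, time, open, high,
# low, close, volume and that DAT$time can be converted to POSIXct.
compressMinuteFast <- function(interval, DAT) {
  DAT <- DAT[order(as.POSIXct(DAT$time)), ]  # make sure rows are in time order
  tm <- as.POSIXct(DAT$time)
  secs <- as.numeric(tm)                     # seconds since 1970-01-01 UTC
  width <- 60 * interval                     # bar width in seconds
  bucket <- floor(secs / width)              # rows of the same bar share a bucket number

  first <- !duplicated(bucket)                  # first row of each bar
  last  <- !duplicated(bucket, fromLast = TRUE) # last row of each bar

  # tapply() orders its result by sorted bucket number, which matches time order
  data.frame(
    date   = DAT$date[first],
    time   = tm[first] - secs[first] %% width,  # start of the bar
    open   = DAT$open[first],
    high   = as.vector(tapply(DAT$high, bucket, max)),
    low    = as.vector(tapply(DAT$low, bucket, min)),
    close  = DAT$close[last],
    volume = as.vector(tapply(DAT$volume, bucket, sum))
  )
}
```

The design choice is to do one pass per column over the whole data set instead of one rbind() per row, which is what removes the quadratic slowdown.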
EDIT:
Below is a row of my data. Each row represents a 1-minute interval; I am trying to convert these rows into arbitrary buckets that aggregate the 1-minute intervals, e.g. 5, 15, 60 or 240 minutes.
date        time                 open     high     low      close    volume
2005-09-06  2005-09-06 16:33:00  1297.25  1297.50  1297.25  1297.25  98
You probably want to re-use existing facilities, specifically the POSIXct time types, as well as existing packages.
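POSIXct stores a timestamp as seconds since 1970-01-01 UTC, so finding the start of the bucket a timestamp falls into is plain arithmetic, with no row-by-row scanning. A minimal sketch (the helper name `bucketStart` is hypothetical); buckets are counted from the epoch in UTC, so intervals that divide an hour line up with clock minutes in whole-hour time zones, while a 240-minute bucket starts at 00:00, 04:00, ... UTC.

```r
# Hypothetical helper: start of the `minutes`-wide bucket containing each time.
# POSIXct is numeric seconds since 1970-01-01 UTC, so flooring is arithmetic.
bucketStart <- function(time, minutes) {
  time <- as.POSIXct(time)
  width <- 60 * minutes
  time - as.numeric(time) %% width  # subtracting seconds keeps the time zone
}

t1 <- as.POSIXct("2005-09-06 16:33:00", tz = "UTC")
bucketStart(t1, 5)    # "2005-09-06 16:30:00 UTC"
bucketStart(t1, 240)  # "2005-09-06 16:00:00 UTC"
```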
For example, look at the xts package: it already has a generic function to.period() as well as convenience wrappers to.minutes(), to.minutes3(), to.minutes10(), ...
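Applied to data shaped like the question's, a sketch could look like the following. It assumes the xts package is installed; the sample bars are made up, and the columns are named Open/High/Low/Close/Volume so that xts recognizes the OHLCV layout. `to.minutes5()` is the convenience wrapper for the same 5-minute aggregation.

```r
library(xts)

# Made-up 1-minute OHLCV bars with a gap (16:32 is missing)
tm <- as.POSIXct("2005-09-06 16:30:00", tz = "UTC") + 60 * c(0, 1, 3, 4, 5, 6, 7, 8, 9)
bars <- data.frame(Open = 1:9, High = 1:9 + 0.5, Low = 1:9 - 0.5,
                   Close = 1:9 + 0.25, Volume = rep(10, 9))
x <- xts(as.matrix(bars), order.by = tm)

# Aggregate to 5-minute bars; period boundaries come from the timestamps
# themselves, so the gap needs no special handling and no loop is written
x5 <- to.period(x, period = "minutes", k = 5)
x5
```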
Here is an example from the help page:
R> example(to.minutes)
t.mn10R> data(sample_matrix)
t.mn10R> samplexts <- as.xts(sample_matrix)
t.mn10R> to.monthly(samplexts)
samplexts.Open samplexts.High samplexts.Low samplexts.Close
Jan 2007 50.0398 50.7734 49.7631 50.2258
Feb 2007 50.2245 51.3234 50.1910 50.7709
Mar 2007 50.8162 50.8162 48.2365 48.9749
Apr 2007 48.9441 50.3378 48.8096 49.3397
May 2007 49.3457 49.6910 47.5180 47.7378
Jun 2007 47.7443 47.9413 47.0914 47.7672
t.mn10R> to.monthly(sample_matrix)
sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
Jan 2007 50.0398 50.7734 49.7631 50.2258
Feb 2007 50.2245 51.3234 50.1910 50.7709
Mar 2007 50.8162 50.8162 48.2365 48.9749
Apr 2007 48.9441 50.3378 48.8096 49.3397
May 2007 49.3457 49.6910 47.5180 47.7378
Jun 2007 47.7443 47.9413 47.0914 47.7672
t.mn10R> str(to.monthly(samplexts))
An ‘xts’ object from Jan 2007 to Jun 2007 containing:
Data: num [1:6, 1:4] 50 50.2 50.8 48.9 49.3 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "samplexts.Open" "samplexts.High" "samplexts.Low" "samplexts.Close"
Indexed by objects of class: [yearmon] TZ:
xts Attributes:
NULL
t.mn10R> str(to.monthly(sample_matrix))
num [1:6, 1:4] 50 50.2 50.8 48.9 49.3 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:6] "Jan 2007" "Feb 2007" "Mar 2007" "Apr 2007" ...
..$ : chr [1:4] "sample_matrix.Open" "sample_matrix.High" "sample_matrix.Low" "sample_matrix.Close"
R>