开发者

Is there a good algorithm to check for changes in data over a specified period of time?

We have around 7k financial products whose closing prices should theoretically move up and down within a certain percentage range throughout a defined period of time (say a one week or month period).

I have access to an internal system that stores these historical prices (not a relational database!). I would like to produce a report that lists any products whose price has not moved at all or less than say 10% over the time period.

I can't just compare the first value (day 1) to the value at the end (day n) as the price could potential开发者_高级运维ly have moved back to what it was on the last day which would lead to a false positive while the product's price could have spiked somewhere in between of course.

Are there any established algorithms to do this in reasonable compute time?


There isn't any way to do this without looking at every single day.

Suppose the data looks like such:

oooo0oooo

With that one-day spike in the middle. You're not going to catch that unless you check the day that the spike happens - in other words, you need to check every single day.


If this needs to be checked often (for a large number of interval, like daily for the last year, and for the same set of products), you can store the high and low values of each item per week/month. By combining the right weekly and/or monthly bounds with some raw data at the edges of the interval you can get the minimum and maximum value over the interval.


If you can add data to kdb (i.e. you're not limited to read access) you might consider adding the 'number of days since last price change' as a new set of data (i.e. one number per financial instrument). A daily task would then fetch today's mark and yesterday's, and update the numbers stored. Similarly you could maintain recent (last month, last year) highs and lows in kdb. You'd have to run a job over the larger dataset to prime the values initially, but then your daily updates will involve much less data.

Recommend that if you adopt something like this you have some way to rerun for all or part of the dataset (say for adding a new product).

Lastly - is the history normalised against current prices? (i.e. are revaluations for stock splits or similar taken into account). If not, you'd need to detect these discontinuities and divide them out.

EDIT

I'd investigate usng kdb+/Q to implement the signal processing, rather than extracting the raw data to a Java application. As you say, it's highly performant.


You can do this if you can keep track of the min and max value of the price during the time interval - this assumes that the time interval is not being constantly changed. One way of keeping track of the min and max values of a changing set of items is with two heaps placed 'back to back' - you could store this and some pointers necessary to find and remove old items in one or two arrays in your store. The idea of putting two heaps back to back is in Knuth's Art of Computer Programming Vol 3 as Exercise 31 section 5.2.3. Knuth calls this sort of beast a Priority Dequeue, and this seems to be searchable. Min and max are available at constant cost. Cost of modifying it when a new price arrives is log n, where n is the number of items stored.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜