Subsampling SQL-stored data for plots
Suppose you have a program that logs (timestamp, stock_price) to an SQL database every 30 seconds, and you want to generate plots of the stock price over various timescales. If you plot measurements over a 1-hour range, it's OK to use all 120 samples taken during that time. However, if you want to plot price over a 1-year range, you obviously don't want to pull over 1 million samples out of the database. It would be better to pull some representative subset of the samples out of the database.
This reminds me of the Level of Detail technique in computer graphics -- as you move farther from a 3d model, a lower-fidelity version of the model can be used.
Are there common techniques for representing Level of Detail information in a database, or for quickly querying an evenly spaced subset of data (e.g. give me 100 evenly spaced samples from January 2009)?
The solution I've come up with so far is to include a level_of_detail column in the database table. If level_of_detail=0, the row holds a single instantaneous sample. If level_of_detail=n, the row contains an average of the last (sample_interval*(2^n)) seconds of data, and there are 1/(2^n) as many rows at this level as at level 0. The table has an index on (level_of_detail, timestamp), and when you want to generate a plot, you calculate an appropriate level_of_detail value based on the number of samples you want and query with that constraint. Disadvantages are:
- For N samples, the table needs to store roughly 2*N rows (N + N/2 + N/4 + ...)
- The client must know to specify an appropriate level_of_detail constraint
- Some process needs to be responsible for building the averaged rows as samples are added to the table
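To make the scheme above concrete, here is a minimal Python sketch of the two pieces the client and the background process would need: picking a level for a given plot, and building the averaged rows for one level. The function names `choose_level` and `build_level` are hypothetical, and the sketch assumes the level-0 samples arrive as an in-memory list of (timestamp_seconds, price) tuples ordered by time.

```python
def choose_level(range_seconds, sample_interval, max_samples):
    """Return the smallest level_of_detail whose row count fits in max_samples."""
    raw_samples = range_seconds / sample_interval
    level = 0
    # Each level halves the number of rows, so step up until the plot fits.
    while raw_samples / (2 ** level) > max_samples:
        level += 1
    return level


def build_level(samples, level):
    """Average consecutive groups of 2**level level-0 samples.

    samples: list of (timestamp, price) tuples ordered by timestamp.
    Returns (level, timestamp, average_price) rows, where timestamp is the
    last timestamp in each group, so a row averages the data leading up to it.
    Trailing samples that do not fill a whole group are left out.
    """
    size = 2 ** level
    rows = []
    for start in range(0, len(samples) - size + 1, size):
        chunk = samples[start:start + size]
        average = sum(price for _, price in chunk) / size
        rows.append((level, chunk[-1][0], average))
    return rows


if __name__ == "__main__":
    one_year = 365 * 24 * 3600
    # Level to request for a 1-year plot of 30-second samples, at most 1000 points.
    print(choose_level(one_year, 30, 1000))
    print(build_level([(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)], 1))
```

The plot query would then filter on the chosen level and the timestamp range, which is what the (level_of_detail, timestamp) index is for.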
For SQL Server, you could use NTILE. It orders the dataset and then splits it into N groups, returning 1 for rows in the first group and N for rows in the last group.
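-- @t is the test table variable declared below, aliased as YourTable;
-- substitute your own table here.
select  MIN(MeasureTime) as PeriodStart
,       MAX(MeasureTime) as PeriodEnd
,       AVG(StockPrice) as AvgStockPrice
from    (
        -- assign each row to one of 100 equally sized, time-ordered groups
        select  MeasureTime
        ,       StockPrice
        ,       NTILE(100) over (order by MeasureTime) as the_tile
        from    @t YourTable
        ) tiled
group by
        the_tile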
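This returns exactly 100 rows, provided the table holds at least 100 rows. Here's a copy of the test data if you're interested in trying the query. Because @t is a table variable, run this in the same batch, ahead of the query: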
declare @t table (MeasureTime datetime, StockPrice int)
declare @dt date
set @dt = '2010-01-01'
while @dt < '2011-01-01'
begin
    insert @t values (@dt, DATEDIFF(day,'2010-01-01',@dt))
    select @dt = DATEADD(day,1,@dt)
end
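If you don't have SQL Server at hand, the same bucketing can be tried from Python with the standard-library sqlite3 module. This is only a sketch under a few assumptions: SQLite 3.25 or newer (needed for window functions such as NTILE), a regular table called Measurements standing in for @t, and the hypothetical helper names `load_test_data` and `tiled_averages`. The test data mirrors the loop above: one row per day of 2010, with the price equal to the day offset.

```python
import sqlite3
from datetime import date, timedelta


def load_test_data(conn):
    """Create Measurements with one row per day of 2010 (price = day offset)."""
    conn.execute("CREATE TABLE Measurements (MeasureTime TEXT, StockPrice INTEGER)")
    start = date(2010, 1, 1)
    rows = [((start + timedelta(days=i)).isoformat(), i) for i in range(365)]
    conn.executemany("INSERT INTO Measurements VALUES (?, ?)", rows)


def tiled_averages(conn, n_tiles):
    """Split the rows into n_tiles time-ordered groups and average each group."""
    sql = f"""
        SELECT MIN(MeasureTime) AS PeriodStart,
               MAX(MeasureTime) AS PeriodEnd,
               AVG(StockPrice)  AS AvgStockPrice
        FROM (
            SELECT MeasureTime,
                   StockPrice,
                   NTILE({int(n_tiles)}) OVER (ORDER BY MeasureTime) AS the_tile
            FROM Measurements
        ) AS tiled
        GROUP BY the_tile
        ORDER BY the_tile
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_test_data(conn)
    for period_start, period_end, avg_price in tiled_averages(conn, 100)[:3]:
        print(period_start, period_end, avg_price)
```

With 365 rows and 100 tiles the groups cannot all be the same size: NTILE gives the earlier groups one extra row, so the first 65 groups cover 4 days each and the remaining 35 cover 3 days each.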