Charting Millions of Rows
Got a quick question on Charting.
Need: I need to implement charting for my client, and the client's dataset contains millions of rows. Data about the target is collected every 10 seconds or so, and it builds up to quite a large amount of data. I need to chart this data.
I looked at Google Finance to see how they do it, for example their chart for MSFT: http://www.google.com/finance?q=msft
Looks like, at any given time, they are NOT plotting ALL the points. Depending on the time-range you select, the data selected and plotted varies.
I would like to get some inputs on how to massage the millions of rows of data, and make it ready to do a graph like that of Google's, and pointers on how to implement the charting with the massaged data.
Thanks, Sean
For stock charts the standard way to do it is to select (or calculate) a number of options:
- Date Range
- Interval
- Chart Type
You can ignore chart type for now. But the other two are important.
For instance, if you have 1 million data points over a one-week period, give the user the option to chart that week by 15-minute, 1-hour or 1-day intervals. Then you just pick the data points that represent the start and end of each interval.
For instance, if they picked 1 day, you pick the opening and closing price of each day.
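The interval idea above can be sketched in a few lines of Python. This is only a rough illustration, assuming the data is a list of (timestamp, value) pairs sorted by time, with timestamps in seconds; the function name `downsample_open_close` and the sample data are made up for the example.

```python
from itertools import groupby

def downsample_open_close(points, interval_seconds):
    """Reduce (timestamp, value) points to one (bucket_start, open, close) row per interval.

    `points` must be sorted by timestamp (seconds since some epoch).
    """
    rows = []
    # Points whose timestamps fall in the same interval share a bucket index.
    for bucket, group in groupby(points, key=lambda p: p[0] // interval_seconds):
        values = [value for _, value in group]
        # First value in the bucket is the "open", last one is the "close".
        rows.append((bucket * interval_seconds, values[0], values[-1]))
    return rows

# Hypothetical sample: one reading every 10 seconds for one hour (360 points).
samples = [(t, 100 + t / 10) for t in range(0, 3600, 10)]

# Chart by 15-minute intervals: 360 points collapse to 4 rows.
bars = downsample_open_close(samples, 15 * 60)
```

In practice you would probably do the same grouping in the database query rather than in application code, so that the millions of raw rows never leave the DB.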
On Google Finance the magic happens in how the data is pulled from the DB: they take higher-resolution data for recent dates and lower-resolution data for older dates, and they plot it with a timeline chart (I know there are some good open source ones).
For instance: from the DB you get minute resolution for today, hour resolution for the last week, day resolution for the last 6 months, and so on.
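Here is a minimal sketch of that "finer for recent, coarser for old" selection. The tier cut-offs follow the example above, but the exact numbers, the weekly fallback, and the names `TIERS`, `interval_for_age` and `thin` are assumptions for illustration, not how Google Finance actually does it.

```python
MINUTE, HOUR, DAY = 60, 3600, 86400

# (maximum age of the data, bucket size), both in seconds.
# The cut-offs mirror the example above and are meant to be tuned.
TIERS = [
    (1 * DAY, MINUTE),   # today: one point per minute
    (7 * DAY, HOUR),     # last week: one point per hour
    (183 * DAY, DAY),    # last ~6 months: one point per day
]

def interval_for_age(age_seconds):
    """Return the bucket size to use for data that is `age_seconds` old."""
    for max_age, interval in TIERS:
        if age_seconds <= max_age:
            return interval
    return 7 * DAY  # anything older: one point per week (assumed fallback)

def thin(points, now):
    """Keep the first (timestamp, value) point of each bucket.

    `points` must be sorted by timestamp; `now` is the current time in seconds.
    """
    kept, seen = [], set()
    for ts, value in points:
        interval = interval_for_age(now - ts)
        bucket = (interval, ts // interval)
        if bucket not in seen:
            seen.add(bucket)
            kept.append((ts, value))
    return kept
```

With a reading every 10 seconds, two days of data is 17,280 rows; `thin` keeps one per minute for the most recent day and one per hour for the day before.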
I hope either you know what you want to provide to the client OR they know what they want (the WHAT could be formalized in a requirements spec).
Determining what you want to give can immensely change how you do it.
Let's take some hypothetical (but commonly used) scenarios.
Let's say you want to show points on an XY chart (x = time, y = price).
1. Let's say the user chooses the granularity.
Say 1 second.
Then give the user the option to see an hourly or daily window (if daily, then the last 3 to 5 days at most).
2. Let's say the user wants to see data for 1 day.
Now you know that you need to generate a query that will return 10hrs*60min*60sec = 36,000 ticks.
If the user wants to see a day's data as one tick, then you give him the option of seeing a week/month/year... Again, you just need to return 1yr*365days = 365 tick points if the user is looking at a year.
If the user changes the resolution/granularity, change the window to match.
One more scenario could be a 10 millisecond tick on one day of data.
It is pointless to show a 10 millisecond tick on a graph covering a week or more.
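The arithmetic in these scenarios is just window length divided by tick size, and you can use it to refuse combinations that return too many points. A rough sketch, working in integer milliseconds; the list of tick sizes and the `max_points` limit of 50,000 are assumptions you would replace with whatever your chart can comfortably draw.

```python
SECOND_MS = 1000
HOUR_MS = 3600 * SECOND_MS
DAY_MS = 24 * HOUR_MS

# Candidate tick sizes in milliseconds, finest first (hypothetical list).
TICKS_MS = [10, SECOND_MS, 60 * SECOND_MS, HOUR_MS, DAY_MS]

def point_count(window_ms, tick_ms):
    """Number of ticks a query returns for the given window (rounded up)."""
    return -(-window_ms // tick_ms)

def finest_tick_ms(window_ms, max_points=50_000):
    """Return the finest tick whose point count stays within `max_points`."""
    for tick_ms in TICKS_MS:
        if point_count(window_ms, tick_ms) <= max_points:
            return tick_ms
    return TICKS_MS[-1]  # nothing fits: fall back to the coarsest tick

# A 10-hour trading day at 1-second ticks, as in scenario 2.
ticks_per_day = point_count(10 * HOUR_MS, SECOND_MS)

# A year at one tick per day.
ticks_per_year = point_count(365 * DAY_MS, DAY_MS)
```

So when the user widens the window, call something like `finest_tick_ms` again and only offer granularities at or above the result.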