Architecture for exploration and analysis of large data
We are planning to build a data exploration system for a large set of events (on the order of millions). Events consist of a time, lat/long coordinates, and other properties with domain-constrained values such as type and userId.
The goal is to provide a visualization of the data on three panels:
- Map (events clustered in markers or in a heat map)
- Time histogram (distribution of events by date)
- Attributes histogram (histogram of attributes: type, users,... )
Users will interactively drill down data by filtering on attributes (facets), time interval or spatial range.
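To make the intended interaction concrete, here is a minimal sketch (in Python, with a hypothetical in-memory data model — field names like `type` and `userId` are taken from the question, everything else is illustrative) of how the three linked panels would be re-aggregated from the same filtered event set after each drill-down:

```python
from collections import Counter
from datetime import datetime

# Toy event set; a real system would hold millions of these in a store
# that supports time, spatial and attribute indexes.
events = [
    {"time": datetime(2011, 5, 1), "lat": 48.85, "lon": 2.35, "type": "photo", "userId": "u1"},
    {"time": datetime(2011, 5, 2), "lat": 40.71, "lon": -74.0, "type": "note", "userId": "u2"},
    {"time": datetime(2011, 6, 3), "lat": 48.86, "lon": 2.34, "type": "photo", "userId": "u1"},
]

def drill_down(events, time_range=None, bbox=None, **facets):
    """Apply optional time-interval, spatial-range and facet filters."""
    out = []
    for e in events:
        if time_range and not (time_range[0] <= e["time"] <= time_range[1]):
            continue
        if bbox:  # (min_lat, min_lon, max_lat, max_lon)
            if not (bbox[0] <= e["lat"] <= bbox[2] and bbox[1] <= e["lon"] <= bbox[3]):
                continue
        if any(e[k] != v for k, v in facets.items()):
            continue
        out.append(e)
    return out

def panels(events):
    """Re-aggregate the three views from the filtered events."""
    return {
        "time_histogram": Counter(e["time"].strftime("%Y-%m") for e in events),
        "type_histogram": Counter(e["type"] for e in events),
        "map_points": [(e["lat"], e["lon"]) for e in events],
    }

# Example: facet filter on type, then rebuild all three panels.
filtered = drill_down(events, type="photo")
print(panels(filtered))
```

At scale, the linear scan in `drill_down` is exactly what an OLAP engine or indexed store replaces: the filters become predicates pushed down to pre-aggregated cubes or indexes, but the panel-refresh pattern stays the same.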
We are thinking of an OLAP server, but don't know if this is the most appropriate solution.
Which architecture/system could handle these operations on such a large data set? Any experiences or suggestions? Preferably with open-source components.
Thanks
Formally, MathGL can handle such a data set easily (transform it, make histograms, plot it, and so on). I usually plot larger data sets (up to several GB, or more than 1e8 numbers). MathGL is a free (GPL, partially LGPL) plotting library.