Reducing the granularity of a data set
I have an in-memory cache that stores a set of information at a certain level of aggregation. In the Students example below, let's say I store it by Year, Subject and Teacher:
#  Students  Year  Subject  Teacher
1  30        7     Math     Mrs Smith
2  28        7     Math     Mr Cork
3  20        8     Math     Mrs Smith
4  20        8     English  Mr White
5  18        8     English  Mr Book
6  10        12    Math     Mrs Jones
Unfortunately, my cache doesn't have GROUP BY or similar functions, so when I want to look at things at a higher level of aggregation, I have to 'roll up' the data myself. For example, if I aggregate Students by Year and Subject, the data above would look like this:
#  Students  Year  Subject
1  58        7     Math
2  20        8     Math
3  38        8     English
4  10        12    Math
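For reference, the roll-up from the first table to the second is exactly what a GROUP BY with SUM does, and the Java standard library (Java 8+) offers that as `Collectors.groupingBy` with a downstream `Collectors.summingInt`. This is a minimal sketch, not the poster's code: it assumes Java 16+ for records, and `StudentRow`, `YearSubject` and `sampleRows` are hypothetical names standing in for whatever the cache actually returns.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RollUpWithStreams {

    // Hypothetical row type mirroring the cached data (Students, Year, Subject, Teacher).
    record StudentRow(int students, int year, String subject, String teacher) {}

    // Hypothetical key for the coarser level; records supply equals/hashCode automatically.
    record YearSubject(int year, String subject) {}

    // Groups rows by (Year, Subject) and sums Students within each group; Teacher is dropped.
    static Map<YearSubject, Integer> rollUp(List<StudentRow> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(
                        row -> new YearSubject(row.year(), row.subject()),
                        Collectors.summingInt(StudentRow::students)));
    }

    // The six rows from the first table above.
    static List<StudentRow> sampleRows() {
        return List.of(
                new StudentRow(30, 7, "Math", "Mrs Smith"),
                new StudentRow(28, 7, "Math", "Mr Cork"),
                new StudentRow(20, 8, "Math", "Mrs Smith"),
                new StudentRow(20, 8, "English", "Mr White"),
                new StudentRow(18, 8, "English", "Mr Book"),
                new StudentRow(10, 12, "Math", "Mrs Jones"));
    }

    public static void main(String[] args) {
        // The result map is hash-based, so iteration order is not guaranteed.
        rollUp(sampleRows()).forEach((key, total) -> System.out.println(key + " -> " + total));
    }
}
```

Under the hood this still makes one pass over the rows with a hash lookup per row, so it is mainly a more concise way to write the roll-up rather than a fundamentally faster one.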
My question, then: how would I best do this in Java? I could be pulling back tens of thousands of objects from this cache, so being able to 'roll up' these collections quickly may become very important.
My initial (perhaps naive) thought is to do something along the following lines:
Until I exhaust the list of records:
- Each 'unique' record that I come across is added as a key to a hashmap.
- If I encounter a record that has the same data for this new level of aggregation, add its quantity to the existing one.
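The hashmap loop described above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: it assumes Java 16+ for records, and `StudentRow`, `YearSubject` and `sampleRows` are hypothetical names. `Map.merge` covers both bullet points in one call: it inserts the count when the key is new and adds to the existing total otherwise.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RollUpWithHashMap {

    // Hypothetical row type mirroring the cached data (Students, Year, Subject, Teacher).
    record StudentRow(int students, int year, String subject, String teacher) {}

    // Hypothetical key for the new aggregation level (Year, Subject).
    record YearSubject(int year, String subject) {}

    static Map<YearSubject, Integer> rollUp(List<StudentRow> rows) {
        Map<YearSubject, Integer> totals = new HashMap<>();
        for (StudentRow row : rows) {
            YearSubject key = new YearSubject(row.year(), row.subject());
            // New key: store this row's count. Existing key: add it to the running total.
            totals.merge(key, row.students(), Integer::sum);
        }
        return totals;
    }

    // The six rows from the first table above.
    static List<StudentRow> sampleRows() {
        return List.of(
                new StudentRow(30, 7, "Math", "Mrs Smith"),
                new StudentRow(28, 7, "Math", "Mr Cork"),
                new StudentRow(20, 8, "Math", "Mrs Smith"),
                new StudentRow(20, 8, "English", "Mr White"),
                new StudentRow(18, 8, "English", "Mr Book"),
                new StudentRow(10, 12, "Math", "Mrs Jones"));
    }

    public static void main(String[] args) {
        rollUp(sampleRows()).forEach((key, total) -> System.out.println(key + " -> " + total));
    }
}
```

The one thing the key type must get right is `equals`/`hashCode` over exactly the grouping fields; records provide that for free. On older Java, a small final class overriding both methods would do the same job.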
Now, for all I know this is a fairly common problem and there are much better ways of doing it, so I'd welcome any feedback on whether I'm pointing myself in the right direction.
"Get a new cache" is not an option, I'm afraid :)
-Dave.
Your "initial thought" isn't a bad approach. The only way to improve on it would be to have an index on the fields you are aggregating by (year and subject). (That's basically what a DBMS does when you define an index.) Your algorithm could then be recast as iterating through all index values; you wouldn't have to check the results hash for each record.
Of course, you would have to build the index when populating the cache and maintain it as data is modified.
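A sketch of the index idea, under stated assumptions: `IndexedStudentCache`, `StudentRow`, `YearSubject` and the `add`/`remove` methods are hypothetical stand-ins for the real cache's populate and modify hooks, and Java 16+ is assumed for records. The index maps each (Year, Subject) key to the rows sharing it, so a roll-up iterates the index entries and sums each bucket once, instead of probing a results hash per record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexedStudentCache {

    // Hypothetical row and key types (Students, Year, Subject, Teacher) and (Year, Subject).
    record StudentRow(int students, int year, String subject, String teacher) {}
    record YearSubject(int year, String subject) {}

    private final List<StudentRow> rows = new ArrayList<>();
    // Index on (Year, Subject): each key maps to the rows that share it.
    private final Map<YearSubject, List<StudentRow>> yearSubjectIndex = new HashMap<>();

    // Populating the cache also updates the index.
    void add(StudentRow row) {
        rows.add(row);
        yearSubjectIndex
                .computeIfAbsent(new YearSubject(row.year(), row.subject()), k -> new ArrayList<>())
                .add(row);
    }

    // Modifying the cache must keep the index in step; empty buckets are dropped.
    boolean remove(StudentRow row) {
        if (!rows.remove(row)) {
            return false;
        }
        YearSubject key = new YearSubject(row.year(), row.subject());
        List<StudentRow> bucket = yearSubjectIndex.get(key);
        bucket.remove(row);
        if (bucket.isEmpty()) {
            yearSubjectIndex.remove(key);
        }
        return true;
    }

    // Roll-up walks the index: one results-map write per key, no per-record lookup.
    Map<YearSubject, Integer> rollUpByYearSubject() {
        Map<YearSubject, Integer> totals = new HashMap<>();
        for (Map.Entry<YearSubject, List<StudentRow>> entry : yearSubjectIndex.entrySet()) {
            int sum = 0;
            for (StudentRow r : entry.getValue()) {
                sum += r.students();
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        IndexedStudentCache cache = new IndexedStudentCache();
        cache.add(new StudentRow(30, 7, "Math", "Mrs Smith"));
        cache.add(new StudentRow(28, 7, "Math", "Mr Cork"));
        cache.add(new StudentRow(20, 8, "English", "Mr White"));
        cache.rollUpByYearSubject().forEach((key, total) -> System.out.println(key + " -> " + total));
    }
}
```

The roll-up still reads every row once, but the grouping work has moved into `add` and `remove`, which is the maintenance cost the answer mentions. `rows.remove` is a linear scan here purely to keep the sketch short; a real cache would locate rows through its own structure.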