Reducing the granularity of a data set

Posted 2023-03-11 11:05

I have an in-memory cache which stores a set of information by a certain level of aggregation - in the Students example below let's say I store it by Year, Subject, Teacher:

#    Students    Year    Subject    Teacher
1    30          7       Math       Mrs Smith
2    28          7       Math       Mr Cork
3    20          8       Math       Mrs Smith
4    20          8       English    Mr White
5    18          8       English    Mr Book
6    10          12      Math       Mrs Jones

Now unfortunately my cache doesn't have GROUP BY or similar functions - so when I want to look at things at a higher level of aggregation, I will have to 'roll up' the data myself. For example, if I aggregate Students by Year, Subject the aforementioned data would look like so:

#    Students    Year    Subject
1    58          7       Math
2    20          8       Math 
3    38          8       English
4    10          12      Math

My question is thus - how would I best do this in Java? Theoretically I could be pulling back tens of thousands of objects from this cache, so being able to 'roll up' these collections quickly may become very important.

My initial (perhaps naive) thought would be to do something along the following lines:

Until I exhaust the list of records:

Each 'unique' record that I come across is added as a key to a hashmap.
If I encounter a record that has the same data for this new level of aggregation, add its quantity to the existing one.
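
The steps above can be sketched in Java roughly as follows. This is only a minimal sketch: the Row and YearSubject classes and the byYearAndSubject method are hypothetical names made up for illustration, since the real cache's record type isn't shown. It uses Map.merge to either insert a new total or add to an existing one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class RollUp {

    /** Hypothetical shape of one cached record (the real cache type is not shown). */
    static final class Row {
        final int students;
        final int year;
        final String subject;
        final String teacher;

        Row(int students, int year, String subject, String teacher) {
            this.students = students;
            this.year = year;
            this.subject = subject;
            this.teacher = teacher;
        }
    }

    /** Composite grouping key: two keys are equal when year and subject both match. */
    static final class YearSubject {
        final int year;
        final String subject;

        YearSubject(int year, String subject) {
            this.year = year;
            this.subject = subject;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof YearSubject)) return false;
            YearSubject other = (YearSubject) o;
            return year == other.year && subject.equals(other.subject);
        }

        @Override
        public int hashCode() {
            return Objects.hash(year, subject);
        }
    }

    /** Single pass over the rows, summing students per (year, subject). */
    static Map<YearSubject, Integer> byYearAndSubject(List<Row> rows) {
        // LinkedHashMap keeps groups in first-seen order; a plain HashMap
        // works just as well if the output order does not matter.
        Map<YearSubject, Integer> totals = new LinkedHashMap<>();
        for (Row r : rows) {
            // Inserts r.students for a new key, otherwise adds it to the stored total.
            totals.merge(new YearSubject(r.year, r.subject), r.students, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(30, 7, "Math", "Mrs Smith"),
            new Row(28, 7, "Math", "Mr Cork"),
            new Row(20, 8, "Math", "Mrs Smith"),
            new Row(20, 8, "English", "Mr White"),
            new Row(18, 8, "English", "Mr Book"),
            new Row(10, 12, "Math", "Mrs Jones"));

        for (Map.Entry<YearSubject, Integer> e : byYearAndSubject(rows).entrySet()) {
            System.out.println(e.getValue() + "  " + e.getKey().year + "  " + e.getKey().subject);
        }
    }
}
```

Each record costs one hash lookup, so the whole roll-up is a single linear pass. Rolling up to a different level (say, Year only) would just mean a different key class.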

Now for all I know this is a fairly common problem and there are much better ways of doing this. So I'd welcome any feedback as to whether I'm pointing myself in the right direction.

"Get a new cache" not an option I'm afraid :)

-Dave.

Your "initial thought" isn't a bad approach: it is a single pass over the records with one hash lookup per record. The only way to improve on it would be to have an index on the fields you are aggregating by (year and subject). (That's basically what a DBMS does when you define an index.) Then your algorithm could be recast as iterating over the index values, so you wouldn't have to check the results hash for each record.

Of course, you would have to build the index when populating the cache and maintain it as data is modified.
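
As a rough illustration of the index idea, here is a minimal sketch that assumes you can wrap the cache so that every write goes through your own code. IndexedCache, Row, key, add, remove and totalsByYearSubject are all hypothetical names, not part of any real cache API. For brevity, the (year, subject) key is a two-element List, which already compares by value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IndexedCache {

    /** Hypothetical shape of one cached record (the real cache type is not shown). */
    static final class Row {
        final int students;
        final int year;
        final String subject;
        final String teacher;

        Row(int students, int year, String subject, String teacher) {
            this.students = students;
            this.year = year;
            this.subject = subject;
            this.teacher = teacher;
        }
    }

    // Index from (year, subject) to the rows in that group, updated on every write.
    private final Map<List<Object>, List<Row>> byYearSubject = new LinkedHashMap<>();

    /** Builds the composite key; Lists compare element by element, so equal pairs collide. */
    static List<Object> key(int year, String subject) {
        return Arrays.<Object>asList(year, subject);
    }

    /** Adds a row and files it under its index entry. */
    void add(Row r) {
        byYearSubject.computeIfAbsent(key(r.year, r.subject), k -> new ArrayList<>()).add(r);
    }

    /** Removes a row (same instance) and drops the index entry once it is empty. */
    boolean remove(Row r) {
        List<Object> k = key(r.year, r.subject);
        List<Row> bucket = byYearSubject.get(k);
        if (bucket == null || !bucket.remove(r)) {
            return false;
        }
        if (bucket.isEmpty()) {
            byYearSubject.remove(k);
        }
        return true;
    }

    /** Walks the index: one pass per group, one result-map write per group. */
    Map<List<Object>, Integer> totalsByYearSubject() {
        Map<List<Object>, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<List<Object>, List<Row>> e : byYearSubject.entrySet()) {
            int sum = 0;
            for (Row r : e.getValue()) {
                sum += r.students;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }
}
```

The trade-off is the one mentioned above: the grouping cost moves from query time to write time, and the index has to be kept in step with every add, update and remove. If only the totals are ever needed, the index could store running sums instead of row lists.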
