
Data structure for counting frequencies in a database table-like format

I was wondering if there is a data structure optimized for counting frequencies in data that is stored in a database table-like format. For example, the data arrives in a comma-delimited format like the one below.

col1, col2, col3
x, a, green
x, b, blue
...
y, c, green

Now I simply want to count the frequency of col1=x, or of col1=x and col3=green. I have been storing the data in a database table, but my profiling and empirical observation show that the database connection is the bottleneck. I have tried in-memory database solutions too, and they work quite well; the only problems are the memory requirements and the quirky init/destroy calls.

Also, I work mainly with Java but have experience with .NET, and I was wondering if there is any API for working with "tabular" data in a LINQ-like way in Java.

Any help is appreciated.


How about a nested TreeMap? For example, say you have the following records:

col1=v, col2=v2
col1=v, col2=v3

You want to be able to query the structure and ask, "how many times did col1 have the value v?"

I'd use the following code to insert values into the structure:

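import java.util.TreeMap;

// outer map: column name -> (column value -> number of times seen)
TreeMap<String, TreeMap<String, Integer>> tm = new TreeMap<>();

// run the following for every (columnName, colVal) pair of every record
if (!tm.containsKey(columnName)) {
    // the map hasn't seen this column name yet:
    // mark the column value as being seen once
    TreeMap<String, Integer> valueMap = new TreeMap<>();
    valueMap.put(colVal, 1);
    tm.put(columnName, valueMap);
} else {
    // the map has seen the column name
    TreeMap<String, Integer> valueMap = tm.get(columnName);
    if (valueMap.containsKey(colVal)) {
        // we've seen this column value before:
        // increment the number of times we've seen it
        int valCount = valueMap.get(colVal);
        valueMap.put(colVal, valCount + 1);
    } else {
        // we have not seen this column value before
        valueMap.put(colVal, 1);
    }
}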
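
The snippet above only covers insertion. As a minimal, self-contained sketch of the same nested-map idea that also answers the "how many times did col1 have the value v?" query, something like the following could work. The class name NestedFrequencyCounter and its add/count methods are hypothetical names chosen for illustration, not part of any library.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; the name and methods are illustrative only.
class NestedFrequencyCounter {
    // column name -> (column value -> number of times seen)
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Record one occurrence of colVal in the column named columnName.
    void add(String columnName, String colVal) {
        counts.computeIfAbsent(columnName, k -> new TreeMap<>())
              .merge(colVal, 1, Integer::sum);
    }

    // How many times did columnName have the value colVal? Returns 0 if never seen.
    int count(String columnName, String colVal) {
        return counts.getOrDefault(columnName, Collections.emptyMap())
                     .getOrDefault(colVal, 0);
    }

    public static void main(String[] args) {
        NestedFrequencyCounter counter = new NestedFrequencyCounter();
        // the two sample records: (col1=v, col2=v2) and (col1=v, col2=v3)
        counter.add("col1", "v");
        counter.add("col2", "v2");
        counter.add("col1", "v");
        counter.add("col2", "v3");

        System.out.println(counter.count("col1", "v"));  // prints 2
        System.out.println(counter.count("col2", "v2")); // prints 1
        System.out.println(counter.count("col3", "v"));  // prints 0
    }
}
```

Note that this structure counts each column independently, so it cannot by itself answer a condition that spans two columns of the same row.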


There is a Multiset data structure that keeps track of the frequencies for you. Here is sample code using the Multiset implementation from Google Guava.

import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;

void frequencyCounter()
{
    Multiset<String> counter = HashMultiset.create();

    counter.add("col1" + "=" + "x");
    counter.add("col2" + "=" + "x");
    counter.add("col2" + "=" + "x");

    System.out.println("how many times did col2 have the value x?");
    System.out.println(counter.count("col2" + "=" + "x"));
}

Points to note:

  • I concatenate the column name (col1) and its value (x), with = as the delimiter, to form the key that is added to the Multiset.
  • I build the key the same way when looking up the frequency of a particular value in a given column.
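
The key-concatenation idea in the points above might also be stretched to cover a condition on two columns at once, such as col1=x together with col3=green in the question's sample data. The sketch below is one way to do that with only the JDK, using a HashMap and merge in place of Guava's Multiset. The class name CompositeKeyCounter, the fixed column names, and the "&" separator are all assumptions made for illustration, and it assumes that values never contain "=" or "&".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; class, method names and the "&" separator are illustrative only.
class CompositeKeyCounter {
    // Assumed fixed column order of every row.
    private static final String[] COLUMNS = {"col1", "col2", "col3"};

    // key such as "col1=x" or "col1=x&col3=green" -> number of rows matching it
    private final Map<String, Integer> counts = new HashMap<>();

    // Count every single-column key and every two-column key of one row.
    void addRow(String... values) {
        for (int i = 0; i < values.length; i++) {
            String first = COLUMNS[i] + "=" + values[i];
            counts.merge(first, 1, Integer::sum);
            for (int j = i + 1; j < values.length; j++) {
                String pair = first + "&" + COLUMNS[j] + "=" + values[j];
                counts.merge(pair, 1, Integer::sum);
            }
        }
    }

    // Number of rows matching the key; 0 if the key was never seen.
    int count(String key) {
        return counts.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        CompositeKeyCounter counter = new CompositeKeyCounter();
        counter.addRow("x", "a", "green");
        counter.addRow("x", "b", "blue");
        counter.addRow("y", "c", "green");

        System.out.println(counter.count("col1=x"));            // prints 2
        System.out.println(counter.count("col1=x&col3=green")); // prints 1
        System.out.println(counter.count("col3=green"));        // prints 2
    }
}
```

Two-column keys must list the columns in their table order (col1 before col3). Pre-counting every pair trades memory for lookup speed, so with many columns it may be preferable to count only the combinations that are actually queried.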