开发者

Constructing frequency table of Objects in ArrayList<Object[]>

I try to implement Incognito k-anonymization algorithm in Java. A part of this algorithm is the frequency set construction for a given table. The columns of the table vary each time so I decided to represent the table as an ArrayList of Object[] where Object[] size is the number of the columns. In this Object I store each row's values for each columns.

I try to construct the frequency table using the following method:

ArrayList<Object[]> table = new ArrayList<Object[]>();
....// table filling//.....
ArrayList<Object[]> frequencySet = new ArrayList<Object[]>();
for(int i=0;i<table.size();i++)
     {
         Integer count = 1;
         int j = 0;
         for(j=i+1;j<table.size();j++)
         {
             if(Arrays.equals(table.get(i), table.get(j)))
             {
                 //System.out.println(i+" equals to "+j);
                 count++;
                 table.remove(j);
                 j = j-1;
             }
         }
         int size = arguments.size()+1;
         Object[] anObject = new Object[size];
   开发者_JAVA百科      System.arraycopy(table.get(i), 0, anObject, 0, arguments.size());
         anObject[size-1] = count;
         frequencySet.add(anObject);
     }

The problem is that the algorithm is very slow, and I figured out that the most of the time is consumed in this method. (For 100.000 data it needs 13min to run- I don't know if this is normal). Is there a faster way to construct frequency table?


Never use remove on an ArrayList, it's O(size()). Also, your count variable gets wrapped and unwrapped each time you increment it. Make its type int and wrap it into Integer only at the very end.

Without knowing anything about the type of Objects you store, I assume the methods equals and hashCode are redefined for them. Then the best thing that comes into mind is wrap the array of Object's into a class Row (it's a good thing to do anyway), redefine equals and hashCode for Row (using Arrays.equals and Arrays.hashCode) and count the occurrences of each Row in one pass using a

HashMap<Row, Integer> count;


for (Row row : table) {
    if (count.containsKey(row)) {
        count.put(row, count.get(row) + 1);
    } else {
        count.put(row, 1);
    }
}


Sort them, and then count reoccurances with a loop after that. That will bring it down to O(n log n)

or use a hashtable to do your counting instead. That should be a linear time calculation.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜