
Sorting two or more massive result sets?

I need to combine multiple intermediate result sets and write them to a file in sorted order. The sort is based on a single column (key) value. Each record in a result set is a list of values (like a row in a table).

  1. The intermediate result sets are obtained by querying entirely different databases.
  2. The intermediate result sets are already sorted on some key (or column). They need to be combined and sorted again on the same key (or column) before being written to a file.
  3. Since these result sets can be massive (on the order of MBs), this cannot be done in memory.

My solution, broadly:

Use a hash map and a random access file. Since the result sets are already sorted, when retrieving them I will store the sorted column values as keys in a HashMap. The value in the HashMap will be an address in the random access file where every record associated with that column value is stored.

Any ideas?


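Keep a pointer into every set, initially pointing to its first entry.

Then pick the next result from whichever set currently offers the lowest entry.

Write this entry to the file and advance the corresponding pointer. Repeat until every set is exhausted.

This approach has basically no overhead: only one current record per set is held in memory, and for a fixed number of sets the running time is O(n) in the total number of records. (It's merge sort, by the way.)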

Edit

To clarify: It's the merge part of merge sort.
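
The pointer-per-set steps above can be sketched for any number of inputs as follows. This is only an illustration under assumptions: the names `KWayMerge`, `Cursor`, and `merge` are made up for this example, each input is modelled as a plain `Iterator` that is already sorted, and a `PriorityQueue` is used to find the lowest current entry (a linear scan over the pointers would also work for a handful of sets). In the asker's setting each iterator would presumably wrap one `ResultSet`, and the sink would write a record to the output file.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

// Hypothetical helper: merges several already-sorted inputs into one sorted stream.
final class KWayMerge {

    // One "pointer" per input: the current record plus the rest of that input.
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;

        Cursor(T head, Iterator<T> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    static <T> void merge(List<Iterator<T>> sources,
                          Comparator<? super T> order,
                          Consumer<? super T> sink) {
        // Order the cursors by their current record, so poll() returns the lowest one.
        Comparator<Cursor<T>> byHead = (x, y) -> order.compare(x.head, y.head);
        PriorityQueue<Cursor<T>> heap = new PriorityQueue<>(byHead);

        // Point every cursor at the first entry of its input; skip empty inputs.
        for (Iterator<T> it : sources) {
            if (it.hasNext()) {
                heap.add(new Cursor<>(it.next(), it));
            }
        }

        // Emit the lowest current entry, then advance the cursor it came from.
        while (!heap.isEmpty()) {
            Cursor<T> lowest = heap.poll();
            sink.accept(lowest.head);
            if (lowest.rest.hasNext()) {
                lowest.head = lowest.rest.next();
                heap.add(lowest);
            }
        }
    }
}
```

Only one record per input is held in memory at a time, so the size of the result sets does not matter. With k inputs and n records in total, the heap keeps the cost at roughly O(n log k) comparisons.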


If you've got two pre-sorted result sets, you should be able to iterate over them in parallel while writing the output file. You just need to compare the current row in each set. A simple example (not ready for copy-and-paste use!):

ResultSet a, b;
// fetch a and b; both must already be sorted by SORT_PROPERTY
// next() moves to the first row and returns false if the set is empty
boolean hasA = a.next();
boolean hasB = b.next();
while (hasA || hasB) {
  // take from a if b is exhausted, or if a's current key is not larger than b's
  if (hasA && (!hasB || a.getInt("SORT_PROPERTY") <= b.getInt("SORT_PROPERTY"))) {
    writeToFile(a);   // writes the current row of a
    hasA = a.next();
  } else {
    writeToFile(b);   // writes the current row of b
    hasB = b.next();
  }
}


Sounds like you are looking for an implementation of the Balance Line algorithm.
