
Iterating over the grouped union of many iterators

I'm trying to loop over a dict of many iterators. Each one is many terabytes in size, but each is sorted. A simple example looks like this:

t = { 'a': iter([1,1,1,2,2,3,3,4,6,7,7,7]),
'b': iter([2,2,2,3,3,4,6,6,6,7,7,7]),
'c': iter([1,1,1,4,4,6,6,7,7]),
'd': iter([1,1,1,3,3,3,7,7,7])
}

For each unique value, I need to yield a dict whose values are themselves iterators (again, because each group may be terabytes in size). For this example I would need something like:

{'a':iter([1,1,1]),
'b':iter([]),
'c':iter([1,1,1]),
'd':iter([1,1,1])
}

{'a':iter([2,2]),
'b':iter([2,2,2]),
'c':iter([]),
'd':iter([])
}

{'a':iter([3,3]),
'b':iter([3,3]),
'c':iter([]),
'd':iter([3,3,3])
}

{'a':iter([4]),
'b':iter([4]),
'c':iter([4,4]),
'd':iter([])
}

There are no 5s, so we just skip it.

{'a':iter([6]),
'b':iter([6,6,6]),
'c':iter([6,6]),
'd':iter([])
}

{'a':iter([7,7,7]),
'b':iter([7,7,7]),
'c':iter([7,7]),
'd':iter([7,7,7])
}

StopIteration

It's also okay if the "empty iterators" are simply missing from the dict.

I'm pretty sure I need itertools.groupby, but I just can't seem to put it all together.
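For reference, this is how `itertools.groupby` behaves on a single sorted iterator. It is a minimal illustration, not part of the original question. The second half shows the shared-iterator caveat that matters once several groups have to be read later: advancing the groupby object discards whatever is left of the previous group.

```python
from itertools import groupby

# groupby collapses runs of consecutive equal values into (value, group) pairs.
# It only ever looks at the next element, so it works on a sorted stream of any size.
pairs = groupby(iter([1, 1, 1, 2, 2, 3]))
first_value, first_group = next(pairs)
print(first_value, list(first_group))  # 1 [1, 1, 1]

# Caveat: every group shares the underlying iterator. Advancing the groupby
# object skips past the rest of the previous group, leaving it empty.
pairs = groupby(iter([1, 1, 1, 2, 2, 3]))
_, ones = next(pairs)
_, twos = next(pairs)  # this call consumes the remaining 1s
print(list(ones))      # []
print(list(twos))      # [2, 2]
```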

Thanks for the help.


So far I've been able to come up with something like this:

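from itertools import groupby

#wrapped in a generator function so that `yield` is legal
def grouped_union(t):
  grouped = {}
  for key, item in t.items():
    grouped[key] = groupby(item)

  current_items = {}
  for key, val in grouped.items():
    try:
      current_items[key] = next(val)
    except StopIteration:
      #this input was empty to begin with
      pass

  while current_items:
    #find the smallest value at the head of any iterator
    this_item = min(item for item, _ in current_items.values())
    outdict = {}
    for key, (item, rows) in current_items.items():
      if item == this_item:
        #move the group to the output
        outdict[key] = rows
    #yield before advancing: advancing a groupby empties its previous group
    yield outdict
    for key in outdict:
      try:
        #advance only the iterators whose group was just emitted
        current_items[key] = next(grouped[key])
      except StopIteration:
        #must be out of items
        del current_items[key]
        del grouped[key]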

If anyone knows a more Pythonic way to do it, I'd be glad to see it.
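One alternative, swapped in here rather than taken from the question, is to tag each element with its source key, merge all sources lazily with `heapq.merge`, and let `groupby` split the merged stream first by value and then by source. This is a sketch that assumes every input is sorted ascending and that the caller reads results strictly in order. It also relies on `heapq.merge` breaking ties in favor of earlier iterables (it is documented as equivalent to `sorted(itertools.chain(*iterables))`, and `sorted` is stable), so each source's run of a given value stays contiguous. Because all groups come from one shared stream, it yields `(value, subgroups)` pairs instead of a dict of independent iterators. `merged_groups` is a hypothetical name.

```python
from heapq import merge
from itertools import groupby, repeat
from operator import itemgetter


def merged_groups(sources):
    """Yield (value, subgroups) for each distinct value across sorted iterators.

    `subgroups` lazily yields (source_key, values) pairs. Everything is read
    from one merged stream, so finish (or skip) each subgroup before asking
    for the next one, and each value's subgroups before the next value.
    """
    # Tag every element with the key of the source it came from.
    tagged = [zip(it, repeat(key)) for key, it in sources.items()]
    # Lazy k-way merge on the value alone; ties go to earlier iterables,
    # which keeps each source's run of equal values contiguous.
    merged = merge(*tagged, key=itemgetter(0))
    for value, same_value in groupby(merged, key=itemgetter(0)):
        # A second groupby splits this value's run by source key.
        subgroups = (
            (key, map(itemgetter(0), pairs))
            for key, pairs in groupby(same_value, key=itemgetter(1))
        )
        yield value, subgroups


t = {
    'a': iter([1, 1, 1, 2, 2, 3, 3, 4, 6, 7, 7, 7]),
    'b': iter([2, 2, 2, 3, 3, 4, 6, 6, 6, 7, 7, 7]),
    'c': iter([1, 1, 1, 4, 4, 6, 6, 7, 7]),
    'd': iter([1, 1, 1, 3, 3, 3, 7, 7, 7]),
}
for value, subgroups in merged_groups(t):
    # list() materializes each subgroup here only so the demo can print it.
    print(value, {key: list(values) for key, values in subgroups})
```

The tradeoff: like the one-`groupby`-per-source loop, this holds only one element per source in memory, but you cannot go back to one source's values after moving on to the next source. If the consumer needs all the iterators for one value at the same time, separate `groupby` objects per source are the better fit.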
