Ordered data structure that allows efficient removal of duplicate items
I need a data structure that
- Must be ordered (adding elements a, b and c to an empty structure puts them at positions 0, 1 and 2).
- Allows adding repeated items. That is, I can have a list with a, b, c, a, b.
- Allows removing all occurrences of a given item (if I do something like delete(1), it will delete all occurrences of 1 in the structure). If I have elements a, b, c, d, c, e and remove element c, I should get a, b, d, e.
- I only need to access the elements in two ways: when deleting the occurrences of a given item (see the point above), and when converting the data in this structure to a list.
I can't really decide what the best data structure would be here. I thought at first about something like a List (the problem is having an O(n) operation when removing items), but maybe I'm missing something? What about trees/heaps? Hashtables/maps?
I'll have to assume I'll do as much adding as removing with this data structure.
Thanks
I think you may have to write a dedicated data structure (depending on your efficiency requirements).
Something like a doubly linked list with an extra nextEqualItemPtr in it and a HashMap pointing to the first of each item.
Then you can quickly find the first "b" to remove and follow all the nextEqualItemPtrs to remove them all (the list is doubly linked, so it is easy to keep it intact while unlinking nodes). The overhead is really just keeping the map up to date. The nextEqualItemPtr of a new item can just point to the node returned by map.put(key).nextEqualItemPtr
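A minimal sketch of this idea in Java, under some assumptions: the class name OccurrenceList and its methods add, removeAll and toList are made up for illustration, and the map here stores the most recently added node for each value (rather than the first one), so that the previous node returned by map.put can be used directly as the new node's nextEqual pointer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: a sketch of the linked-list-plus-map idea, not a library type.
class OccurrenceList<T> {
    private static final class Node<T> {
        final T value;
        Node<T> prev, next;   // neighbours in insertion order
        Node<T> nextEqual;    // previously added node holding an equal value, or null
        Node(T value) { this.value = value; }
    }

    // Sentinel nodes so that unlinking never has to special-case the ends.
    private final Node<T> head = new Node<>(null);
    private final Node<T> tail = new Node<>(null);
    // Maps each value to the most recently added node holding it.
    private final Map<T, Node<T>> newest = new HashMap<>();
    private int size;

    OccurrenceList() {
        head.next = tail;
        tail.prev = head;
    }

    // Appends at the end; expected O(1).
    void add(T value) {
        Node<T> node = new Node<>(value);
        node.prev = tail.prev;
        node.next = tail;
        tail.prev.next = node;
        tail.prev = node;
        // put returns the previous node for this value (or null), which
        // becomes the next link in the chain of equal items.
        node.nextEqual = newest.put(value, node);
        size++;
    }

    // Unlinks every node holding the value; cost is proportional to the
    // number of occurrences, not to the size of the whole list.
    int removeAll(T value) {
        int removed = 0;
        for (Node<T> n = newest.remove(value); n != null; n = n.nextEqual) {
            n.prev.next = n.next;
            n.next.prev = n.prev;
            removed++;
        }
        size -= removed;
        return removed;
    }

    // Copies the remaining elements out in insertion order; O(n).
    List<T> toList() {
        List<T> out = new ArrayList<>(size);
        for (Node<T> n = head.next; n != tail; n = n.next) {
            out.add(n.value);
        }
        return out;
    }
}
```

With the elements a, b, c, d, c, e added in that order, removeAll("c") unlinks both c nodes and toList() then yields a, b, d, e. The price is the extra memory for one node per element plus the map entries.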
I would definitely use something simple first, and only plug this kind of thing in if/when it is too slow.
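For comparison, the "something simple" baseline could look roughly like the sketch below: a plain ArrayList with a single O(n) pass per removal. The class and method names (SimpleBaseline, removeAllOccurrences) are invented for this example; List.removeIf is standard since Java 8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical helper: the plain-list baseline, one linear pass per removal.
class SimpleBaseline {
    // Removes every element equal to value; returns true if anything was removed.
    static <T> boolean removeAllOccurrences(List<T> items, T value) {
        return items.removeIf(x -> Objects.equals(x, value));
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>(List.of("a", "b", "c", "d", "c", "e"));
        removeAllOccurrences(items, "c");
        System.out.println(items);
    }
}
```

If profiling shows this linear pass to be a bottleneck, that is the point at which a dedicated structure becomes worth its extra bookkeeping.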
The Bag interface of Apache Commons Collections should fulfill your requirements. It has many implementations, so there may also be one that keeps track of insertion order (your first point).
And it has:
- removeAll
- remove(object, nCopies)
It is also quite fast compared to using a normal LinkedList or ArrayList, but I'm not sure whether it keeps the indices of the inserted elements.
It is described as
Bag interface for collections that have a number of copies of each object
 