Ordered data structure that allows efficient removal of duplicate items
I need a data structure that:
- Must be ordered (adding elements a, b and c to an empty structure puts them at positions 0, 1 and 2).
- Allows adding repeated items. That is, I can have a list with a, b, c, a, b.
- Allows removing all occurrences of a given item (if I do something like delete(1), it will delete all occurrences of 1 in the structure). If I have elements a, b, c, d, c, e and remove element c, I should get a, b, d, e.
- Only needs to be accessed in two ways: when deleting all occurrences of a given item (see the point above), and when converting the data in this structure to a list.
I can't really decide what the best data structure would be here. At first I thought about something like a List (the problem is that removing items is an O(n) operation), but maybe I'm missing something? What about trees/heaps? Hashtables/maps?
I'll have to assume I'll do as much adding as removing with this data structure.
Thanks
I think you may have to write a dedicated data structure (depending on your efficiency requirements).
Something like a doubly linked list with an extra nextEqualItemPtr in each node, and a HashMap pointing to the first node of each distinct item.
Then you can quickly find the first "b" to remove and follow all the nextEqualItemPtrs to remove them all (the list is doubly linked, so it is easy to keep it intact while unlinking nodes). The only real overhead is keeping the map up to date. When adding a new item, its nextEqualItemPtr can just point to the node previously stored in the map for that key, i.e. the value returned by map.put(key, newNode).
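A minimal sketch of that idea in Java, to make the pointer bookkeeping concrete. The names OccurrenceList, newestByValue, removeAll and toList are made up for illustration, and the sketch assumes the map holds the most recently added node for each value, so that the return value of map.put can be used directly as the new node's nextEqual pointer. It is not tuned or thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: insertion-ordered list with fast "remove all occurrences". */
class OccurrenceList<T> {
    private static final class Node<T> {
        final T value;
        Node<T> prev, next;   // neighbours in insertion order
        Node<T> nextEqual;    // next (older) node holding an equal value
        Node(T value) { this.value = value; }
    }

    private Node<T> head, tail;
    private int size;
    // Assumption: maps each distinct value to the most recently added node holding it.
    private final Map<T, Node<T>> newestByValue = new HashMap<>();

    /** Appends a value at the end; expected O(1). */
    void add(T value) {
        Node<T> node = new Node<>(value);
        // put() returns the previous chain head (or null), which becomes nextEqual.
        node.nextEqual = newestByValue.put(value, node);
        node.prev = tail;
        if (tail == null) head = node; else tail.next = node;
        tail = node;
        size++;
    }

    /** Unlinks every occurrence of value; returns how many were removed. */
    int removeAll(T value) {
        int removed = 0;
        for (Node<T> n = newestByValue.remove(value); n != null; n = n.nextEqual) {
            if (n.prev == null) head = n.next; else n.prev.next = n.next;
            if (n.next == null) tail = n.prev; else n.next.prev = n.prev;
            removed++;
        }
        size -= removed;
        return removed;
    }

    int size() { return size; }

    /** Copies the remaining elements, in insertion order, into a new list. */
    List<T> toList() {
        List<T> out = new ArrayList<>(size);
        for (Node<T> n = head; n != null; n = n.next) out.add(n.value);
        return out;
    }
}
```

With this layout, adding a, b, c, d, c, e and then calling removeAll("c") touches only the two c nodes, and toList() gives a, b, d, e. The cost of a removal is proportional to the number of occurrences rather than to the length of the list, at the price of one extra pointer per node plus the map.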
I would definitely use something simple first, and only plug this kind of thing in if/when it is too slow.
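For comparison, the "something simple" baseline could look like the sketch below: a plain ArrayList with removeIf (Java 8+), which removes all matching elements in a single pass and keeps the order of the rest. The class and method names SimpleOrderedBag, removeAllOccurrences and demo are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical baseline: a plain list, with O(n) removal of all occurrences. */
class SimpleOrderedBag {
    /** Removes every element equal to target; returns true if anything was removed. */
    static <T> boolean removeAllOccurrences(List<T> items, T target) {
        // Objects.equals also handles null elements and a null target.
        return items.removeIf(item -> Objects.equals(item, target));
    }

    static List<String> demo() {
        List<String> items = new ArrayList<>(Arrays.asList("a", "b", "c", "d", "c", "e"));
        removeAllOccurrences(items, "c");
        return items; // [a, b, d, e], remaining order preserved
    }
}
```

Each removal here is linear in the size of the list, which may well be fast enough until measurements say otherwise.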
The Bag interface of Apache Commons Collections (homepage) should fulfill your requirements. It has many implementations, so there may also be one that keeps track of insertion order (your first point).
And it has:
removeAll
remove(object, nCopies)
It is also quite fast compared to using a normal LinkedList or ArrayList, but I'm not sure whether it keeps the indices of the inserted elements.
It is described as
Bag interface for collections that have a number of copies of each object