Remove duplicates from a sorted ArrayList while keeping some elements from the duplicates
Okay at first I thought this would be pretty straightforward. But I can't think of an efficient way to solve this. I figured a brute force way to solve this but that's not very elegant. I have an ArrayList. Contacts is a VO class that has multiple members - name, regions, id. There are duplicates in ArrayList because different regions appear multiple times. The list is sorted by ID. Here is an example:
Entry 0 - Name: John Smith; Region: N; ID: 1
Entry 1 - Name: John Smith; Region: MW; ID: 1 Entry 2 - Name: John Smith; Region: S; ID: 1 Entry 3 - Name: Jane Doe; Region: NULL; ID: 2 Entry 4 - Name: Jack Black; Region: N; ID: 3 Entry 6 - Name: Jack Black; Region: MW; ID: 3 Entry 7 - Name: Joe Don; Region: NE; ID: 4I want to transform the list to below by combining duplicate regions together for the same ID. Therefore, the final list should have only 4 distinct elements with the regions combined.
So the output should look like th开发者_如何学Cis:-
Entry 0 - Name: John Smith; Region: N,MW,S; ID: 1
Entry 1 - Name: Jane Doe; Region: NULL; ID: 2 Entry 2 - Name: Jack Black; Region: N,MW; ID: 3 Entry 3 - Name: Joe Don; Region: NE; ID: 4What are your thoughts on the optimal way to solve this? I am not looking for actual code but ideas or tips to go about the best way to get it done.
Thanks for your time!!!
You can iterate them while dumping them (and merging duplicates) into a TreeMap. Then create a list from the sorted view of the TreeMap's values.
In the sample code I'm assuming you have an Entry class with id, name and regions fields this last one being a List of Region instances. This could easily be changed to a Set, and Region to Strings or whatever you're using. The sample copies the entries before inserting them into the map since they will be modified when merged to other entries.
SortedMap<Integer, Entry> mergedEntriesMap = new TreeMap<Integer, Entry>();
for (Entry e : entries) {
if (mergedEntriesMap.contains(e.id)) {
Entry m = mergedEntriesMap.get(e);
m.regions.addAll(e.regions);
} else {
Entry m = new Entry();
// copy the entry to keep the original array clean
m.id = e.id;
m.name = e.name;
m.regions = new ArrayList<Region>(e.regions);
mergedEntriesMap.put(m.id, m);
}
}
List<Entry> mergedEntries = new ArrayList<Entry>(mergedEntriesMap.values());
Is the initial data stuck in this format? If not, you might want to look into changing the query you're using to retrieve your data by grouping all the ids together and forming a comma separated list column. Here's an example in sql
SELECT Id, [Name], Regions = replace
((SELECT Region AS [data()]
FROM RegionTable
WHERE Id = u.Id
ORDER BY Region FOR xml path('')), ' ', ', ')
FROM [User] u
WHERE Id IS NOT NULL
GROUP BY Id, [Name]
This is a pseudocode to accomplish what you want. At the abstract level, you have a list of Pair<K,V> (first, second)
, sorted by K
, and no two pair are truly equal (i.e. you can have (k1,v1)
and (k1,v2)
, but you can't have two (k1,v1)
in the list.
You want to merge consecutive pairs (k,v1),(k,v2),(k,v3)
to one group (k,[v1,v2,v3])
.
List<Pair<K,V>> in;
List<Pair<K,List<V>>> out = [ ];
Pair<K,V> lastP = SENTINEL_PAIR; // lastP.first matches nothing
Pair<K,List<V>> lastGroup;
for (Pair<K,V> p : in) {
if (p.first == lastP.first) { // same group as last
lastGroup.second.add(p.second);
} else { // start a new group
lastGroup = (p.first, [ p.second ]);
out.add(lastGroup);
}
lastP = p;
}
In your case, K
is the ID, and V
is the region. This is O(N)
.
Have you taken a look at google's Multimap
? It is pretty much created for this type of data structure in which there is a key that maps to a Collection
of items. So in this case a String
name would map to a Collection
of Region
objects.
Multimap<String, Region> names = HashMultimap.create();
for (Entry entry : entries) {
names.put(entry.getName(), entry.getRegion());
}
// Now u can get the collection of regions by name
Collection<Region> johnsRegions = names.get("John Smith");
精彩评论