开发者

Remove duplicates from a sorted ArrayList while keeping some elements from the duplicates

Okay at first I thought this would be pretty straightforward. But I can't think of an efficient way to solve this. I figured a brute force way to solve this but that's not very elegant. I have an ArrayList. Contacts is a VO class that has multiple members - name, regions, id. There are duplicates in ArrayList because different regions appear multiple times. The list is sorted by ID. Here is an example:

Entry 0 - Name: John Smith; Region: N; ID: 1

Entry 1 - Name: John Smith; Region: MW; ID: 1

Entry 2 - Name: John Smith; Region: S; ID: 1

Entry 3 - Name: Jane Doe; Region: NULL; ID: 2

Entry 4 - Name: Jack Black; Region: N; ID: 3

Entry 6 - Name: Jack Black; Region: MW; ID: 3

Entry 7 - Name: Joe Don; Region: NE; ID: 4

I want to transform the list to below by combining duplicate regions together for the same ID. Therefore, the final list should have only 4 distinct elements with the regions combined.

So the output should look like th开发者_如何学Cis:-

Entry 0 - Name: John Smith; Region: N,MW,S; ID: 1

Entry 1 - Name: Jane Doe; Region: NULL; ID: 2

Entry 2 - Name: Jack Black; Region: N,MW; ID: 3

Entry 3 - Name: Joe Don; Region: NE; ID: 4

What are your thoughts on the optimal way to solve this? I am not looking for actual code but ideas or tips to go about the best way to get it done.

Thanks for your time!!!


You can iterate them while dumping them (and merging duplicates) into a TreeMap. Then create a list from the sorted view of the TreeMap's values.

In the sample code I'm assuming you have an Entry class with id, name and regions fields this last one being a List of Region instances. This could easily be changed to a Set, and Region to Strings or whatever you're using. The sample copies the entries before inserting them into the map since they will be modified when merged to other entries.

SortedMap<Integer, Entry> mergedEntriesMap = new TreeMap<Integer, Entry>();
for (Entry e : entries) {
  if (mergedEntriesMap.contains(e.id)) {
    Entry m = mergedEntriesMap.get(e);
    m.regions.addAll(e.regions);
  } else {
    Entry m = new Entry();
    // copy the entry to keep the original array clean
    m.id = e.id;
    m.name = e.name;
    m.regions = new ArrayList<Region>(e.regions);
    mergedEntriesMap.put(m.id, m);
  }
}

List<Entry> mergedEntries = new ArrayList<Entry>(mergedEntriesMap.values());


Is the initial data stuck in this format? If not, you might want to look into changing the query you're using to retrieve your data by grouping all the ids together and forming a comma separated list column. Here's an example in sql

SELECT      Id, [Name], Regions = replace
            ((SELECT Region AS [data()]
            FROM RegionTable
            WHERE  Id = u.Id
            ORDER BY Region FOR xml path('')), ' ', ', ')
FROM        [User] u
WHERE       Id IS NOT NULL
GROUP BY Id, [Name]


This is a pseudocode to accomplish what you want. At the abstract level, you have a list of Pair<K,V> (first, second), sorted by K, and no two pair are truly equal (i.e. you can have (k1,v1) and (k1,v2), but you can't have two (k1,v1) in the list.

You want to merge consecutive pairs (k,v1),(k,v2),(k,v3) to one group (k,[v1,v2,v3]).

List<Pair<K,V>> in;
List<Pair<K,List<V>>> out = [ ];

Pair<K,V> lastP = SENTINEL_PAIR; // lastP.first matches nothing
Pair<K,List<V>> lastGroup;

for (Pair<K,V> p : in) {
  if (p.first == lastP.first) {  // same group as last
    lastGroup.second.add(p.second);
  } else {                       // start a new group
    lastGroup = (p.first, [ p.second ]);
    out.add(lastGroup);
  }
  lastP = p;
}

In your case, K is the ID, and V is the region. This is O(N).


Have you taken a look at google's Multimap? It is pretty much created for this type of data structure in which there is a key that maps to a Collection of items. So in this case a String name would map to a Collection of Region objects.

Multimap<String, Region> names = HashMultimap.create();
for (Entry entry : entries) {
    names.put(entry.getName(), entry.getRegion());
}
// Now u can get the collection of regions by name
Collection<Region> johnsRegions = names.get("John Smith");
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜