Find all numbers that appear in each of a set of lists
I have several ArrayLists of Integer objects, stored in a HashMap.
I want to get a list (ArrayList) of all the numbers (Integer objects) that appear in each list.
My thinking so far is:
- Iterate through each ArrayList and put all the values into a HashSet
- This will give us a "listing" of all the values in the lists, but only once
- Iterate through the HashSet 2.1 With each iteration perform ArrayList.contains() 2.2 If none of the ArrayLists return false for the operation add the number to a "master list" which contains all the final values.
If you can come up with something faster or more efficient, funny thing is as I wrote this I came up with a reasonably good solution. But I'll still post it just in case it is useful for someone else.
But of course开发者_运维问答 if you have a better way please do let me know.
I am not sure that I understand your goal. But if you wish to find the intersection of a collection of List<Integer> objects, then you can do the following:
public static List<Integer> intersection(Collection<List<Integer>> lists){
if (lists.size()==0)
return Collections.emptyList();
Iterator<List<Integer>> it = lists.iterator();
HashSet<Integer> resSet = new HashSet<Integer>(it.next());
while (it.hasNext())
resSet.retainAll(new HashSet<Integer>(it.next()));
return new ArrayList<Integer>(resSet);
}
This code runs in linear time in the total number of items. Actually this is average linear time, because of the use of HashSet.
Also, note that if you use ArrayList.contains() in a loop, it may result in quadratic complexity, since this method runs in linear time, unlike HashSet.contains() that runs in constant time.
You have to change step 1: - Use the shortest list instead of your hashSet (if it isn't in the shortest list it isn't in all lists...)
Then call contains on the other lists and remove value as soon as one return false (and skip further tests for this value)
At the end the shortest list will contain the answer...
some code:
public class TestLists {
private static List<List<Integer>> listOfLists = new ArrayList<List<Integer>>();
private static List<Integer> filter(List<List<Integer>> listOfLists) {
// find the shortest list
List<Integer> shortestList = null;
for (List<Integer> list : listOfLists) {
if (shortestList == null || list.size() < shortestList.size()) {
shortestList = list;
}
}
// create result list from the shortest list
final List<Integer> result = new LinkedList<Integer>(shortestList);
// remove elements not present in all list from the result list
for (Integer valueToTest : shortestList) {
for (List<Integer> list : listOfLists) {
// no need to compare to itself
if (shortestList == list) {
continue;
}
// if one list doesn't contain value, remove from result and break loop
if (!list.contains(valueToTest)) {
result.remove(valueToTest);
break;
}
}
}
return result;
}
public static void main(String[] args) {
List<Integer> l1 = new ArrayList<Integer>(){{
add(100);
add(200);
}};
List<Integer> l2 = new ArrayList<Integer>(){{
add(100);
add(200);
add(300);
}};
List<Integer> l3 = new ArrayList<Integer>(){{
add(100);
add(200);
add(300);
}};
List<Integer> l4 = new ArrayList<Integer>(){{
add(100);
add(200);
add(300);
}};
List<Integer> l5 = new ArrayList<Integer>(){{
add(100);
add(200);
add(300);
}};
listOfLists.add(l1);
listOfLists.add(l2);
listOfLists.add(l3);
listOfLists.add(l4);
listOfLists.add(l5);
System.out.println(filter(listOfLists));
}
}
- Create a
Set
(e.g.HashSet
) from the firstList
. - For each remaining list:
- call
set.retainAll (list)
if bothlist
andset
are small enough - otherwise call
set.retainAll (new HashSet <Integer> (list))
- call
I cannot say after which thresholds second variant of step 2. becomes faster, but I guess maybe > 20
in size or so. If your lists are all small, you can not bother with this check.
As I remember Apache Collections have more efficient integer-only structures if you care not only about O(*) part, but also about the factor.
Using the Google Collections Multiset makes this (representation-wise) a cakewalk (though I also like Eyal's answer). It's probably not as efficient time/memory-wise as some of the other's here, but it's very clear what's going on.
Assuming the lists contain no duplicates within themselves:
Multiset<Integer> counter = HashMultiset.create();
int totalLists = 0;
// for each of your ArrayLists
{
counter.addAll(list);
totalLists++;
}
List<Integer> inAll = Lists.newArrayList();
for (Integer candidate : counter.elementSet())
if (counter.count(candidate) == totalLists) inAll.add(candidate);`
if the lists might contain duplicate elements, they can be passed through a set first:
counter.addAll(list) => counter.addAll(Sets.newHashSet(list))
Finally, this is also ideal if you want might want some additional data later (like, how close some particular value was to making the cut).
Another approach that slightly modifies Eyal's (basically folding together the act of filtering a list through a set and then retaining all the overlapping elements), and is more lightweight than the above:
public List<Integer> intersection(Iterable<List<Integer>> lists) {
Iterator<List<Integer>> listsIter = lists.iterator();
if (!listsIter.hasNext()) return Collections.emptyList();
Set<Integer> bag = new HashSet<Integer>(listsIter.next());
while (listsIter.hasNext() && !bag.isEmpty()) {
Iterator<Integer> itemIter = listsIter.next().iterator();
Set<Integer> holder = new HashSet<Integer>(); //perhaps also pre-size it to the bag size
Integer held;
while (itemIter.hasNext() && !bag.isEmpty())
if ( bag.remove(held = itemIter.next()) )
holder.add(held);
bag = holder;
}
return new ArrayList<Integer>(bag);
}
精彩评论