
Java - Removing duplicates in an ArrayList

I'm working on a program that uses an ArrayList to store Strings. The program prompts the user with a menu and allows the user to choose an operation to perform. Such operations are adding Strings to the List, printing the entries etc. What I want to be able to do is create a method called removeDuplicates(). This method will search the ArrayList and remove any duplicated values. I want to leave one instance of the duplicated value(s) within the list. I also want this method to return the total number of duplicates removed.

I've been trying to use nested loops to accomplish this but I've been running into trouble because when entries get deleted, the indexing of the ArrayList gets altered and things don't work as they should. I know conceptually what I need to do but I'm having trouble implementing this idea in code.

Here is some pseudo code:

start with first entry; check each subsequent entry in the list and see if it matches the first entry; remove each subsequent entry in the list that matches the first entry;

after all entries have been examined, move on to the second entry; check each entry in the list and see if it matches the second entry; remove each entry in the list that matches the second entry;

repeat for each entry in the list

Here's the code I have so far:

public int removeDuplicates()
{
    int duplicates = 0;

    for ( int i = 0; i < strings.size(); i++ )
    {
        for ( int j = 0; j < strings.size(); j++ )
        {
            if ( i == j )
            {
                // i & j refer to same entry so do nothing
            }
            else if ( strings.get( j ).equals( strings.get( i ) ) )
            {
                strings.remove( j );
                duplicates++;
            }
        }
    }

    return duplicates;
}

UPDATE: It appears that Will is looking for a homework solution that involves developing the algorithm to remove duplicates, rather than a pragmatic solution using Sets. See his comment:

Thx for the suggestions. This is part of an assignment and I believe the teacher had intended for the solution to not include sets. In other words, I am to come up with a solution that will search for and remove duplicates without implementing a HashSet. The teacher suggested using nested loops which is what I'm trying to do but I've been having some problems with the indexing of the ArrayList after certain entries are removed.


Why not use a collection such as Set (and an implementation like HashSet) which naturally prevents duplicates?
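Applied to the menu program in the question, a Set-based version might look like the sketch below. EntryStore, addEntry and getRejectedDuplicates are hypothetical names. The point is that Set.add returns false when an equal element is already present, so duplicates can be counted as they are rejected. LinkedHashSet is assumed here because it keeps insertion order, like the ArrayList it replaces.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: the menu program's entries kept in a LinkedHashSet
// instead of an ArrayList, so duplicates are rejected at insertion time.
class EntryStore {
    // LinkedHashSet preserves insertion order for printing the entries
    private final Set<String> entries = new LinkedHashSet<>();
    private int rejectedDuplicates = 0;

    // Set.add returns false if an equal String is already stored
    boolean addEntry(String s) {
        boolean added = entries.add(s);
        if (!added) {
            rejectedDuplicates++;
        }
        return added;
    }

    int getRejectedDuplicates() {
        return rejectedDuplicates;
    }

    Set<String> getEntries() {
        return entries;
    }
}
```

With this design there is nothing left to remove later, which is why a Set is usually the pragmatic choice when the assignment allows it.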


You can use nested loops without any problem:

public static int removeDuplicates(ArrayList<String> strings) {

    int size = strings.size();
    int duplicates = 0;

    // caching the size in a local variable instead of calling size() in
    // every check can also speed up execution slightly;
    // i only needs to go up to size - 2, since the last element
    // has nothing after it to compare against
    for (int i = 0; i < size - 1; i++) {
        // start from the item after strings.get(i),
        // since the ones before it have already been checked
        for (int j = i + 1; j < size; j++) {
            // no need for if ( i == j ) here
            if (!strings.get(j).equals(strings.get(i)))
                continue;
            duplicates++;
            strings.remove(j);
            // decrease j because the list got re-indexed
            j--;
            // decrease the cached size of the list
            size--;
        } // for j
    } // for i

    return duplicates;

}


You could try this one-liner to take a copy of the list of Strings, preserving order:

List<String> list;
List<String> dedupped = new ArrayList<String>(new LinkedHashSet<String>(list));

This approach also runs in O(n) expected time instead of O(n^2).
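A minimal sketch that applies the same LinkedHashSet idea in place and also returns the number of duplicates removed, as the question asks. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderedDedup {
    // Removes duplicates in place, keeping the first occurrence of each value
    // in the original order; returns how many elements were removed.
    static int removeDuplicates(List<String> list) {
        int before = list.size();
        // Copying into a LinkedHashSet collapses duplicates but keeps insertion order
        Set<String> unique = new LinkedHashSet<>(list);
        list.clear();        // safe: the set is an independent copy
        list.addAll(unique);
        return before - list.size();
    }
}
```

For a list [b, a, b, c, a], this leaves [b, a, c] and returns 2.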


Just to clarify my comment on matt b's answer, if you really want to count the number of duplicates removed, use this code:

List<String> list = new ArrayList<String>();

// list gets populated from user input...

Set<String> set = new HashSet<String>(list);
int numDuplicates = list.size() - set.size();


List<String> lst = new ArrayList<String>();

lst.add("one");
lst.add("one");
lst.add("two");
lst.add("three");
lst.add("three");
lst.add("three");
// note: a HashSet does not keep the original order; use LinkedHashSet if order matters
Set<String> se = new HashSet<String>(lst);
lst.clear();
lst.addAll(se);
for (String s : lst) {
    System.out.println("Resulting output---------" + s);
}


I've been trying to use nested loops to accomplish this but I've been running into trouble because when entries get deleted, the indexing of the ArrayList gets altered and things don't work as they should

Why don't you just decrease the counter each time you delete an entry?

When you delete an entry, the elements after it move too.

For example:

String[] a = {"a", "a", "b", "c"};

positions:

a[0] = "a";
a[1] = "a";    
a[2] = "b";
a[3] = "c";

After you remove your first "a" the indexes are:

a[0] = "a";
a[1] = "b";
a[2] = "c";

So, you should take this into consideration and decrease the value of j ( j--) to avoid "jumping" over a value.
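The same idea applied to an ArrayList, as a minimal sketch (the helper name removeLaterCopies is hypothetical): after list.remove(j), the element that followed now sits at index j, so j-- makes the loop look at it again.

```java
import java.util.List;

class DecrementDemo {
    // Removes every later copy of list.get(i); returns how many were removed.
    static int removeLaterCopies(List<String> list, int i) {
        int removed = 0;
        for (int j = i + 1; j < list.size(); j++) {
            if (list.get(j).equals(list.get(i))) {
                list.remove(j); // later elements shift one index to the left
                j--;            // so re-check index j, which now holds the next element
                removed++;
            }
        }
        return removed;
    }
}
```

For [a, a, a, b] and i = 0, this removes both later "a" entries; without the j-- it would skip the second one.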



public <T> Collection<T> removeDuplicates(Collection<T> c) {
    // Returns a new collection with duplicates removed from the passed collection.
    Collection<T> result = new ArrayList<T>();

    for (T o : c) {
        if (!result.contains(o)) {
            result.add(o);
        }
    }

    return result;
}

or

public <T extends Comparable<? super T>> void removeDuplicates(List<T> l) {
    // Removes duplicates in place from an existing list (sorting it first)
    T last = null;
    Collections.sort(l);

    Iterator<T> i = l.iterator();
    while (i.hasNext()) {
        T o = i.next();
        if (o.equals(last)) {
            i.remove();
        } else {
            last = o;
        }
    }
}

Both untested.


Assuming you can't use a Set like you said, the easiest way of solving the problem is to use a temporary list, rather than attempting to remove the duplicates in place:

import java.util.ArrayList;
import java.util.List;

public class Duplicates {

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>();
        list.add("one");
        list.add("one");
        list.add("two");
        list.add("three");
        list.add("three");
        list.add("three");

        System.out.println("Prior to removal: " + list);
        System.out.println("There were " + removeDuplicates(list) + " duplicates.");
        System.out.println("After removal: " + list);
    }

    public static int removeDuplicates(List<String> list) {
        int removed = 0;
        List<String> temp = new ArrayList<String>();

        for(String s : list) {
            if(!temp.contains(s)) {
                temp.add(s);
            } else {
                //if the string is already in the list, then ignore it and increment the removed counter
                removed++;
            }
        }

        //put the contents of temp back in the main list
        list.clear();
        list.addAll(temp);

        return removed;
    }

}


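You could do something like this. Most of what people answered above is one alternative, but here's another:

for (int i = 0; i < strings.size(); i++) {
    // start after i; earlier entries were already handled
    for (int j = i + 1; j < strings.size(); j++) {
        // compare contents with equals(); == only compares references
        if (strings.get(i).equals(strings.get(j))) {
            strings.remove(j);
            j--; // re-check index j, which now holds the next element
        }
    }
}

return strings;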


Using a set is the best option to remove the duplicates:

If you have a list of strings, you can remove the duplicates and still keep the ArrayList features:

 List<String> strings = new ArrayList<String>();
 // populate the list
 ...
 List<String> dedupped = new ArrayList<String>(new HashSet<String>(strings));
 int numdups = strings.size() - dedupped.size();

If you can't use a Set, sort the list (Collections.sort()) and iterate over it, checking whether the current element is equal to the previous one; if it is, remove it.
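A minimal sketch of that sort-based approach that also returns the count the question asks for. It walks the sorted list backwards, a variation chosen here so that each removal only shifts elements that have already been checked. The names are hypothetical, and the original order of the list is lost.

```java
import java.util.Collections;
import java.util.List;

class SortedDedup {
    // Sorts the list, then removes every element equal to its predecessor.
    static int removeDuplicates(List<String> list) {
        Collections.sort(list); // equal strings become neighbours
        int removed = 0;
        // Walk backwards: removing index i only shifts elements after i,
        // and those have already been examined.
        for (int i = list.size() - 1; i > 0; i--) {
            if (list.get(i).equals(list.get(i - 1))) {
                list.remove(i);
                removed++;
            }
        }
        return removed;
    }
}
```

For [three, one, two, one, three, three], this returns 3 and leaves [one, three, two].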


Using a set is the best option (as others suggested).

If you want to compare all elements in a list with each other, you should slightly adapt your for loops:

for(int i = 0; i < max; i++)
    for(int j = i+1; j < max; j++)

This way you compare each pair of elements only once instead of twice, because the second loop starts at the element after the one selected by the first loop.

Also, when removing from a list while iterating over it (even when you use a for loop instead of an iterator), keep in mind that you reduce the size of the list and shift the indexes of the elements after the removed one. A common solution is to keep another list of the items you want to delete, and then, after you have finished deciding which to delete, delete them from the original list.
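A minimal sketch of that two-pass idea, using only nested loops as the assignment requires; the names are hypothetical. The indexes are collected in ascending order and removed from the highest down, so each removal leaves the indexes still waiting to be removed untouched.

```java
import java.util.ArrayList;
import java.util.List;

class DeferredRemoval {
    static int removeDuplicates(List<String> list) {
        // Pass 1: record the index of every element equal to an earlier one.
        // Nothing is removed yet, so all indexes stay valid.
        List<Integer> toRemove = new ArrayList<>();
        for (int j = 1; j < list.size(); j++) {
            for (int i = 0; i < j; i++) {
                if (list.get(j).equals(list.get(i))) {
                    toRemove.add(j);
                    break; // one earlier match is enough
                }
            }
        }
        // Pass 2: remove from the highest index down.
        for (int k = toRemove.size() - 1; k >= 0; k--) {
            // the int cast selects remove(int index), not remove(Object)
            list.remove((int) toRemove.get(k));
        }
        return toRemove.size();
    }
}
```

The cast matters: passing the Integer directly would call remove(Object), which looks for an Integer in a List of Strings and silently removes nothing.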


public ArrayList<String> removeDuplicates(ArrayList<String> inArray)
{
    ArrayList<String> outArray = new ArrayList<String>();
    boolean doAdd = true;
    for (int i = 0; i < inArray.size(); i++)
    {
        String testString = inArray.get(i);
        // only entries before i are checked: if one matches, testString is a duplicate
        for (int j = 0; j < inArray.size(); j++)
        {
            if (i == j)
            {
                break;
            }
            else if (inArray.get(j).equals(testString))
            {
                doAdd = false;
                break;
            }

        }
        if (doAdd)
        {
            outArray.add(testString);
        }
        else
        {
            doAdd = true;
        }

    }
    return outArray;

}


You could replace the duplicate with an empty string*, thus keeping the indexing intact. Then, after you've finished, you can strip out the empty strings.

*But only if an empty string isn't valid in your implementation.
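A minimal sketch of the placeholder idea, assuming (as the answer warns) that an empty string never occurs as a real entry; the names are hypothetical. List.set overwrites an element without shifting anything, and List.removeIf (Java 8 and later) strips the placeholders at the end.

```java
import java.util.List;

class PlaceholderDedup {
    // Assumption: "" is never a legitimate entry in the list.
    private static final String PLACEHOLDER = "";

    static int removeDuplicates(List<String> list) {
        int removed = 0;
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i).equals(PLACEHOLDER)) {
                continue; // already marked as a duplicate
            }
            for (int j = i + 1; j < list.size(); j++) {
                if (list.get(j).equals(list.get(i))) {
                    list.set(j, PLACEHOLDER); // overwrite, so no index shifts
                    removed++;
                }
            }
        }
        // Strip all placeholders in one pass at the end
        list.removeIf(PLACEHOLDER::equals);
        return removed;
    }
}
```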


The problem you are seeing in your code is that you remove an entry during iteration, thus invalidating the iteration location.

For example:

{"a", "b", "c", "b", "b", "d"} 
       i         j  

Now you are removing strings[j].

{"a", "b", "c", "b", "d"} 
       i         j  

The inner loop's iteration ends and j is incremented.

{"a", "b", "c", "b", "d"} 
       i              j

Only one duplicate 'b' detected...oops.

Best practice in these cases is to store the locations that have to be removed, and remove them after you have finished iterating through the ArrayList, starting from the highest index so that the remaining locations don't shift. (As a bonus, the strings.size() call can then be moved outside the loops, by you or the compiler.)

Tip: you can start iterating with j at i+1, since you've already checked indexes 0 to i.


The inner for loop is invalid. If you delete an element, you cannot increment j, since j is now pointing at the element after the one you deleted, and you will need to inspect it.

In other words, you should use a while loop instead of a for loop, and only increment j if the elements at i and j do not match. If they do match, remove the element at j. size() will decrease by 1 and j will now be pointing at the following element, so there is no need to increase j.

Also, there is no reason to inspect all elements in the inner loop, just the ones following i, since duplicates before i have already been removed by prior iterations.
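A minimal sketch of the while-loop version described above; the class name is hypothetical. j advances only when nothing was removed, and the inner loop starts at i + 1.

```java
import java.util.List;

class WhileLoopDedup {
    static int removeDuplicates(List<String> strings) {
        int duplicates = 0;
        for (int i = 0; i < strings.size(); i++) {
            int j = i + 1; // entries before i were already handled
            while (j < strings.size()) {
                if (strings.get(j).equals(strings.get(i))) {
                    // size() shrinks by 1 and j now points at the next element
                    strings.remove(j);
                    duplicates++;
                } else {
                    j++; // advance only when nothing was removed
                }
            }
        }
        return duplicates;
    }
}
```

On the example [a, b, c, b, b, d], this catches both extra "b" entries, leaving [a, b, c, d] and returning 2.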


public <Foo> Entry<Integer,List<Foo>> uniqueElementList(List<Foo> listWithPossibleDuplicates) {
  List<Foo> result = new ArrayList<Foo>();//...might want to pre-size here, if you have reliable info about the number of dupes
  Set<Foo> found = new HashSet<Foo>(); //...again with the pre-sizing
  for (Foo f : listWithPossibleDuplicates) if (found.add(f)) result.add(f);
  return entryFactory(listWithPossibleDuplicates.size()-found.size(), result);
}

and then write some entryFactory(Integer key, List<Foo> value) method (for example, one returning a new AbstractMap.SimpleEntry). If you want to mutate the original list instead (possibly not a good idea, but whatever):

public <Foo> int removeDuplicates(List<Foo> listWithPossibleDuplicates) {
  int original = listWithPossibleDuplicates.size();
  Iterator<Foo> iter = listWithPossibleDuplicates.iterator();
  Set<Foo> found = new HashSet<Foo>();
  while (iter.hasNext()) if (!found.add(iter.next())) iter.remove();
  return original - found.size();
}

For your particular case using strings, you may need to deal with some additional equality constraints (e.g., are upper and lower case versions the same or different?).

EDIT: ah, this is homework. Look up Iterator/Iterable in the Java Collections framework, as well as Set, and see if you don't come to the same conclusion I offered. The generics part is just gravy.


I am a bit late to this question, but I have come up with a better solution using generic types. All the solutions above work, but they add to the overall runtime cost.

RemoveDuplicacy.java

We can minimize that cost by doing the required work at load time.

For example, suppose you are using an ArrayList of a class type, as follows:

ArrayList<User> usersList = new ArrayList<User>();
usersList.clear();

User user = new User();
user.setName("A");
user.setId("1"); // duplicate
usersList.add(user);

user = new User();
user.setName("A");
user.setId("1"); // duplicate
usersList.add(user);

user = new User();
user.setName("AB");
user.setId("2"); // duplicate
usersList.add(user);

user = new User();
user.setName("C");
user.setId("4");
usersList.add(user);

user = new User();
user.setName("A");
user.setId("1"); // duplicate
usersList.add(user);

user = new User();
user.setName("A");
user.setId("2"); // duplicate
usersList.add(user);

The User class, which is the element type of the ArrayList used above:

class User {
    private String name;
    private String id;

    /**
     * @param name
     *            the name to set
     */
    public void setName(String name) {
        this.name = name;
    }

    /**
     * @return the name
     */
    public String getName() {
        return name;
    }

    /**
     * @param id
     *            the id to set
     */
    public void setId(String id) {
        this.id = id;
    }

    /**
     * @return the id
     */
    public String getId() {
        return id;
    }

}

Java's Object (parent) class has two methods, hashCode() and equals(), that can be overridden to serve our purpose better. They are:

    @Override
    public int hashCode() {

        final int prime = 31;
        int result = 1;
        result = prime * result + ((id == null) ? 0 : id.hashCode());
        return result;

    }

    @Override
    public boolean equals(Object obj) {

        if (this == obj)
            return true;

        if (obj == null)
            return false;

        if (getClass() != obj.getClass())
            return false;

        User other = (User) obj;

        if (id == null) {
            if (other.id != null)
                return false;

        } else if (!id.equals(other.id))
            return false;

        return true;

    }

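You have to override these methods in the User class. With the versions above, two User objects are equal when their ids are equal, so a HashSet keeps only one User per id.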
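The answer does not show the Set step itself, so here is a minimal sketch of it under stated assumptions: the User class is simplified to a constructor plus getters (the answer's version uses setters), equals/hashCode compare only the id as above (written here with java.util.Objects), and UserDedup.uniqueById is a hypothetical helper. LinkedHashSet keeps the first User seen for each id, in the original order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

// Simplified User: equality is based on id only, as in the answer above.
class User {
    private final String name;
    private final String id;

    User(String name, String id) {
        this.name = name;
        this.id = id;
    }

    String getName() { return name; }
    String getId() { return id; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return Objects.equals(id, ((User) obj).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // consistent with equals: same id, same hash
    }
}

class UserDedup {
    // Keeps the first User for each id, preserving the original order.
    static List<User> uniqueById(List<User> users) {
        return new ArrayList<User>(new LinkedHashSet<User>(users));
    }
}
```

With the six users from the example above (ids 1, 1, 2, 4, 1, 2), this keeps three: A/1, AB/2 and C/4.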

Here is the complete code:

https://gist.github.com/4584310

Let me know if you have any queries.


You can add the list into a HashSet and then convert that HashSet back into a list to remove the duplicates.

public static int removeDuplicates(List<String> duplicateList){
    List<String> correctedList = new ArrayList<String>();
    Set<String> a = new HashSet<String>();
    a.addAll(duplicateList);
    correctedList.addAll(a);
    return (duplicateList.size()-correctedList.size());
}

Here the method returns the number of duplicates. Note that it leaves duplicateList itself unchanged; correctedList holds the unique values, so return or store it if you also need them.


Below is code that removes duplicate elements from a list without changing the order of the list, without using a temporary list and without using any Set. It only needs an int array of duplicate indexes, though it is still O(n^2) because of the nested loops.

This is a generic method which works with any kind of list.

This was a question asked in one of my interviews. I searched many forums for a solution but could not find one, so I thought this would be the right place to post the code.

public List<?> removeDuplicate(List<?> listWithDuplicates) {
    int size = listWithDuplicates.size();
    // indexes of elements equal to an earlier element; each index is stored
    // at most once, so an array of the list's size is always large enough
    int[] dupIndexes = new int[size];
    int dupCount = 0;

    for (int j = 1; j < size; j++) {
        for (int i = 0; i < j; i++) {
            if (listWithDuplicates.get(j).equals(listWithDuplicates.get(i))) {
                dupIndexes[dupCount++] = j; // save the duplicate's index
                break;                      // one earlier match is enough
            }
        }
    }

    // dupIndexes[0..dupCount) is already in ascending order; remove from the
    // highest index down so the removals don't shift indexes still to be removed
    for (int k = dupCount - 1; k >= 0; k--) {
        listWithDuplicates.remove(dupIndexes[k]);
    }
    return listWithDuplicates;
}