How to remove duplicate objects in a List<MyObject> without equals/hashcode?

I have to remove duplicated objects in a List. It is a List of Blog objects, and the class looks like this:

public class Blog {
    private String title;
    private String author;
    private String url;
    private String description;
    ...
}

A duplicated object is an object that has title, author, url and description equal to those of another object.

And I can't alter the object. I can't put new methods on it.

How do I do this?


Here is the complete code which works for this scenario:

class Blog {
    private String title;
    private String author;
    private String url;
    private String description; 

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getAuthor() {
        return author;
    }

    public void setAuthor(String author) {
        this.author = author;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    Blog(String title, String author, String url, String description)
    {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description; 
    }

    @Override
    public boolean equals(Object obj) {
        // TODO Auto-generated method stub
        if(obj instanceof Blog)
        {
            Blog temp = (Blog) obj;
            if(this.title.equals(temp.title) && this.author.equals(temp.author) && this.url.equals(temp.url) && this.description.equals(temp.description))
                return true;
        }
        return false;
    }

    @Override
    public int hashCode() {
        // TODO Auto-generated method stub
        
        return (this.title.hashCode() + this.author.hashCode() + this.url.hashCode() + this.description.hashCode());        
    }
}

Here is the main function which will eliminate the duplicates:

public static void main(String[] args) {
    Blog b1 = new Blog("A", "sam", "a", "desc");
    Blog b2 = new Blog("B", "ram", "b", "desc");
    Blog b3 = new Blog("C", "cam", "c", "desc");
    Blog b4 = new Blog("A", "sam", "a", "desc");
    Blog b5 = new Blog("D", "dam", "d", "desc");
    List<Blog> list = new ArrayList<Blog>();
    list.add(b1);
    list.add(b2);
    list.add(b3);
    list.add(b4);       
    list.add(b5);
    
    // Removing duplicates
    Set<Blog> s = new HashSet<Blog>();
    s.addAll(list);
    list = new ArrayList<Blog>();
    list.addAll(s);
    // Now the list contains only distinct elements (HashSet does not preserve the original order)
}


If you can't edit the source of the class (why not?), then you need to iterate over the list and compare each item based on the four criteria mentioned ("title, author, url and description").

To do this in a performant way, I would create a new class, something like BlogKey which contains those four elements and which properly implements equals() and hashCode(). You can then iterate over the original list, constructing a BlogKey for each and adding to a HashMap:

Map<BlogKey, Blog> map = new HashMap<BlogKey, Blog>();
for (Blog blog : blogs) {
     BlogKey key = createKey(blog);
     if (!map.containsKey(key)) {
          map.put(key, blog);
     }
}
Collection<Blog> uniqueBlogs = map.values();

However, by far the simplest thing is to just edit the original source code of Blog so that it correctly implements equals() and hashCode().
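A minimal sketch of the BlogKey idea described above, for the case where Blog really cannot be edited. The Blog class here is only a stand-in with getters, and BlogKey, createKey and BlogDeduplicator are hypothetical names, not part of any library. A LinkedHashMap is used (an assumption beyond the original snippet) so that the first occurrence of each blog is kept in its original position.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Stand-in for the unmodifiable Blog class; only its getters are used.
class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

// Hypothetical key class holding the four fields that define a duplicate.
final class BlogKey {
    private final String title, author, url, description;

    BlogKey(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BlogKey)) return false; // also false for null
        BlogKey other = (BlogKey) o;
        return Objects.equals(title, other.title)
                && Objects.equals(author, other.author)
                && Objects.equals(url, other.url)
                && Objects.equals(description, other.description);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, author, url, description);
    }
}

class BlogDeduplicator {
    // Builds the key from the blog's getters.
    static BlogKey createKey(Blog blog) {
        return new BlogKey(blog.getTitle(), blog.getAuthor(), blog.getUrl(), blog.getDescription());
    }

    // Keeps the first blog seen for each key; LinkedHashMap preserves insertion order.
    static List<Blog> removeDuplicates(List<Blog> blogs) {
        Map<BlogKey, Blog> map = new LinkedHashMap<>();
        for (Blog blog : blogs) {
            map.putIfAbsent(createKey(blog), blog);
        }
        return new ArrayList<>(map.values());
    }
}
```

Because each blog is hashed once, this runs in roughly linear time, at the cost of one extra key object per element.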


Make sure Blog has the methods equals(Object) and hashCode() defined, then call addAll(list) on a new HashSet(), or on a new LinkedHashSet() if the order is important.

Better yet, use a Set instead of a List from the start. Since you obviously don't want duplicates, it's better that your data model reflects that than to remove them after the fact.


Use set:

yourList = new ArrayList<Blog>(new LinkedHashSet<Blog>(yourList));

This will create a list without duplicates, and the element order will be the same as in the original list.

Just do not forget to implement hashCode() and equals() for your class Blog.


  1. override hashCode() and equals(..) using those 4 fields
  2. use new HashSet<Blog>(blogList) - this will give you a Set which has no duplicates by definition

Update: Since you can't change the class, here's an O(n^2) solution:

  • create a new list
  • iterate the first list
  • in an inner loop iterate the second list and verify if it has an element with the same fields

You can make this more efficient if you provide a HashSet data structure with externalized hashCode() and equals(..) methods.
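The O(n^2) steps listed above could look roughly like this. It is only a sketch: Blog is a stand-in with getters, and sameFields and QuadraticDedup are hypothetical names for the external comparison and its holder class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Stand-in for the unmodifiable Blog class; only its getters are used.
class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

class QuadraticDedup {
    // External equality check, since Blog.equals() cannot be changed.
    static boolean sameFields(Blog a, Blog b) {
        return Objects.equals(a.getTitle(), b.getTitle())
                && Objects.equals(a.getAuthor(), b.getAuthor())
                && Objects.equals(a.getUrl(), b.getUrl())
                && Objects.equals(a.getDescription(), b.getDescription());
    }

    static List<Blog> removeDuplicates(List<Blog> blogs) {
        List<Blog> result = new ArrayList<>();           // the new list
        for (Blog candidate : blogs) {                   // iterate the first list
            boolean alreadyKept = false;
            for (Blog kept : result) {                   // inner loop over the new list
                if (sameFields(candidate, kept)) {
                    alreadyKept = true;
                    break;
                }
            }
            if (!alreadyKept) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

This keeps the original order and needs no extra classes, but the nested loop makes it slow for large lists.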


If for some reason you don't want to override the equals method and you want to remove duplicates based on multiple properties, you can write a generic method to do that.

We can write it in 2 versions:

1. Modify the original list:

@SafeVarargs
public static <T> void removeDuplicatesFromList(List<T> list, Function<T, ?>... keyFunctions) {

    Set<List<?>> set = new HashSet<>();

    ListIterator<T> iter = list.listIterator();
    while(iter.hasNext()) {
        T element = iter.next();

        List<?> functionResults = Arrays.stream(keyFunctions)
                .map(function -> function.apply(element))
                .collect(Collectors.toList());

        if(!set.add(functionResults)) {
            iter.remove();
        }
    }
}

2. Return a new list:

@SafeVarargs
public static <T> List<T> getListWithoutDuplicates(List<T> list, Function<T, ?>... keyFunctions) {

    List<T> result = new ArrayList<>();

    Set<List<?>> set = new HashSet<>();

    for(T element : list) {
        List<?> functionResults = Arrays.stream(keyFunctions)
                .map(function -> function.apply(element))
                .collect(Collectors.toList());

        if(set.add(functionResults)) {
            result.add(element);
        }
    }

    return result;
}

In both cases we can consider any number of properties.

For example, to remove duplicates based on 4 properties title, author, url and description:

removeDuplicatesFromList(blogs, Blog::getTitle, Blog::getAuthor, Blog::getUrl, Blog::getDescription);

The methods work by leveraging the equals method of List, which will check the equality of its elements. In our case the elements of functionResults are the values retrieved from the passed getters and we can use that list as an element of the Set to check for duplicates.

Complete example:

public class Duplicates {

    public static void main(String[] args) {

        List<Blog> blogs = new ArrayList<>();
        blogs.add(new Blog("a", "a", "a", "a"));
        blogs.add(new Blog("b", "b", "b", "b"));
        blogs.add(new Blog("a", "a", "a", "a"));    // duplicate
        blogs.add(new Blog("a", "a", "b", "b"));
        blogs.add(new Blog("a", "b", "b", "b"));
        blogs.add(new Blog("a", "a", "b", "b"));    // duplicate

        List<Blog> blogsWithoutDuplicates = getListWithoutDuplicates(blogs, 
                Blog::getTitle, Blog::getAuthor, Blog::getUrl, Blog::getDescription);
        System.out.println(blogsWithoutDuplicates); // [a a a a, b b b b, a a b b, a b b b]
        
        removeDuplicatesFromList(blogs, 
                Blog::getTitle, Blog::getAuthor, Blog::getUrl, Blog::getDescription);
        System.out.println(blogs);                  // [a a a a, b b b b, a a b b, a b b b]
    }

    private static class Blog {
        private String title;
        private String author;
        private String url;
        private String description;

        public Blog(String title, String author, String url, String description) {
            this.title = title;
            this.author = author;
            this.url = url;
            this.description = description;
        }

        public String getTitle() {
            return title;
        }

        public String getAuthor() {
            return author;
        }

        public String getUrl() {
            return url;
        }

        public String getDescription() {
            return description;
        }

        @Override
        public String toString() {
            return String.join(" ", title, author, url, description);
        }
    }
}


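You can use distinct() to remove duplicates. Note that distinct() compares elements with equals() and hashCode(), so it only works if Blog implements them: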

List<Blog> blogList = ....// add your list here

List<Blog> uniqueBlogs = blogList.stream().distinct().collect(Collectors.toList());
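Since distinct() relies on the element's own equals(), it does not help when Blog cannot be changed. One possible workaround is to filter the stream with a stateful predicate that remembers the keys it has already seen. The sketch below is an assumption, not a standard API: distinctByKey and DistinctByKey are hypothetical names, Blog is a stand-in with getters, and the key is a List of the four field values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Stand-in for the unmodifiable Blog class; only its getters are used.
class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

class DistinctByKey {
    // Returns a predicate that accepts an element only the first time its key is seen.
    static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
        Set<Object> seen = ConcurrentHashMap.newKeySet();
        return element -> seen.add(keyExtractor.apply(element));
    }

    static List<Blog> removeDuplicates(List<Blog> blogs) {
        return blogs.stream()
                // List.equals()/hashCode() compare the four values element by element
                .filter(distinctByKey(b -> Arrays.asList(
                        b.getTitle(), b.getAuthor(), b.getUrl(), b.getDescription())))
                .collect(Collectors.toList());
    }
}
```

On a sequential stream this keeps the first occurrence of each blog in encounter order; with a parallel stream, which of the duplicates survives is not guaranteed.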


Here is one way of removing a duplicate object.

The Blog class should look something like this, i.e. a proper POJO:

public class Blog {

    private String title;
    private String author;
    private String url;
    private String description;

    public String getTitle() {
        return title;
    }
    public void setTitle(String title) {
       this.title = title;
    }
    public String getAuthor() {
        return author;
    }
    public void setAuthor(String author) {
        this.author = author;
    }
    public String getUrl() {
        return url;
    }
    public void setUrl(String url) {
        this.url = url;
    }
    public String getDescription() {
        return description;
    }
    public void setDescription(String description) {
        this.description = description;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof Blog)) { // also rejects null
            return false;
        }
        Blog blog = (Blog) obj;

        return title.equals(blog.title) &&
                author.equals(blog.author) &&
                url.equals(blog.url) &&
                description.equals(blog.description);
    }

    // A hash-based set only detects duplicates if equal objects return the same hash code
    @Override
    public int hashCode() {
        int result = title.hashCode();
        result = 31 * result + author.hashCode();
        result = 31 * result + url.hashCode();
        result = 31 * result + description.hashCode();
        return result;
    }

}

Then use it like this to remove duplicate objects. The key data structure here is the Set, specifically LinkedHashSet: it removes duplicates and also keeps the insertion order.

    Blog blog1 = new Blog();
    blog1.setTitle("Game of Thrones");
    blog1.setAuthor("HBO");
    blog1.setDescription("The best TV show in the US");
    blog1.setUrl("www.hbonow.com/gameofthrones");

    Blog blog2 = new Blog();
    blog2.setTitle("Game of Thrones");
    blog2.setAuthor("HBO");
    blog2.setDescription("The best TV show in the US");
    blog2.setUrl("www.hbonow.com/gameofthrones");

    Blog blog3 = new Blog();
    blog3.setTitle("Ray Donovan");
    blog3.setAuthor("Showtime");
    blog3.setDescription("The second best TV show in the US");
    blog3.setUrl("www.showtime.com/raydonovan");

    ArrayList<Blog> listOfBlogs = new ArrayList<>();

    listOfBlogs.add(blog1);
    listOfBlogs.add(blog2);
    listOfBlogs.add(blog3);


    Set<Blog> setOfBlogs = new LinkedHashSet<>(listOfBlogs);

    listOfBlogs.clear();
    listOfBlogs.addAll(setOfBlogs);

    for(int i=0;i<listOfBlogs.size();i++)
        System.out.println(listOfBlogs.get(i).getTitle());

Running this should print

Game of Thrones
Ray Donovan

The second one will be removed because it is a duplicate of the first object.


Use this code:

 public List<Blog> removeDuplicates(List<Blog> list) {
    // A TreeSet treats two elements as duplicates when the comparator returns 0,
    // so the comparator must define a consistent ordering over all four fields
    // (assumes the fields are never null).
    Comparator<Blog> byAllFields = Comparator.comparing(Blog::getTitle)
            .thenComparing(Blog::getAuthor)
            .thenComparing(Blog::getUrl)
            .thenComparing(Blog::getDescription);
    Set<Blog> set = new TreeSet<>(byAllFields);
    set.addAll(list);

    // The result is sorted by the comparator, not in the original list order
    return new ArrayList<>(set);
}


If your Blog class has appropriate equals() and hashCode() methods defined on it, the simplest way is just to create a Set out of your list, which will automatically remove duplicates:

List<Blog> blogList = ...; // your initial list
Set<Blog> noDups = new HashSet<Blog>(blogList);

The chances are this will work transparently with the rest of your code - if you're just iterating over the contents, for example, then any instance of Collection is as good as another. (If iteration order matters, then you may prefer a LinkedHashSet instead, which will preserve the original ordering of the list).

If you really need the result to be a List then keeping with the straightforward approach, you can just convert it straight back again by wrapping in an ArrayList (or similar). If your collections are relatively small (less than a thousand elements, say) then the apparent inefficiencies of this approach are likely to be immaterial.


You could override the equals() method using title, author, url and description (and hashCode() too, since if you override one you should override the other). Then use a HashSet<Blog>.


First step you need is to implement the equals method and compare your fields. After that the steps vary.

You could create a new empty list and loop over the original, using: if(!list2.contains(item)) and then do an add.

Another quick way to do it, is to cram them all into a Set and pull them back into a List. This works because Sets do not allow duplicates to begin with.


And I can't alter the object. I can't put new methods on it.

How do I do this?

In case you also mean "how do I make the object immutable and prevent subclassing": use the final keyword.

public final class Blog { //final classes can't be extended/subclassed
   private final String title; //final members have to be set in the constructor and can't be changed
   private final String author;
   private final String url;
   private final String description;
    ...
}

Edit: I just saw some of your comments and it seems you want to change the class but can't (third party I assume).

To prevent duplicates you might use a wrapper that implements appropriate equals() and hashCode() methods, then use the Set approach mentioned by the others:

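 class BlogWrapper {
   private final Blog blog;

   BlogWrapper(Blog blog) {
     this.blog = blog;
   }

   public Blog getBlog() {
     return blog;
   }

   @Override
   public int hashCode() {
     // combine the hash codes of all four fields
     int hashCode = blog.getTitle().hashCode();
     hashCode = 31 * hashCode + blog.getAuthor().hashCode();
     hashCode = 31 * hashCode + blog.getUrl().hashCode();
     hashCode = 31 * hashCode + blog.getDescription().hashCode();
     return hashCode;
   }

   @Override
   public boolean equals(Object other) {
     if (!(other instanceof BlogWrapper)) { // also false when other is null
       return false;
     }
     Blog otherBlog = ((BlogWrapper) other).getBlog();
     return blog.getTitle().equals(otherBlog.getTitle())
         && blog.getAuthor().equals(otherBlog.getAuthor())
         && blog.getUrl().equals(otherBlog.getUrl())
         && blog.getDescription().equals(otherBlog.getDescription());
   }
 }

Note that this is just a rough and simple version: it assumes the wrapped Blog and its four fields are never null.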

Finally use a Set<BlogWrapper>, loop through all the blogs and try to add new BlogWrapper(blog) to the set. In the end, you should only have unique (wrapped) blogs in the set.


I tried several ways of removing duplicates from a list of Java objects. Some of them are:
1. Override equals() and hashCode(), convert the list to a set by passing it to the set's constructor, then clear the list and add everything back.
2. Run two pointers and remove the duplicates manually with two nested for loops, as one would for arrays in C.
3. Write an anonymous Comparator class for the bean, call Collections.sort, and then run two pointers over the sorted list to remove duplicates in the forward direction.

My requirement was to remove almost 1 million duplicates from almost 5 million objects. After many trials I ended up with the third option, which I found the most efficient: it finished within seconds, whereas the other two options took almost 10 to 15 minutes. In my tests the first and second options scaled badly, with the time to remove duplicates rising steeply as the number of objects grew.

So in the end the third option worked best for me.
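The third option (sort with a Comparator, then drop adjacent duplicates in one forward pass) might be sketched as follows. This is an illustration rather than the poster's actual code: Blog is a stand-in with getters, SortDedup and BY_ALL_FIELDS are hypothetical names, and the fields are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-in for the unmodifiable Blog class; only its getters are used.
class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

class SortDedup {
    // Orders blogs by all four fields, so duplicates end up next to each other.
    static final Comparator<Blog> BY_ALL_FIELDS = Comparator.comparing(Blog::getTitle)
            .thenComparing(Blog::getAuthor)
            .thenComparing(Blog::getUrl)
            .thenComparing(Blog::getDescription);

    static List<Blog> removeDuplicates(List<Blog> blogs) {
        List<Blog> sorted = new ArrayList<>(blogs); // leave the caller's list untouched
        sorted.sort(BY_ALL_FIELDS);

        List<Blog> result = new ArrayList<>();
        Blog previous = null;
        for (Blog current : sorted) {
            // Keep an element only if it differs from the one just before it.
            if (previous == null || BY_ALL_FIELDS.compare(previous, current) != 0) {
                result.add(current);
            }
            previous = current;
        }
        return result;
    }
}
```

The sort costs O(n log n) and the pass is linear, but the result comes back in sorted order rather than the original order.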


This can be solved using a single property. Here I have a property called key.

  1. Take the String property from each object and put it in a collection of keys already seen.
  2. If the collection already contains that key, remove the object.
  3. Return the object list.

// Element type assumed from the variable name used in the original snippet
List<UnAvailabilityModel> objectList = new ArrayList<>(); // populated elsewhere
Set<String> seenKeys = new HashSet<>();
// Set.add returns false when the key is already present, so removeIf drops
// every later object with a key that was seen before. Calling remove() inside
// forEach would throw ConcurrentModificationException.
objectList.removeIf(obj -> !seenKeys.add(obj.getKey()));
return objectList;


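We can also use a Comparator to check for duplicate elements. Sample code is given below:

private boolean checkDuplicate(List<StudentDTO> studentDTOs){

    // TreeSet treats elements as equal when the comparator returns 0, so the
    // comparator must be a consistent ordering; returning only 0 or 1 can miss duplicates.
    // Assumes getDateOfBrith() returns a Comparable type (for example LocalDate).
    Comparator<StudentDTO> studentCmp = Comparator
            .comparing(StudentDTO::getName, String.CASE_INSENSITIVE_ORDER)
            .thenComparing(StudentDTO::getAddress, String.CASE_INSENSITIVE_ORDER)
            .thenComparing(StudentDTO::getDateOfBrith);
    Set<StudentDTO> setObj = new TreeSet<>(studentCmp);
    setObj.addAll(studentDTOs);
    // true when nothing was dropped, i.e. the list contains no duplicates
    return setObj.size()==studentDTOs.size();
}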


import java.util.ArrayList;
import java.util.HashSet;

class Person
{
    public int age;
    public String name;
    public int hashCode()
    {
       // System.out.println("In hashcode");
        int hashcode = 0;
        hashcode = age*20;
        hashcode += name.hashCode();
        System.out.println("In hashcode :  "+hashcode);
        return hashcode;
    }
     public boolean equals(Object obj)
        {
            if (obj instanceof Person)
                {
                    Person pp = (Person) obj;
                    boolean flag=(pp.name.equals(this.name) && pp.age == this.age); 
                    System.out.println(pp);
                    System.out.println(pp.name+"    "+this.name);
                    System.out.println(pp.age+"    "+this.age);
                    System.out.println("In equals : "+flag);
                    return flag;
                }
                else 
                    {
                    System.out.println("In equals : false");
                        return false;
                    }
         }
    public void setAge(int age)
    {
        this.age=age;
    }
    public int getAge()
    {
        return age;
    }
    public void setName(String name )
    {
        this.name=name;
    }
    public String getName()
    {
        return name;
    }
    public String toString()
    {
        return "[ "+name+", "+age+" ]";
    }
}
class ListRemoveDuplicateObject 
{
    public static void main(String[] args) 
    {
        ArrayList<Person> al=new ArrayList<>();
        
            Person person =new Person();
            person.setName("Neelesh");
            person.setAge(26);
            al.add(person);
            
            person =new Person();
            person.setName("Hitesh");
            person.setAge(16);
            al.add(person);
        
            person =new Person();
            person.setName("jyoti");
            person.setAge(27);
            al.add(person);
            
            person =new Person();
            person.setName("Neelesh");
            person.setAge(60);
            al.add(person);
            
            person =new Person();
            person.setName("Hitesh");
            person.setAge(16);
            al.add(person);

            person =new Person();
            person.setName("Mohan");
            person.setAge(56);
            al.add(person);
            
            person =new Person();
            person.setName("Hitesh");
            person.setAge(16);
            al.add(person);

        System.out.println(al);
        HashSet<Person> al1=new HashSet<>();
        al1.addAll(al);
        al.clear();
        al.addAll(al1);
        System.out.println(al);
    }
}

output

[[ Neelesh, 26 ], [ Hitesh, 16 ], [ jyoti, 27 ], [ Neelesh, 60 ], [ Hitesh, 16 ], [ Mohan, 56 ], [ Hitesh, 16 ]]

In hashcode :  -801018364
In hashcode :  -2133141913
In hashcode :  101608849
In hashcode :  -801017684
In hashcode :  -2133141913
[ Hitesh, 16 ]
Hitesh    Hitesh
16    16
In equals : true
In hashcode :  74522099
In hashcode :  -2133141913
[ Hitesh, 16 ]
Hitesh    Hitesh
16    16
In equals : true
[[ Neelesh, 60 ], [ Neelesh, 26 ], [ Mohan, 56 ], [ jyoti, 27 ], [ Hitesh, 16 ]]


First override equals() method:

@Override
public boolean equals(Object obj)
{
    if(!(obj instanceof MyObject)) return false; // also covers null
    MyObject other = (MyObject) obj;
    // compare string contents with equals(), not references with ==
    return getTitle().equals(other.getTitle())
        && getAuthor().equals(other.getAuthor())
        && getURL().equals(other.getURL())
        && getDescription().equals(other.getDescription());
}

and then use:

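List<MyObject> list = new ArrayList<MyObject>(); // your original list
List<MyObject> unique = new ArrayList<MyObject>();
for(MyObject obj : list)
{
    // contains() uses equals(); removing from list while iterating over it
    // would throw ConcurrentModificationException
    if(!unique.contains(obj)) unique.add(obj);
}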


Create a new class that wraps your Blog object and provides the equality/hashcode method you need. For maximum efficiency I would add two static methods on the wrapper, one to convert Blogs list -> Blog Wrapper list and the other to convert Blog Wrapper list -> Blog list. Then you would:

  1. Convert your blog list to blog wrapper list
  2. Add your blog wrapper list to a Hash Set
  3. Get the condensed blog wrapper list back out of the Hash Set
  4. Convert the blog wrapper list to a blog list

Code for Blog Wrapper would be something like this:

import java.util.ArrayList;
import java.util.List;

public class BlogWrapper {
    public static List<Blog> unwrappedList(List<BlogWrapper> blogWrapperList) {
        if (blogWrapperList == null)
            return new ArrayList<Blog>(0);

        List<Blog> blogList = new ArrayList<Blog>(blogWrapperList.size());
        for (BlogWrapper bW : blogWrapperList) {
            blogList.add(bW.getBlog());
        }

        return blogList;
    }

    public static List<BlogWrapper> wrappedList(List<Blog> blogList) {
        if (blogList == null)
            return new ArrayList<BlogWrapper>(0);

        List<BlogWrapper> blogWrapperList = new ArrayList<BlogWrapper>(blogList
                .size());
        for (Blog b : blogList) {
            blogWrapperList.add(new BlogWrapper(b));
        }

        return blogWrapperList;
    }

    private Blog blog = null;

    public BlogWrapper() {
        super();
    }

    public BlogWrapper(Blog aBlog) {
        super();
        setBlog(aBlog);
    }

    public boolean equals(Object other) {
        // Your equality logic here
        return super.equals(other);
    }

    public Blog getBlog() {
        return blog;
    }

    public int hashCode() {
        // Your hashcode logic here
        return super.hashCode();
    }

    public void setBlog(Blog blog) {
        this.blog = blog;
    }
}

And you could use this like so:

List<BlogWrapper> myBlogWrappers = BlogWrapper.wrappedList(your blog list here);
Set<BlogWrapper> noDupWrapSet = new HashSet<BlogWrapper>(myBlogWrappers);
List<BlogWrapper> noDupWrapList = new ArrayList<BlogWrapper>(noDupWrapSet);
List<Blog> noDupList = BlogWrapper.unwrappedList(noDupWrapList);

Quite obviously you can make the above code more efficient, particularly by making the wrap and unwrap methods on Blog Wrapper take collections instead of Lists.

An alternative route to wrapping the Blog class would be to use a byte code manipulation library like BCEL to actually change the equals and hashcode method for Blog. But of course, that could have unintended consequences to the rest of your code if they require the original equals/hashcode behaviour.


The easiest and most effective way would be to let Eclipse generate the equals() and hashCode() overrides. Just select the attributes to be checked for duplicates when prompted and you should be all set.

Also once the list is ready, put it into a Set and you have the duplicates gone.


It is recommended to override equals() and hashCode() to work with hash-based collections, including HashMap, HashSet, and Hashtable. Having done this, you can easily remove duplicates by initializing a HashSet with the Blog list.

List<Blog> blogList = getBlogList();
Set<Blog> noDuplication = new HashSet<Blog>(blogList);

However, since you mentioned that you cannot change the code to add equals() and hashCode(), Java 8 offers a cleaner way to do this:

Collection<Blog> uniqueBlogs = getUniqueBlogList(blogList);

private Collection<Blog> getUniqueBlogList(List<Blog> blogList) {
    return blogList.stream()
            .collect(Collectors.toMap(createUniqueKey(), Function.identity(), (blog1, blog2) -> blog1))
            .values();
}
List<Blog> updatedBlogList = new ArrayList<>(uniqueBlogs);

The third parameter of Collectors.toMap() is the merge function (a functional interface), used to resolve collisions between values associated with the same key.
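The snippet above does not show createUniqueKey(). One way it might be written, as a sketch only, is to return a function that maps a Blog to a List of its four field values, so that List.equals() and List.hashCode() do the comparison. Blog here is a stand-in with getters, UniqueBlogs is a hypothetical holder class, and the LinkedHashMap supplier is an addition that keeps the original order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Stand-in for the unmodifiable Blog class; only its getters are used.
class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

class UniqueBlogs {
    // Hypothetical key function: two blogs share a key when all four fields match.
    static Function<Blog, List<String>> createUniqueKey() {
        return blog -> Arrays.asList(
                blog.getTitle(), blog.getAuthor(), blog.getUrl(), blog.getDescription());
    }

    static List<Blog> getUniqueBlogList(List<Blog> blogList) {
        return new ArrayList<>(blogList.stream()
                .collect(Collectors.toMap(
                        createUniqueKey(),
                        Function.identity(),
                        (first, second) -> first,   // on a key collision keep the first blog
                        LinkedHashMap::new))        // preserve encounter order
                .values());
    }
}
```

Without the fourth argument, toMap makes no promise about the kind of map it returns, so the order of the result should not be relied on.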


This question already has a great many possible solutions, not to mention its age.
