开发者

Comparing an array and getting the difference

How would I compare two arrays that might have different lengths and get the difference between each array?

For example:

Cat cat = new Cat();
Dog dog = new 开发者_如何学CDog();
Alligator alligator = new Alligator();

Animal animals[] = { cat, dog };
Animal animals2[] = { cat, dog, alligator };

How would I compare them two arrays and make it return the instance of Alligator?


I would suggest that your question needs to be clarified. Currently, everyone is guessing what about what you are actually asking.

  • Are the arrays intended to represent sets, or lists, or something in between? In other words, does element order matter, and can there be duplicates?
  • What does "equal" mean? Does new Cat() "equal" new Cat()? Your example suggests that it does!!
  • What do you mean by the "difference"? Do you mean set difference?
  • What do you want to happen if the two arrays have the same length?
  • Is this a once-off comparison or does it occur repeatedly for the same arrays?
  • How many elements are there in the arrays (on average)?
  • Why are you using arrays at all?

Making the assumption that these arrays are intended to be true sets, then you probably should be using HashSet instead of arrays, and using collection operations like addAll and retainAll to calculate the set difference.

On the other hand, if the arrays are meant to represent lists, it is not at all clear what "difference" means.

If it is critical that the code runs fast, then you most certainly need to rethink your data structures. If you always start with arrays, you are not going to be able to calculate the "differences" fast ... at least in the general case.

Finally, if you are going to use anything that depends on the equals(Object) method (and that includes any of the Java collection types, you really need to have a clear understanding of what "equals" is supposed to mean in your application. Are all Cat instances equal? Are they all different? Are some Cat instances equal and others not? If you don't figure this out, and implement the equals and hashCode methods accordingly you will get confusing results.


I suggest that you put your objects in sets and then use an intersection of the sets:

// Considering you put your objects in setA and setB

Set<Object> intersection = new HashSet<Object>(setA);
intersection.retainAll(setB);

After that you can use removeAll to get a difference to any of the two sets:

setA.removeAll(intersection);
setB.removeAll(intersection);

Inspired by: http://hype-free.blogspot.com/2008/11/calculating-intersection-of-two-java.html


Well, you could maybe use Set instead and use the removeAll() method.

Or you could use the following simple and slow algorithm for doing:

List<Animal> differences = new ArrayList<Animal>();

    for (Animal a1 : animals) {
       boolean isInSecondArray = false;
       for (Animal a2 : animals2) {
           if (a1 == a2)  {
                isInSecondArray = true;
                break;
           }
       } 

       if (!isInSecondArray)
           differences.add(a1)
    }

Then differences will have all the objects that are in animals array but not in animals2 array. In a similar way you can do the opposite (get all the objects that are in animals2 but not in animals).


You may want to look at this article for more information:

http://download-llnw.oracle.com/javase/tutorial/collections/interfaces/set.html

As was mentioned, removeAll() is made for this, but you will want to do it twice, so that you can create a list of all that are missing in both, and then you could combine these two results to have a list of all the differences.

But, this is a destructive operation, so if you don't want to lose the information, copy the Set and operate on that one.

UPDATE:

It appears that my assumption of what is in the array is wrong, so removeAll() won't work, but with a 5ms requirement, depeending on the number of items to search it could be a problem.

So, it would appear a HashMap<String, Animal> would be the best option, as it is fast in searching.

Animal is an interface with at least one property, String name. For each class that implements Animal write code for Equals and hashCode. You can find some discussion here: http://www.ibm.com/developerworks/java/library/j-jtp05273.html. This way, if you want the hash value to be a combination of the type of animal and the name then that will be fine.

So, the basic algorithm is to keep everything in the hashmaps, and then to search for differences, just get an array of keys, and search through to see if that key is contained in the other list, and if it isn't put it into a List<Object>, storing the value there. You will want to do this twice, so, if you have at least a dual-core processor, you may get some benefit out of having both searches being done in separate threads, but then you will want to use one of the concurrent datatypes added in JDK5 so that you don't have to worry about synchronizations in the combined list of differences.

So, I would write it first as a single-thread and test, to get some ideas as to how much faster it is, also comparing it to the original implmemntation. Then, if you need it faster, try using threads, again, compare to see if there is a speed increase.

Before making any optimization ensure you have some metrics on what you already have, so that you can compare and see if the one change will lead to an increase in speed.

If you make too many changes at a time, one may have a large improvement on speed, but others may lead to a performance decrease, and it wouldn't be seen, which is why each change should be one at a time.

Don't lose the other implementations though, by using unit tests and testing perhaps 100 times each, you can get an idea as to what improvement each change gives you.


I don't care about perf for my usages (and you shouldn't either, unless you have a good reason to, and you find out via your profiler that this code is the bottleneck).

What I do is similar to functional's answer. I use LINQ set operators to get the exception on each list:

http://msdn.microsoft.com/en-us/library/bb397894.aspx

Edit:

Sorry, I didn't notice this is Java. Sorry, I'm off in C# la-la land, and they look very similar :)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜