Java collection for this use case
Let's say we have a bunch of Car objects.
Each Car has some distinguishing properties e.g. manufacturer, model, year, etc. (these can be used to create distinct hashCodes).
Each car has a List of PurchaseOffer objects (a PurchaseOffer object contains pricing\retailer info).
We receive Lists of Cars from several different sources, each Car with a single PurchaseOffer. Thing is, these lists may overlap - a Car can appear in more than one list.
We wish to aggregate the lists into a single collection of Cars where each Car holds all encountered PurchaseOffers for it.
My Problem is choosing what to collection to use in this 开发者_Python百科aggregation process:
Feels natural to use java.util.HashSet for holding our cars, that way when going over the different lists of Cars, we can check if a car already exists in the Set in amortized O(1), however - you cannot retrieve an element from a Set (in our case - when we go encounter a Car that already exists in the Set - we would have liked to retrieve that Car from the Set based on its identifying hashCode and add PurchaseOffers to it).
I can use a HashMap where each Car's hashCode maps to the actual Car object, but it probably isn't the school-book solution since it is unsafe - I would have to make sure myself that every hashCode maps to a Car with that hashCode - there could be inconsistency. Of course, can make a designated data structure that guarantees this consistency - Shouldn't one already exist ?
Can anyone suggest the data-structure I am after, or point out a design mistake ? Thanks.
Since this is a many-to-many relationship, you need a bi-directional multi-map. Car is the key for the first one, with a List of PurchaseOrder as the value. The PurchaseOrder is the key for the second one, with a List of Cars as the value.
The underlying implementation is two HashMaps.
Put an API on top of it to get the behavior you need. Or see if Google Collections can help you. It's a combination of a BiMap and two MultiMaps.
I think that you really do need (at least) a HashMap<Car, List<PurchaseOffer>>
... as suggested by @Andreas_D
Your objection that each Car
already has a List<PurchaseOffer>
is beside the point. The list in the HashMap
is the aggregate list, containing all PurchaseOffer
objects from all Car
objects that stand for the same physical car.
The point of creating a new list is to avoid changing the original lists on the original Car
objects. (If that was not a concern, then you could pick one instance of Car
from the set that represent a physical car, and merge the PurchaseOffer
objects from the others into that list.)
I'm not entirely sure why @duffymo suggested a bi-directional map between, but I think it is because the different Car
objects from different sources may have complementary (or contradictory) information for the same physical car. By keeping all instances, you avoid discarding information. (Once again, if you are happy to discard mutate and/or discard information, you could attempt to merge the information about each individual car into a single Car
object.
If you really didn't care about preserving information and were prepared to merge stuff willy-nilly then the following approach would probably work:
HashMap<Car, Car> map = new HashMap<Car, Car>(...);
for (Car car : carsToBeAggregated) {
Car master = nap.get(car);
if (master == null) {
map.put(car, car);
} else {
master.offers.addAll(car.offers);
// optionally, merge other Car information from car to master
}
}
You should NOT be trying to use the Car.hashCode()
as a key for anything. Hashcode values are not unique identifiers: there is a distinct possibility that two different cars will end up with the same hashcode value. If you attempt to use them as if they were unique identifiers you'll get into trouble ...
The basic datastructure should be a HashMap<Car, List<PurchaseOffer>>
. This allows for storing and receiving all offers for one selected car.
Now you may have to find a suitable implementation for Car.equals()
to assure, that "cars" coming from different source are really the same. What about basing equals()
on a unique identifier for a real world car (VIN)?
I would prefer to use a HashMap<Car, List<PurchaseOffer>>
, as suggested before (Andreas, Stephen), mainly if the Car object does not hold the list of PurchaseOffers.
Otherwise I would consider using a HashMap<Car, Car>
or, better IMO, a HashMap<ID, Car>
if there is an unique ID for each Car.
It can not simply map the Car's hashCode to the Car, as mentioned in the question, since distinct Cars can have the same hashCode!
(Anyway, I would create an own class for storing and managing the Cars. This would contain the HashMap, or whichever - so it's easy to change the implementation without needing to change its interface)
create tout custom class that extends hash
Set,
override method contains(Object o)
check there os hash code is same or not and return result according, and add object to set of and only if it not containing that object
How about a defining a new custom Aggregation class? Define the hashcode such that the id of the car acts as the key and override the equals() accordingly. Define a custom method for accepting your original car and do a union operation on the lists. Finally store the custom objects in a HashSet for achieving constant time look up.
In purist terms, aggregation is a behavior beyond the scope of a single object. Visitor pattern tries to address a similar problem.
Alternatively if you have a sql datastore, a simple select using group by would do the trick.
Welp, yeah,
HashMap<Car, List<PurchaseOffer>>
would be perfect if it wasn't for the fact that eachCar
contains aList<PurchaseOffer>
as a property. Can say that aCar
object is composed of two parts: an identifying part (let's say each car indeed has a unique VIN), and the list ofPurchaseOffer
s.
In this case split the Car class in two classes - the CarType class with the identifying attributes, and then the list part (maybe both together used by Car
). Then use Map<CarType, Lost<PurchaseOffer>
for your datastructure (or MultiMap<CarType, PurchaseOffer>
).
//alt. 1
List<Offer> offers;
List<Car> cars;
Map<Car, List<Offer>> mapCarToOffers;
Map<Offer, List<Car>> mapOfferToCars;
public void List<Offer> getOffersForCar(Car aCar);
public void List<Car> getCarsForOffer(Offer anOffer);
Alternative 1 would make use of the hashCode()
of Car
and Offer
//alt. 2
List<Offer> offers;
List<Car> cars;
Map<Integer, List<Offer>> mapCarIdToOffers;
Map<Integer, List<Car>> mapOfferIdToCars;
public void List<Offer> getOffersForCarId(int aCarId);
public void List<Car> getCarsForOfferId(int anOfferId);
Alternative 2 would make use of the hashCode()
of Integer
. This would allay your concerns about "safety" as the hash codes for Integer
objects should not overlap where the values are unique. This incurs the additional overhead of having to maintain unique IDs for each Car
and Offer
object, however, I am guessing that you probably already have those from your business requirements.
Note, you may choose to use other classes as alternative to int
s for ID's (e.g. String
).
For both alternatives, implement the List
s with ArrayList
or LinkedList
- which one is better is up to you to determine based on other requirements, such as the frequency of insertion/deletion vs lookup. Implement the Map
s with HashMap
- see comments above about how hash codes are used.
As a side note, in our software, we use these both of the above to represent similar types of many-to-many data. Very similar to your use case. Both alternatives work very well.
Why not use an object database for this? You could store any object graph you wanted, and you'd get a search API with which you could do any relationship/retrieval mechanism you wanted. A simple collection could work, but it sounds like you want a more complex relationship than a collection would provide. Look into db4o (http://db4o.com) - it's very powerful for this sort of thing.
精彩评论