Distance between sets, even when the sets are unbalanced?
I need to find the best distance equation for measuring the distance between two sets.
By distance equations I mean things like Euclidean, Manhattan, and so on. I have to find the optimal (minimal) distance between two entities. The entities are sets whose elements are floating-point values. The sets may have the same number of elements or different numbers.
For example:
s1={2.1,3.5,2.7,4.9},
s2={4.2,3.1,2.3}
How can I find the distance between two sets like these?
In my case, each element is indexed to one position. For example: s1={w,x,y,z}, s2={w,y,z}. In the second set, the x value is missing. Even in such scenarios, I have to find the distance.
But Euclidean distance, or any other distance equation I know, won't solve this problem. Am I missing a distance equation that suits my problem, or do I have to normalize the sets in some fashion? Is there an optimal method for finding the distance between such sets? If possible, please let me know the best distance equations that exist.
Edit
Thanks for your valuable feedback. Based on the distance, I want to conclude whether two entities are similar or not. For example, if two persons are tagged with their context information (sensor information), I should be able to say whether they are contextually close to or far from each other. Context information can be a vector, a set, or any array. So I have to use the best distance equation to find the contextual distance between two persons, which can also be used to evaluate their similarity. I need to write some criteria so that only the most relevant context information is selected for the distance equation. For example, context information can be given as (pressure, temperature, intensity, humidity, ...): person c1 has context information (1.2,3.5,2.7,9.2) and person c2 has context information (2.1,3.5,4.6) [sometimes sensor values may be missing]. My challenge is to find the optimal distance between two persons [how similar they are]. Thanks @all
You need to give more details on what you want to do with this distance...
Have a look at the Wikipedia article on distance and norms.
To define a distance you just need to define a function d that satisfies the following properties:
Symmetry: d(x,y) = d(y,x)
Separation: d(x,y) = 0 <=> x = y
Triangle inequality: d(x,z) <= d(x,y) + d(y,z)
So for example:
if x and y are 2 sets:
d1(x,y) = abs(max(x)-max(y)) is not a distance (no separation: two different sets with the same maximum are at distance 0)
d2(x,y) = cardinal(symmetric_difference(x,y)) (the symmetric difference is x union y minus x intersection y) is a distance
Proof:
d2(x,y) = d2(y,x): ok
d2(x,y) = 0 => x = y: ok
d2(x,z) <= d2(x,y) + d2(y,z): ok, just draw it and you will see it works
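As a minimal sketch of the two examples above (the Python function names simply mirror d1 and d2 from the text, and using Python's built-in set type is my assumption), the following shows that d1 fails separation while d2 returns 0 only for equal sets. Note that set membership compares floats exactly, so nearly equal sensor readings count as different elements.

```python
def d1(x, y):
    """Absolute difference of the maxima: symmetric, but not separated."""
    return abs(max(x) - max(y))


def d2(x, y):
    """Cardinality of the symmetric difference (elements in exactly one set)."""
    return len(set(x) ^ set(y))


s1 = {2.1, 3.5, 2.7, 4.9}
s2 = {4.2, 3.1, 2.3}

# s1 and s2 share no element, so every element is counted.
print(d2(s1, s2))

# Two different sets with the same maximum: d1 cannot tell them apart.
print(d1({1.0, 5.0}, {2.0, 5.0}))
```

On the question's example, d2 only counts how many elements differ and ignores how far apart the values are, which is why a value-aware distance may be more useful there.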
Depending on what you want to do with the distance, you can find more interesting ones.
One more example:
You could choose:
x={a1...an} y={b1...bm}
then d3(x,y) = min(Sum(abs(ai-bj))) + d2(x,y)
// The first term is loosely written: it means the minimum, over all ways of pairing elements of x with elements of y, of the sum of the absolute differences of the pairs (there will be some unpaired elements when the sets have different sizes). The d2 term is there for cases such as {a1...an} and {a1...an,0}, so that the distance is not 0 (separation).
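This is symmetric and separated, and I think it's relevant for comparing sets. (Strictly, the triangle inequality can fail if unpaired elements cost nothing: take x={0}, y={0,100}, z={100}.)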
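A rough sketch of d3 in Python, under my reading of the description: each element of the smaller set is paired with a distinct element of the larger set, unpaired elements add nothing to the first term, and d2 is added on top. The helper name min_matching_cost is hypothetical, and the brute-force search over permutations is only practical for small sets.

```python
from itertools import permutations


def d2(x, y):
    """Cardinality of the symmetric difference."""
    return len(set(x) ^ set(y))


def min_matching_cost(x, y):
    """Smallest sum of |a - b| over one-to-one pairings of the
    smaller set into the larger one (hypothetical helper)."""
    small, large = sorted((list(x), list(y)), key=len)
    # Try every ordered choice of len(small) distinct partners from large.
    return min(
        sum(abs(a - b) for a, b in zip(small, perm))
        for perm in permutations(large, len(small))
    )


def d3(x, y):
    """Minimal pairing cost plus d2, as described in the text."""
    return min_matching_cost(x, y) + d2(x, y)


s1 = {2.1, 3.5, 2.7, 4.9}
s2 = {4.2, 3.1, 2.3}
print(d3(s1, s2))  # roughly 1.3 (pairing cost) + 7 (d2), up to float rounding

# The d2 term keeps {1, 2} and {1, 2, 0} apart even though the pairing cost is 0.
print(d3({1.0, 2.0}, {1.0, 2.0, 0.0}))
```

For larger sets, an assignment solver such as SciPy's linear_sum_assignment would be the more usual way to compute the pairing term than enumerating permutations.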