Computing degree of similarity among a group of sets
Suppose there are 4 sets:
s1={1,2,3,4};
s2={2,3,4};
s3={2,3,4,5};
s4={1,3,4,5};
Is there any standard metric to present the similarity degree of this group of 4 sets?
Thank you for suggesting the Jaccard method. However, it seems to be pairwise. How can I compute the similarity degree of the whole group of sets?
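Pairwise, you can compute the Jaccard distance of two sets: one minus the size of their intersection divided by the size of their union. Equivalently, treat each set as a vector of booleans in a space where the elements {1, 2, 3…} each get their own coordinate, and compare the two vectors.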
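As a rough sketch of the pairwise idea above, the following computes the Jaccard distance of s1 and s2 both directly on the sets and on their boolean-vector form. The helper names (`jaccard_distance`, `to_bool_vector`, `jaccard_distance_bool`) are made up for illustration, and treating two empty sets as distance 0 is an assumption, not something stated in the thread.

```python
def jaccard_distance(a, b):
    """Jaccard distance of two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty sets are identical
    return 1.0 - len(a & b) / len(union)


def to_bool_vector(s, universe):
    """One boolean per element of the universe: True if the set contains it."""
    return [x in s for x in universe]


def jaccard_distance_bool(u, v):
    """Same distance, computed on two boolean vectors of equal length."""
    both = sum(1 for p, q in zip(u, v) if p and q)
    either = sum(1 for p, q in zip(u, v) if p or q)
    return 0.0 if either == 0 else 1.0 - both / either


s1 = {1, 2, 3, 4}
s2 = {2, 3, 4}
universe = sorted(s1 | s2)

d_sets = jaccard_distance(s1, s2)
d_vec = jaccard_distance_bool(to_bool_vector(s1, universe),
                              to_bool_vector(s2, universe))
print(d_sets, d_vec)
```

Both forms should agree, since the boolean vectors only re-encode set membership.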
Your question isn't very specific, but I suppose you mean something like the "edit distance" between them, i.e. how much you need to change s1 to get to s2?
Check out the Wikipedia article on Edit distance.
As Tobu said, I'd use the Jaccard index, which is just the size of the intersection divided by the size of the union of the sets.
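For the four sets in the question, a minimal sketch of the Jaccard index over every pair might look like the following. Averaging the pairwise values into one group-level number is only one possible way to summarise them and is an assumption here, not a standard metric confirmed in this thread; `jaccard_index` and `mean_similarity` are illustrative names.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Jaccard index of two sets: |a & b| / |a | b|."""
    union = a | b
    # Assumed convention: two empty sets count as fully similar.
    return len(a & b) / len(union) if union else 1.0


sets = {
    "s1": {1, 2, 3, 4},
    "s2": {2, 3, 4},
    "s3": {2, 3, 4, 5},
    "s4": {1, 3, 4, 5},
}

# Jaccard index for each unordered pair of sets.
pairwise = {
    (m, n): jaccard_index(sets[m], sets[n])
    for m, n in combinations(sets, 2)
}

# One possible group-level summary (an assumption): the mean pairwise index.
mean_similarity = sum(pairwise.values()) / len(pairwise)

for pair, value in pairwise.items():
    print(pair, round(value, 3))
print("mean pairwise Jaccard index:", round(mean_similarity, 3))
```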
You could compute the size of the intersection between each pair of sets.
You could compute the Euclidean distance between them, and build a dendrogram from that to visualize similarity.
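One way to read the Euclidean suggestion is to encode each set as a 0/1 membership vector over all elements that occur and take distances between those vectors. The sketch below does that with the standard library only (`math.dist` needs Python 3.8 or later); the 0/1 encoding is an assumption about what the answer intended.

```python
import math
from itertools import combinations

sets = {
    "s1": {1, 2, 3, 4},
    "s2": {2, 3, 4},
    "s3": {2, 3, 4, 5},
    "s4": {1, 3, 4, 5},
}

# Every element that appears in at least one set, in a fixed order.
universe = sorted(set().union(*sets.values()))

# Assumed encoding: 1.0 if the set contains the element, else 0.0.
vectors = {
    name: [1.0 if x in s else 0.0 for x in universe]
    for name, s in sets.items()
}

# Euclidean distance for each unordered pair of sets.
distances = {
    (m, n): math.dist(vectors[m], vectors[n])
    for m, n in combinations(vectors, 2)
}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

If SciPy is available, the values in `distances` (in the order produced here) could be passed as a condensed distance matrix to `scipy.cluster.hierarchy.linkage`, and the result drawn with `scipy.cluster.hierarchy.dendrogram`.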