开发者

How can I tell which objects in a list have the most in common with another object of the same type?

What I'd like to do is have a collection of objects with properties, and pass in an object to act as a query template. How can I sort or prioritize the objects whose property values have the most in common with a given input object of the same type?

More details:

        List<A> myList = new List<A>() {new A() {b开发者_如何学JAVA="x"},
                                        new A() {c="r"},
                                        new A() {b="x",c="r"},};

        var myTemplate = new A() {b = "x", c="r"};

I'd like this example to match on the third item, but in the case where property c is null or "f", it should return the first and third item. If property c is "r", but b is null or "f", it should return the second and third item, because they match on c.


You'll basically have to come up with a formula for determining how similar the two objects are. Pick a weight for each property and then use simple comparison to say whether that property should be counted as the same. Fuzzy matching of some type could be used, though that is going to be more complex.

Something simple could be:

public byte Similarity(SomeType other)
{
    byte similarity = 0;
    if (this.Property1 == other.Property1)
        similarity += 25;
    if (this.Property2 == other.Property2)
        similarity += 13;
    if (this.Property3 == other.Property3)
        similarity += 12;
    if (SomeFuzzyComparisonReturnsVerySimilar(this.Property4, other.Property4))
        similarity += 50;
    return similarity;
}

That is a simple method that I am defining to return a number from 0 to 100; 100 being the same and 0 being totally different.

Once you have that, it is a fairly simple matter to select out the items that are similar enough for you to consider; eg:

var similarObjects = ListOfSomeTypes.Where(s => s.Similarity(templateObject) > 75);

Or to sort them:

var sortedBySimilarity = ListOfSomeTypes.OrderByDescending(s => s.Similarity(templateObject));

Ultimately though my point is that you have to come up with your own definition of "having the most in common with", once you have that the rest will probably be pretty easy. Not that coming up with that will necessarily be easy.

With the additional details in your question, a possible formula would be:

public byte Similarity(A other)
{
    byte similarity = 0;
    if (this.b == null | other.b == null)
        similarity += 25;
    else if (this.b == other.b)
        similarity += 50;
    if (this.c == null | other.c == null)
        similarity += 25;
    else if (this.c == other.c)
        similarity += 50;
    return similarity;
}

This weights exact matches highest, null values in one object slightly less, and differences not at all.


I've done a ton of fuzzy matching over huge data sets, and there are lots of scenarios to consider. You seem to be approaching a simple or generic case, and for those cases without lots of data involved some kind of general string distance comparisons seem appropriate.

If performance matters, my best advice is "know your data." Write your own scoring, as suggested above.

Having said that, we use Levenshtein distance for fuzzy string matching. It is very non-specific in terms of the "distance" between two strings, so it may or may not be appropriate for a given problem. Here is a quick copy/paste of the algorithm in C#. It ports to most languages very easily. This will throw an exception on null inputs, so be sure to add your own special case handling as you see fit.

public static int LevenshteinDistance(string s, string t)
{
    var sLen = s.Length;
    var tLen = t.Length;

    var d = new int[sLen + 1, tLen + 1];

    for (var i = 0; i <= sLen; d[i, 0] = i++) { }
    for (var j = 0; j <= tLen; d[0, j] = j++) { }

    for (var i = 1; i <= sLen; i++)
    {
        for (var j = 1; j <= tLen; j++)
        {
            var cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
            d[i, j] = Math.Min(
                Math.Min(d[i - 1, j] + 1,   // a deletion
                d[i, j - 1] + 1),           // an insertion
                d[i - 1, j - 1] + cost);    // a substitution
        }
    }

    return d[sLen, tLen];
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜