Algorithm to select groups of similar items in 2d array

2022-12-29 18:08 问答作者：

There is a 2d array of items (in my case they are called Intersections).

A certain item is given as a start. The task is to find all items directly or indirectly connected to this item that satisfy a certain function.

So the basic algorithm is like this:

Add the start to the result list. Repeat until no modification: Add each item in the array that satisfies the function and touches any item in the result list to the result list.

My current implementation looks like this:

private IList<Intersection> SelectGroup (
    Intersection start,
    Func<Intersection, Intersection, bool> select)
{
    List<Intersection> result = new List<Intersection> ();

    Queue<Intersection> source = new Queue<Intersection> ();
    source.Enqueue (start);

    while (source.Any ()) {
        var s = source.Dequeue ();
        result.Add (s);

        foreach (var neighbour in Neighbours (s)) {
            if (select (start, neighbour)
                && !result.Contains (neighbour)
                && !source.Contains (neighbour)) {
                source.Enqueue (neighbour);
            }
        }
    }

    Debug.Assert (result.Distinct ().Count () == result.Count ());
    Debug.Assert (result.All (x => select (x, result.First ())));

    return result;
}

private List<Intersection> Neighbours (IIntersection intersection)
{
    int x = intersection.X;
    int y = intersection.Y;

    List<Intersection> list = new List<Intersection> ();

    if (x > 1) {
        list.Add (GetIntersection (x - 1, y));
    }
    if (y > 1) {
        list.Add (GetIntersection (x, y - 1));
    }
    if (x < Size) {
        list.Add (GetIntersection (x + 1, y));
    }
    if (y < Size) {
        list.Add (GetIntersection (x, y + 1));
    }

    return list;
}

(The select function takes a start item and returns true iff the second item satisfies.)

This does its job and turned out to be reasonable fast for the usual array sizes (about 20*20). However, I'm interested in further improvements. Any ideas?

Example (X satisfies in relation to other Xs, . does never satisfy):

....
XX..
.XX.
X...

In thi开发者_C百科s case, there are 2 groups: a central group of 4 items and a group of a single item in the lower left. Selecting the group (for instance by starting item [3, 3]) returns the former, while the latter can be selected using the starting item and sole return value [1, 4].

Example 2:

.A..
..BB
A.AA

This time there are 4 groups. The 3 A groups are not connected, so they are returned as separate groups. The bigger A and B groups are connected, but A does not related to B so they are returned as separate groups.

Step 1: trivial change for huge benefit
Simple, immediate improvement: Both your result.Contains and source.Contains memberships are on list types, so they'll be O(n) in the size of those lists, not very efficient. Since you really don't care about any particular ordering, I'd change both of those into HashSet for constant-time lookups.
Note that your current implementation would be O(n^2) in the worst case, which happens when the entire array is valid (by the time you're inserting the last few elements, you'd be checking each one against the entire rest of the grid).

Step 2: further optimization
Better structural change: Keep a boolean visited array of the same size as your Intersection array, and every time you look at an Intersection, mark it as visited. This way, you don't have to check if something's in result or source every time, and better yet, you don't have to re-evaluate your select predicate. Otherwise, given something like this:

XXX
X.X
XXX

you'd be evaluating select on the center dot four times, which could be bad if it's expensive. This way, your

if (select (start, neighbour)
  && !result.Contains (neighbour)
  && !source.Contains (neighbour))

condition turns into: if (!visited(neighbour) && select(start, neighbour), which will only evaluate select at most once on any given intersection.
Also, if you do this, you don't even need to make result and contains hashes any more, since you won't be doing containment-checks on them.

I am not really good at C# but reading your algorithm description, I can have a few suggestions for you:

1/ Do you know Dynamic programming or Bellman Algorithm? The idea is that for each iteration, you only get the newest result and continue searching. Dynamic Programming

For example :

1 st iteration : x1, x2

2 nd iteration : x3, x4

the third iteration, you only search and compare back with x3 and x4. It will save you a lot of CPU calculation and iteration

2/ Use HashSet to have a better searching and getting data for contains

Based on tzaman's answer, I used a HashSet. Since the queue does not support hashing, I found an improvement that only requires lookup in the result list instead of both result and source list:

HashSet<Intersection> result = new HashSet<Intersection> ();
result.Add (start); // need to add first item in advance

Queue<Intersection> source = new Queue<Intersection> ();
source.Enqueue (start);

while (source.Any ()) {
    var s = source.Dequeue ();

    foreach (var neighbour in Neighbours (s)) {
        if (select (start, neighbour) && !result.Contains (neighbour)) {
            result.Add (neighbour); // add immediately to hashset
            source.Enqueue (neighbour);
        }
    }
}

继续阅读：algorithm

Algorithm to select groups of similar items in 2d array

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？