开发者

Algorithm to select groups of similar items in 2d array

There is a 2d array of items (in my case they are called Intersections).

A certain item is given as a start. The task is to find all items directly or indirectly connected to this item that satisfy a certain function.

So the basic algorithm is like this:

Add the start to the result list. Repeat until no modification: Add each item in the array that satisfies the function and touches any item in the result list to the result list.

My current implementation looks like this:

private IList<Intersection> SelectGroup (
    Intersection start,
    Func<Intersection, Intersection, bool> select)
{
    List<Intersection> result = new List<Intersection> ();

    Queue<Intersection> source = new Queue<Intersection> ();
    source.Enqueue (start);

    while (source.Any ()) {
        var s = source.Dequeue ();
        result.Add (s);

        foreach (var neighbour in Neighbours (s)) {
            if (select (start, neighbour)
                && !result.Contains (neighbour)
                && !source.Contains (neighbour)) {
                source.Enqueue (neighbour);
            }
        }
    }

    Debug.Assert (result.Distinct ().Count () == result.Count ());
    Debug.Assert (result.All (x => select (x, result.First ())));

    return result;
}

private List<Intersection> Neighbours (IIntersection intersection)
{
    int x = intersection.X;
    int y = intersection.Y;

    List<Intersection> list = new List<Intersection> ();

    if (x > 1) {
        list.Add (GetIntersection (x - 1, y));
    }
    if (y > 1) {
        list.Add (GetIntersection (x, y - 1));
    }
    if (x < Size) {
        list.Add (GetIntersection (x + 1, y));
    }
    if (y < Size) {
        list.Add (GetIntersection (x, y + 1));
    }

    return list;
}

(The select function takes a start item and returns true iff the second item satisfies.)

This does its job and turned out to be reasonable fast for the usual array sizes (about 20*20). However, I'm interested in further improvements. Any ideas?

Example (X satisfies in relation to other Xs, . does never satisfy):

....
XX..
.XX.
X...

In thi开发者_C百科s case, there are 2 groups: a central group of 4 items and a group of a single item in the lower left. Selecting the group (for instance by starting item [3, 3]) returns the former, while the latter can be selected using the starting item and sole return value [1, 4].

Example 2:

.A..
..BB
A.AA

This time there are 4 groups. The 3 A groups are not connected, so they are returned as separate groups. The bigger A and B groups are connected, but A does not related to B so they are returned as separate groups.


Step 1: trivial change for huge benefit
Simple, immediate improvement: Both your result.Contains and source.Contains memberships are on list types, so they'll be O(n) in the size of those lists, not very efficient. Since you really don't care about any particular ordering, I'd change both of those into HashSet for constant-time lookups.
Note that your current implementation would be O(n^2) in the worst case, which happens when the entire array is valid (by the time you're inserting the last few elements, you'd be checking each one against the entire rest of the grid).

Step 2: further optimization
Better structural change: Keep a boolean visited array of the same size as your Intersection array, and every time you look at an Intersection, mark it as visited. This way, you don't have to check if something's in result or source every time, and better yet, you don't have to re-evaluate your select predicate. Otherwise, given something like this:

XXX
X.X
XXX

you'd be evaluating select on the center dot four times, which could be bad if it's expensive. This way, your

if (select (start, neighbour)
  && !result.Contains (neighbour)
  && !source.Contains (neighbour))

condition turns into: if (!visited(neighbour) && select(start, neighbour), which will only evaluate select at most once on any given intersection.
Also, if you do this, you don't even need to make result and contains hashes any more, since you won't be doing containment-checks on them.


I am not really good at C# but reading your algorithm description, I can have a few suggestions for you:

1/ Do you know Dynamic programming or Bellman Algorithm? The idea is that for each iteration, you only get the newest result and continue searching. Dynamic Programming

For example :

1 st iteration : x1, x2

2 nd iteration : x3, x4

the third iteration, you only search and compare back with x3 and x4. It will save you a lot of CPU calculation and iteration

2/ Use HashSet to have a better searching and getting data for contains


Based on tzaman's answer, I used a HashSet. Since the queue does not support hashing, I found an improvement that only requires lookup in the result list instead of both result and source list:

HashSet<Intersection> result = new HashSet<Intersection> ();
result.Add (start); // need to add first item in advance

Queue<Intersection> source = new Queue<Intersection> ();
source.Enqueue (start);

while (source.Any ()) {
    var s = source.Dequeue ();

    foreach (var neighbour in Neighbours (s)) {
        if (select (start, neighbour) && !result.Contains (neighbour)) {
            result.Add (neighbour); // add immediately to hashset
            source.Enqueue (neighbour);
        }
    }
}
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜