开发者

Count all subsets of an array where the largest number is the sum of the remaining numbers

I've been struggling with level 3 of the Greplin challenge. For those not familiar, here is the problem:

you must find all subsets of an array where the largest number is the sum of the remaining numbers. For example, for an input of:

(1, 2, 3, 4, 6)

the subsets would be

1 + 2 = 3

1 + 3 = 4

2 + 4 = 6开发者_如何学C

1 + 2 + 3 = 6

Here is the list of numbers you should run your code on. The password is the number of subsets. In the above case the answer would be 4.

3, 4, 9, 14, 15, 19, 28, 37, 47, 50, 54, 56, 59, 61, 70, 73, 78, 81, 92, 95, 97, 99

I was able to code a solution that builds all 4 million plus combinations of the 22 numbers and then tests them all which will get me the right answer. Problem is it takes over 40 minutes to crunch through. Searching around the web it seems like several people were able to write an algorithm to get the answer to this in less than a second. Can anyone explain in pseudo-code a better way to tackle this than the computationally expensive brute-force method? It's driving me nuts!


The trick is that you only need to keep track of counts of how many ways there are to do things. Since the numbers are sorted and positive, this is pretty easy. Here is an efficient solution. (It takes under 0.03s on my laptop.)

#! /usr/bin/python

numbers = [
    3, 4, 9, 14, 15, 19, 28, 37, 47, 50, 54, 56,
    59, 61, 70, 73, 78, 81, 92, 95, 97, 99]

max_number = max(numbers)
counts = {0: 1}
answer = 0
for number in numbers:
    if number in counts:
        answer += counts[number]
    prev = [(s,c) for (s, c) in counts.iteritems()]
    for (s, c) in prev:
        val = s+number;
        if max_number < val:
            continue
        if val not in counts:
            counts[val] = c
        else:
            counts[val] += c
print answer


We know the values are nonzero and grow monontonically left to right.

An idea is to enumerate the the possible sums (any order, left to right is fine) and then enumerate the subsets to the left of of that value, because values on the right can't possibly participate (they'd make the sum too big). We don't have have to instantiate the set; just as we consider each value, see how if affects the sum. It can either be too big (just ignore that value, can't be in the set), just right (its the last member in the set), or too small, at which point it might or might not be in the set.

[This problem made me play with Python for the first time. Fun.]

Here's the Python code; according to Cprofile.run this takes .00772 seconds on my P8700 2.54Ghz laptop.

values = [3, 4, 9, 14, 15, 19, 28, 37, 47, 50, 54, 56, 59, 61, 70, 73, 78, 81, 92, 95, 97, 99]

def count():
   # sort(values) # force strictly increasing order
   last_value=-1
   duplicates=0
   totalsets=0
   for sum_value in values: # enumerate sum values
      if last_value==sum_value: duplicates+=1
      last_value=sum_value
      totalsets+=ways_to_sum(sum_value,0) # faster, uses monotonicity of values
   return totalsets-len(values)+duplicates

def ways_to_sum(sum,member_index):
   value=values[member_index]
   if sum<value:
      return 0
   if sum>value:
      return ways_to_sum(sum-value,member_index+1)+ways_to_sum(sum,member_index+1)
   return 1

The resulting count I get is 179. (Matches another poster's result.)

EDIT: ways_to_sum can be partly implemented using a tail-recursion loop:

def ways_to_sum(sum,member_index):
   c=0
   while True:
      value=values[member_index]
      if sum<value: return c
      if sum==value: return c+1
      member_index+=1
      c+=ways_to_sum(sum-value,member_index)

This takes .005804 seconds to run :-} Same answer.


This runs in less than 5ms (python). It uses a variant of dynamical programming called memoized recursion. The go function computed the number of subsets of the first p+1 elements, which sum up to target. Because the list is sorted it's enough to call the function once for every element (as target) and sum the results:

startTime = datetime.now()
li = [3, 4, 9, 14, 15, 19, 28, 37, 47, 50, 54, 56, 59, 61, 70, 73, 78, 81, 92, 95, 97, 99]
memo = {}
def go(p, target):
    if (p, target) not in memo:
        if p == 0:
            if target == li[0]:
                memo[(p,target)] = 1
            else:
                memo[(p,target)] = 0
        else:
            c = 0       
            if li[p] == target: c = 1
            elif li[p] < target: c = go(p-1,target-li[p])
            c += go(p-1, target)
            memo[(p,target)] = c
    return memo[(p,target)]

ans = 0
for p in range(1, len(li)):
    ans += go(p-1, li[p])

print(ans)
print(datetime.now()-startTime)


This works

public class A {

  static int[] a = {3,4,9,14,15,19,28,37,47,50,54,56,59,61,70,73,78,81,92,95,97,99};

  public static void main(String[] args) {
    List<Integer> b = new ArrayList<Integer>();

    int count = 0;
    for (int i = 0; i < a.length; i++) {
        System.out.println(b);
        for (Integer t:b) {
        if(a[i]==t)
        {
        System.out.println(a[i]);
            count++;
            }
        }

        int size = b.size();
        for (int j = 0; j < size; j++) {
        if(b.get(j) + a[i] <=99)
            b.add(b.get(j) + a[i]);
        }
            b.add(a[i]);
    }

    System.out.println(count);

  }
}

pseudo code(with explanation):

  1. store the following variables

    i.) 'count' of subsets till now

    ii.)an array b which contains sums of all possible subsets

    2.iterate through the array (say a). for each element a[i]

    i.)go through array b and count the number of occurrences of a[i]. add this to 'count'

    ii.)go through array b and for each element b[j].add (a[i]+b[j]) to b because this is a possible subset sum. (if a[i]+b[j]> max element in a, u can ignore to add it)

    iii.)add a[i] to b.

3.u have count :)


I used the combination generator class in Java available here:

http://www.merriampark.com/comb.htm

Iterating through the combos and looking for valid subsets took less than a second. (I don't think using outside code is in keeping with the challenge, but I also didn't apply.)


public class Solution {

   public static void main(String arg[]) {
    int[] array = { 3, 4, 9, 14, 15, 19, 28, 37, 47, 50, 54, 56, 59, 61,
        70, 73, 78, 81, 92, 95, 97, 99 };
    int N = array.length;
    System.out.println(N);
    int count = 0;
    for (int i = 1; i < 1 << N; i++) {
        int sum = 0;
        int max = 0;
        for (int j = 0; j < N; j++) {
        if (((i >> j) & 1) == 1) {
            sum += array[j];
            max = array[j];
        }
        }
        if (sum == 2 * max)
        count++;
    }
    System.out.println(count);
    }

    public static boolean isP(int N) {
    for (int i = 3; i <= (int) Math.sqrt(1.0 * N); i++) {
        if (N % i == 0) {
        System.out.println(i);
        // return false;
        }
    }
    return true;
    }
}

Hope it helps, but don't just copy and paste.


I don't want to beat a dead horse, but most of the solutions posted here miss a key opportunity for optimization and therefore take 6 times longer to execute.

Rather than iterating through the input array and searching for sums that match each value, it is far more efficient to calculate all the possible RELEVANT sums only once, then see which of those sums appear in the original input array. (A "relevant" sum is any subset sum <= the max value in the array.)

The second approach runs approximately 6 times faster -- generally milliseconds rather than centiseconds -- simply because it calls the recursive sum-finding function about 1/6th as many times!

The code for this approach and full explanation can be found in this github repo (it's in PHP because that's what was called for by the person who gave me this challenge):

https://github.com/misterrobinson/greplinsubsets

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜