Minimum set difference

2023-02-27 05:26 问答作者：

i came across this question on this website called codility, but i cant really figure out how to solve it, would appreciate the help

Given an array A of n integers, and the sequence S of n elements 1 or -1 we define the value:

Assume the sum of zero elements is equal zero. Write a function

int min_abs_sum(int[] A);

than given an array A of n integers from the range [-100..100] computes the lowest possible value of val(A,S) (for any sequence S with elements 1 or -1). You can assume that n<=20000 .

For example given array: a={1,5,2,-2}

your function should return 0, since for sequence S=(-1,1,-1,1) the val(A,S)=0.

Here are two links for some peoples result, it doesnt show the solution but it does show the c开发者_StackOverflow社区omplexity of their algorithms, the first link shows the complexity at which the program should run and the second one is slower.

1st link 100% marks

2nd link 86% marks

This is poorly worded version of the partition problem. You are going to split the array A into 2 groups as close to equal as possible. The one with the larger sum you'll be assigning +1 in the S array, and the other group will get -1. Pick a solution to the partition problem and adjust it to return an answer to this one. Actually it's the variant of partition that seeks the best possible value as opposed to 2 equal sets.

EDIT here is some python code based on the paper linked by @Jerry Coffin

def min_abs_sum(A):
vals = []
for x in A:
    for v in vals:
        n = v+x
        if (abs(n)<=1000000) and (n not in vals): vals.append(n)
        n = v-x
        if (abs(n)<=1000000) and (n not in vals): vals.append(n)
    if (x not in vals): vals.append(x)
    if (-x not in vals): vals.append(-x)
return (min([abs(x) for x in vals]))

The one million value is half of 20000 (max numbers in A) times 100/2. I've used a list instead of an array, which means some things will be faster and some slower than what they do in the paper. It is conceivable that the min is achieved by summing the first half of the numbers and subtracting the second half - or something like that which requires large intermediate sums. I'm using a list rather than an array, but the size is still bounded. Sorry, I don't do Java.

This basically works out to partitioning a into two pieces with the sums of the absolute values of the two pieces as close to equal as possible.

You then want to multiply those elements by 1 or -1 to make one partition all negative and the other partition all positive. As you do that, you sum them to get the final answer.

From an algorithmic viewpoint, I believe the partitioning step is almost certainly NP-completely (phrases like "subset sum" and "partition problem" come to mind). From a programming viewpoint, it's pretty simple though -- exhaustively test possibilities until you get the best one. As long as the number of element is small (up to a dozen or so [edit: since it's O(2^N, you could probably increase that to somewhere in the 30-40 range) it'll be reasonably fast.

~~I believe it should be proportional to O(N!) though, so if the array gets at all large, the time taken will quickly become unreasonable.~~Since you're only dividing into two sets and order within sets doesn't matter, it's O(2^N) instead of O(N!). This doesn't grow nearly as quickly as O(N!), but still quickly enough to make large sets unreasonable to process.

I should add, however, that Codility seems to specialize in problems that may initially appear to be NP-complete, but really aren't -- if you've missed any detail in your description, the problem may be substantially easier.

Edit: rereading it, the problem may be that I ignored a crucial detail: the restricted range. I'm not sure how you use it offhand, but I'm pretty sure it's crucial to producing an efficient solution. My immediate guess is that it's based on something similar to changing a comparison-based sort to a counting (aka bucket) sort. I haven't thought through it in any real detail though...

Edit2: Doing a bit of looking (and being prompted by @Moron), the limited range is important, and my thinking about how it figures into a solution was generally correct. @Moron was kind enough to point to the Wikipedia entry for the subset sum problem, but I didn't find that particularly well written. A bit of looking turned up a paper from Cornell with an explanation I found a bit cleaner/more understandable.

The following Java solution will score 100% in Codility. My solution is based on the 'Partition with Duplicate Elements' section of the paper linked by @Jerry Coffin but I also incorporated several additional optimizations.

import java.util.Arrays;

class Solution {

    public int solution ( int[] A ) {

        int n=A.length,r=0,c=1,sum=0,mid=0;

        // Add all numbers, replace them with their absolute value, and sort them
        for(int i=0;i<n;i++) {
          A[i]=Math.abs(A[i]);
          sum+=A[i];
        }
        Arrays.sort(A); // This minimizes the speed of growth of r in the loop below and allows us to count duplicates while scanning the array
        mid=sum/2; // If the number is odd the result is rounded down (the best possible solution is 1 instead of 0).

        // Find the subset of numbers whose sum is closest to half the total sum    
        boolean[] bs=new boolean[mid+101]; // We only need to check sets that are equal or less than half the sum of the numbers (to avoid having to check the sum in the inner loop I sacrifice 100 booleans since 100 is the maximum value allowed)
        bs[0]=true; // The set with zero elements always adds to 0
        for(int i=0;i<n;i++){
          if( A[i]==0 ) continue;
          // Count duplicate entries
          if(i<n-1 && A[i]==A[i+1] ){
            c++;
            continue;
          } 
          // Scan the subset sum result array from right to left and add A[i] c times to existing subset sums
          for (int j = r; j >= 0; j--)
            if(bs[j] ){
              int m= Math.min(mid, j+A[i]*c );
              for(int k= j+A[i];k<=m && !bs[k];k+=A[i] ) bs[k]=true; // To avoid duplicate work when dealing with multiples of previous numbers the loop stops if we find an entry has already been set.
            }
          r=Math.min(mid, r+A[i]*c); // New rightmost subset sum can be no more than the mid point
          while(!bs[r]) r--; // Scan back to rightmost subset sum
          if(r==mid) break; // Found an optimal solution; no need to continue
          c=1;
    }
    return sum-2*r; // The rightmost subset sum that does not go over half the sum is the best solution, compute the difference of the complementary subsets (r and sum-r).
  }

}

So, the goal would be to get as close to 0 as possible.

My first thought was that I would sort the array in descending order, then iterate over the list doing something like this:

int total = 0;
foreach(int i in a)
{
    total = Math.Min(Math.Abs(total - i), Math.Abs(total + i));
}

which would work for a={1,5,2,-2} (total will be the following 5,4,2,0)

But I'm not sure that it works in all cases. I'll look into it a bit more then see if there is a case that it doesn't work for.

EDIT:

Ok, I guess the brute force will work?

    public static int MinArray(int[] array)
    {
        int val = int.MaxValue;

        for (int i = 0; i < Math.Pow(2, array.Length); i++)
        {
            val = Math.Min(CalcMin(array, i), val);
        }

        return val;
    }

    private static int CalcMin(int[] array, int negs)
    {
        int ret = 0;

        for (int i = 0; i < array.Length; i++)
        {
            int neg = 1;

            if (negs != 0)
            {
                neg = negs % 2 == 1 ? -1 : 1;
                negs >>= 1;
            }
            ret += array[i] * neg;
        }

        return Math.Abs(ret);
    }

So, what I'm doing is taking every iteration of S (which is calculated by taking the binary of i in the MinArray) and finding the min that way.

With a little bit of modification, you could also get the correct value for S (if that is a requirement. If not, making it a requirement might score you some points on the interview?)

This could possibly work fast:

maxvalue = 100

def solve(data):
    def mark_sum(s):
        # wrap sum around maxvalue
        if s >= maxvalue:
            s -= maxvalue * 2
        elif sum < -maxvalue:
            s += maxvalue * 2
        # mark sum
        if s >= 0:
            s_new_pos[s] = True
        else:
            s_new_neg[s + maxvalue] = True

    s_old_pos = [False] * maxvalue  # marks for sums [0..99]
    s_old_neg = [False] * maxvalue  # marks for sums [-100..-1]
    s_old_pos[0] = True  # seed array with zero sum for zero elements
    for n in data:
        s_new_pos = [False] * maxvalue
        s_new_neg = [False] * maxvalue
        for i in range(maxvalue):   # mark new possible sums
            if s_old_pos[i]:
                mark_sum(i + n)
                mark_sum(i - n)
            if s_old_neg[i]:
                mark_sum(i - 100 + n)
                mark_sum(i - 100 - n)
        s_old_pos = s_new_pos
        s_old_neg = s_new_neg
    for i in range(maxvalue):
        if s_old_pos[i]:
            return i
        if s_old_neg[-1 - i]:
            return abs(-1 - i)
    raise AssertionError('my bad')

No need to check for all possible sums (up to 1000000). They can be just wrapped around max_value. This replaces n with max_value in time complexity.

Still unsure about correctness :(

def min_abs_sum(A):
    A[:] = sorted([ abs(i) for i in A if i != 0 ], reverse=True)
    s = sum(A)
    h = s / 2
    r = find_balance_iter(h, A)
    return abs(2*(h-r) - s)

def find_balance_iter(v, A):
    r = v
    n = len(A)
    for i in xrange(n):
        if i and A[i] == A[i-1]:
            continue
        for j in xrange(n-i-1):
            vv = v - A[i]
            rr = vv
            AA = A[i+j+1:]
            while True:
                if vv == 0 or vv in AA:
                    return 0
                if vv < 0 or not AA:
                    rr = vv
                    break
                if vv < AA[-1]:
                    rr = min(vv-AA[-1], rr, key=compare)
                    break
                vv -= AA[0]
                AA[:] = AA[1:]
            r = min(r, rr, key=compare)
    return r

def compare(a):
    return (abs(a), a)

继续阅读：algorithm

Minimum set difference

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？