Generating Balls in Boxes
Given two sorted vectors a
and b
, find all vectors which are sums of a
and some permutation of b
, and which are unique once sorted.
You can create one of the sought vectors in the following way:
- Take vector
a
and a permutation of vectorb
. - Sum them together so
c[i]=a[i]+b[i]
. - Sort
c
.
I'm interested in finding the set of b
-permutations that yield the entire set of unique c
vectors.
Example 0: a='ccdd'
and b='xxyy'
'cycydxdx'
, 'cxcxdydy'
, 'cxcydxdy'
.
Notice that the permutations of b
: 'xyxy'
and 'yxyx'
are equal, because in both cases the "box c" and the "box d" both get exactly one 'x'
and one 'y'
.
I guess this is similar to putting M
balls in M
boxes (one in each) with some groups of balls and boxes being identical.
a='aabbbcdddd'
and b='xxyyzzttqq'
your problem will be 10 balls in 4 boxes. There are 4 distinct boxes of size 2, 3, 1 and 4. The balls are pair wise indistinguishable.
Example 1: Given strings are a='xyy'
and b='kkd'
.
'kkd'
, 'dkk'
.
Reason: We see that all unique permutations of b
are 'kkd'
, 'kdk'
and 'dkk'
. However with our restraints, the two first permutations are considered equal as the indices on which the differ maps to the same char 'y'
in string a
.
Example 2: Given strings are a='xyy'
and b='khd'
.
'khd'
, 'dkh'
, 'hkd'
.
Example 3: Given strings are a='xxxx'
and b='khhd'
.
'khhd'
.
I can solve the problem of generating开发者_如何学JAVA unique candidate b
permutations using Narayana Pandita's algorithm as decribed on Wikipedia/Permutation.
'xx'
+'hd'
join→'xh','xd'
sort→'xd','xh'
).
As my M
is often very big, and as similarities in the strings are common, I currently generate way more b
permutations than actually goes through the set filter. I would love to have an algorithm generating the correct ones directly. Any improvement is welcome.
To generate k-combinations of possibly repeated elements (multiset), the following could be useful: A Gray Code for Combinations of a Multiset (1995).
For a recursive solution you try the following:
Count the number of times each character appears. Say they are x1 x2 ... xm, corresponding to m distinct characters.
Then you need to find all possible ordered pairs (y1 y2 ... ym) such that
0 <= yi <= xi
and Sum yi = k.
Here yi is the number of times character i appears.
The idea is, fix the number of times char 1 appears (y1). Then recursively generate all combinations of k-y1 from the remaining.
psuedocode:
List Generate (int [] x /* array index starting at 1*/,
int k /* size of set */) {
list = List.Empty;
if (Sum(x) < k) return list;
for (int i = 0; i <= x[1], i++) {
// Remove first element and generate subsets of size k-i.
remaining = x.Remove(1);
list_i = Generate(remaining, k-i);
if (list_i.NotEmpty()) {
list = list + list_i;
} else {
return list;
}
}
return list;
}
PRIOR TO EDITS:
If I understood it correctly, you need to look at string a, see the symbols that appear exactly once. Say there are k such symbols. Then you need to generate all possible permutations of b, which contain k elements and map to those symbols at the corresponding positions. The rest you can ignore/fill in as you see fit.
I remember posting C# code for that here: How to find permutation of k in a given length?
I am assuming xxyy will give only 1 unique string and the ones that appear exactly once are the 'distinguishing' points.
Eg in case of a=xyy, b=add
distinguishing point is x
So you select permuations of 'add' of length 1. Those gives you a
and d
.
Thus add
and dad (or dda)
are the ones you need.
For a=xyyz b=good
distinguishing points are x and z
So you generate permutations of b of length 2 giving
go
og
oo
od
do
gd
dg
giving you 7 unique permutations.
Does that help? Is my understanding correct?
Ok, I'm sorry I never was able to clearly explain the problem, but here is a solution.
We need two functions combinations
and runvector(v)
. combinations(s,k)
generates the unique combinations of a multiset of a length k
. For s='xxyy'
these would be ['xx','xy','yy']
. runvector(v)
transforms a multiset represented as a sorted vector into a more simple structure, the runvector. runvector('cddeee')=[1,2,3]
.
To solve the problem, we will use recursive generators. We run through all the combinations that fits in box1 and the recourse on the rest of the boxes, banning the values we already chose. To accomplish the banning, combinations
will maintain a bitarray across of calls.
In python the approach looks like this:
def fillrest(banned,out,rv,b,i):
if i == len(rv):
yield None
return
for comb in combinations(b,rv[i],banned):
out[i] = comb
for rest in fillrest(banned,out,rv,b,i+1):
yield None
def balls(a,b):
rv = runvector(a)
banned = [False for _ in b]
out = [None for _ in rv]
for _ in fill(out,rv,0,b,banned):
yield out[:]
>>> print list(balls('abbccc','xyyzzz'))
[['x', 'yy', 'zzz'],
['x', 'yz', 'yzz'],
['x', 'zz', 'yyz'],
['y', 'xy', 'zzz'],
['y', 'xz', 'yzz'],
['y', 'yz', 'xzz'],
['y', 'zz', 'xyz'],
['z', 'xy', 'yzz'],
['z', 'xz', 'yyz'],
['z', 'yy', 'xzz'],
['z', 'yz', 'xyz'],
['z', 'zz', 'xyy']]
The output are in 'box' format, but can easily be merged back to simple strings: 'xyyzzzz'
, 'xyzyzz'
...
精彩评论