Split up a collection, for each subset respecting probabilities for properties of its items

2023-02-16 18:08 问答作者：

For a small game (for which I am a bit forced to use C++, so STL-based solutions can be interesting here), I encountered following neat problem. I was wondering if there is any literature on the subject that I could read, or clever implementations.

Collection S of unique items {E1, E2, E3}, each item E having a set of properties, {P1, P2, P3...}

This collection should be split up in S1, S2, S3, S4. It is defined how large S1..4 have to be exactly. We can assume the collection can be correctly split up in those sizes for the remainder of the problem.

Now, for S1, a number of constraints can appear, {C1, C2..}, which specify that for instance, no items with the property P1 may appear in it. Another constraint could be that it should favour the items with property P2 with a factor of 0.8 (we can assume these types of constraints are normalized for all of the subsets per property).

The "weighting" is not that hard to implement. I simply fill some array with candidate numbers, the ones with higher weight are represented more in this array. I then select a random element of the array. the size of the array determines accuracy/granularity (in my case, a small array suffices).

The problem is forbidding some items to appear. It can easily lead to a situation where one item in S needs to be placed in one of the subsets S1, S2, S3, or S4, but this can no longer happen because the subsets are either all full, or the ones that are not full have a specific constraint that this item cannot appear in the set. So you have to backtrack the placement. Doing so too often may violate the weighted probability too much.

How is this problem called, or does it easily map to another (probably NP) problem?

EDIT: Example:

S = {A, B, C, D, E, F, G, H, I, J, K, L, M }

S1 = [ 0.8 probability of having VOWEL, CANNOT HAVE I or K, SIZE = 6 ]

S2 = [ 0.2 probability of having VOWEL, CANNOT HAVE M, B, E, SIZE = 7 ]

Now, suppose we start filling by FOR(LETTER IN S):

L开发者_如何学CETTER A, create a fill array based on property constraints (0.8 vs 0.2):

[ 1, 1, 1, 1, 1, 1, 1, 2, 2].

Pick a random element from that array: 1.

Now, put A in S1.

For letter I, for instance, the only candidate would be 2, since S1 has a constraint that I cannot appear in it.

Keep doing this, eventually you might end up with:

C = { M } // one more letter to distribute

S1 = A, B, D, E, F, G

S2 = C, F, G, I, K, L

Now, where to place M? I tcannot be placed in S1, since that one is full, and it cannot be placed in S2 because it has a constraint that M cannot be placed in it.

The only way is to backtrack some placement, but then we might mess with the weighted distribution too much (f.i., giving S2 one vowel of S1, which flips around the natural distribution)

Note that this become slightly more complex (in the sense that more backtracks would be needed) when more subsets are in play, instead of just 2.

This has resemblance to a constraint satisfaction problem (CSP) with hard and soft constraints. There are a couple of standard algorithms for that, but you have to check, if they apply to your particular problem instance.

Check wikipedia for starters.

How about this heuristic:

1 Taking into consideration limitations due to constraints and full sets, locate any elements that only meet the criteria for a single set and place them there. If at any point, one of these insertion causes a set to become full, re-evaluate the elements for meeting the criteria for only a single set.

2 Now look only at elements that could fit in exactly two sets. For each element compute the differences in the required probabilities for each set if you added that element vs if you did not. Insert the element into the set where the insert results in the best short term result (first fit / greedy algorithm). If an insert fills up a set, re-evaluate the elements for meeting the criteria for only two sets

3 Continue for elements that fit in 3 sets, 4 sets ... n sets.

At this point all elements will be placed into sets meeting all the constraints, but the probabilities are probably not optimal. You could continue by swapping elements between the sets (only allowing swaps that don't violate constraints), by using a gradient descent or random-restart hill clibing algorithm on a function describing the how closely all the probability's are met. This will tend to converge towards the optimal solution but is not guaranteed to meet it. Continue until you meet your requirements to within an acceptable amount, or until a fixed time limit is met, or until the improvements possible is below a set threshold.

继续阅读：algorithm

Split up a collection, for each subset respecting probabilities for properties of its items

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？