Python script to calculate aded combinations from a dictionary
I am trying to write a script that will take a dictionary of items, each containing properties of values from 0 - 10, and add the various elements to select which combination of items achieve the desired totals. I also need the script to do this, using only items that have the same "slot" in common.
For example:
item_list = {
'item_1': {'slot': 'top', 'prop_a': 2, 'prop_b': 0, 'prop_c': 2, 'prop_d': 1 },
'item_2': {'slot': 'top', 'prop_a': 5, 'prop_b': 0, 'prop_c': 1, 'prop_d':-1 },
'item_3': {'slot': 'top', 'prop_a': 2, 'prop_b': 5, 'prop_c': 2, 'prop_d':-2 },
'item_4': {'slot': 'mid', 'prop_a': 5, 'prop_b': 5, 'prop_c':-5, 'prop_d': 0 },
'item_5': {'slot': 'mid', 'prop_a':10, 'prop_b': 0, 'prop_c':-5, 'prop_d': 0 },
'item_6': {'slot': 'mid', 'prop_a':-5, 'prop_b': 2, 'prop_c': 3, 'prop_d': 5 },
'item_7': {'slot': 'bot', 'prop_a': 1, 'prop_b': 3, 'prop_c':-4, 'prop_d': 4 },
'item_8': {'slot': 'bot', 'prop_a': 2, 'prop_b': 2, 'prop_c': 0, 'prop_d': 0 },
'item_9': {'slot': 'bot', 'prop_a': 3, 'prop_b': 1, 'prop_c': 4, 'prop_d':-4 },
}
The script would then need to select which combinations from the "item_list" dict that using 1 item per "slot" that would achieve a desired result when added.
For example, if the desired result was: 'prop_a': 3, 'prop_b': 3, 'prop_c': 8, 'prop_d': 0, t开发者_如何学Gohe script would select 'item_2', 'item_6', and 'item_9', along with any other combination that worked.
'item_2': {'slot': 'top', 'prop_a': 5, 'prop_b': 0, 'prop_c': 1, 'prop_d':-1 }
'item_6': {'slot': 'mid', 'prop_a':-5, 'prop_b': 2, 'prop_c': 3, 'prop_d': 5 }
'item_9': {'slot': 'bot', 'prop_a': 3, 'prop_b': 1, 'prop_c': 4, 'prop_d':-4 }
'total': 'prop_a': 3, 'prop_b': 3, 'prop_c': 8, 'prop_d': 0
Any ideas how to accomplish this? It does not need to be in python, or even a thorough script, but just an explanation on how to do this in theory would be enough for me. I have tried working out looping through every combination, but that seems to very quickly get our of hand and unmanageable. The actual script will need to do this for about 1,000 items using 20 different "slots", each with 8 properties.
Thanks for the help!
Since the properties can have both positive and negative values, and you need all satisfactory combinations, I believe there is no "essential" optimization possible -- that is, no polynomial-time solution (assuming P != NP...;-). All solutions will come down to enumerating all the one-per-slot combinations and checking the final results, with very minor tweaks possible that may save you some percent effort here or there, but nothing really big.
If you have 1000 items in 20 possible slots, say equally distributed at about 50 items per slot, there are around 50**20
possibilities overall, i.e, 9536743164062500000000000000000000
-- about 10**34
(a myriad billions of billions of billions...). You cannot, in general, "prune" any subtree from the "all-solutions search", because no matter the prop values when you have a hypothetical pick for the first 20-p
slots, there still might be a pick of the remaining p
slots that could satisfy the constraint (or, more than one).
If you could find an exact polynomial-time solution for this, a NP-complete problem, you'd basically have revolutionized modern mathematics and computer science -- Turing prizes and Field medals would only be the start of the consequent accolades. This is not very likely.
To get down to a feasible problem, you'll have to relax your requirements in some ways (accept the possibility of finding just a subset of the solutions, accept a probabilistic rather than deterministic approach, accept approximate solutions, ...).
Once you do, some small optimizations may make sense -- for example, start with summing constants (equal to one more than the smallest negative value of each propriety) to all the property values and targets, so that every prop value is > 0 -- now you can sort the slots by (e.g.) value for some property, or the sum of all properties, and do some pruning based on the knowledge that adding one more slot to a partial hypothetical solution will increase each cumulative prop value by at least X and the total by at least Y (so you can prune that branch if either condition makes the running totals exceed the target). This kind of heuristic approximation need not make the big-O behavior any better, in general, but it may reduce the expected multiplier value by enough to get the problem closer to being computationally feasible.
But it's not even worth looking for such clever little tricks if there's no requirement relaxation possible: in that case, the problem will stay computationally unfeasible, so looking for clever little optimizations would not be practically productive anyway.
This problem is essentially a generalization of the subset-sum problem (which is NP-complete, yes) to multiple dimensions. To restate the problem (to make sure we're solving the same problem): you have 1000 items divided into 20 classes (which you call slots). Each item has an integer value in [-10,10] for each of 8 properties; thus each item can be considered to have a value which is an 8-dimensional vector. You want to pick one item from each slot, so that the total value (adding these 8-dimensional vectors) is a given vector.
In the example you gave, you have 4 dimensions, and the 9 items in 3 classes have values (2,0,2,1), (5,0,1,-1), … etc., and you want to pick one item from each class to make the sum (3,3,8,0). Right?
Brute-force
First, there is the brute-force search that enumerates all possibilities. Assuming your 1000 items are divided equally into the 20 classes (so you have 50 in each), you have 50 choices for each class, which means you'd have to check 5020=9536743164062500000000000000000000 choices (and for each of them you need to add up the 20 elements along each of the 8 coordinates and check, so the running time would be ∝ 5020·20·8): this is not feasible.
Dynamic-programming, single-shot
Then there is the dynamic-programming solution, which is different, and in practice often works where brute-force is infeasible, but in this case unfortunately seems infeasible as well. (You'd improve it exponentially if you got better bounds on your "property values".) The idea here is to keep track of one way of reaching each possible sum. The sum of 20 numbers from [-10,10] lies in [-200,200], so there are "only" 4008=655360000000000000000 possible sums for your 8-dimensional vector. (This is a tiny fraction of the other search space, but it's no consolation to you. You can also take, for each "property", the difference between the sums of [largest item in each class] and [smallest item in each class] to replace the 400 with a smaller number.) The idea of the dynamic-programming algorithm is the following.
Let last[(a,b,c,d,e,f,g,h)][k] denote one item you can take from the kth class (along with one item each from the first k-1 classes) to make the sum exactly (a,b,c,d,e,f,g,h). Then, pseudocode:
for k=1 to 20: for each item i in class k: for each vector v for which last[v][k-1] is not null: last[v + value(i)][k] = i
Then, if your desired final sum is s, you pick item last[s][k] from the kth class, item last[s-value(i)][k-1] from the (k-1)th class, and so on. This takes time ∝ 20·50·4008·8 in the worst case (only a loose upper bound, not a tight analysis).
Dynamic-programming, separately
So much for "perfect" solutions. However, if you allow heuristic solutions and those that "will most likely work in practice", you can do better (even for solving the problem exactly). For instance, you can solve the problem separately for each of the 8 dimensions. This is even easier to implement, takes only time ∝ 20·50·400·8=3200000 in the worst case, and you can do it quite easily. If you keep last[][] as a list, instead of a single element, then at the end you have (effectively) a list of subsets which achieve the given sum for that coordinate (in "product form"). In practice, not many subsets may add up exactly to the sum you want, so you can start with the coordinate for which the number of subsets is smallest, then try each of those subsets for the other 7 coordinates. The complexity of this step depends on the data in the problem, but I suspect (or one can hope) that either (1) there will be very few sets with equal sums, in which case this intersection will whittle down the number of sets to check, or (2) there will be many sets with a given sum, in which case you'll find one quite early.
In any case, doing the dynamic-programming separately for each coordinate first is definitely going to allow you to search over a much smaller space in the second stage.
Approximate algorithms
If you don't need the sums to be exactly equal and will accept sums that are within a certain factor of your required sum, there is a well-known idea used to get an FPTAS (fully polynomial-time approximation scheme) for the subset-sum problem, which runs in time polynomial in (number of items, etc.) and 1/ε. I've exhausted my time to explain this, but you can look it up — basically, it just replaces the 4008 space by a smaller one, by e.g. rounding numbers up to the nearest multiple of 5, or whatever.
This sounds like a variation of the Knapsack problem, which is commonly solved with dynamic programming.
But, you could probably write a fairly simple solution (but slower) using recursion:
def GetItemsForSlot(item_list, slot):
return [ (k,v) for (k,v) in item_list.items() if v['slot'] == slot]
def SubtractWeights(current_weights, item_weights):
remaining_weights = {}
for (k,v) in current_weights.items():
remaining_weights[k] = current_weights[k] - item_weights[k]
return remaining_weights
def AllWeightsAreZero(remaining_weights):
return not [v for v in remaining_weights.values() if v != 0]
def choose_items(item_list, remaining_weights, available_slots,
accumulated_items=[ ]):
print "choose_items: ", remaining_weights, available_slots, \
accumulated_items
# Base case: we have no more available slots.
if not available_slots:
if AllWeightsAreZero(remaining_weights):
# This is a solution.
print "SOLUTION FOUND: ", accumulated_items
return
else:
# This had remaining weight, not a solution.
return
# Pick the next available_slot
slot = available_slots[0]
# Iterate over each item for this slot, checking to see if they're in a
# solution.
for name, properties in GetItemsForSlot(item_list, slot):
choose_items(item_list, # pass the items recursively
SubtractWeights(remaining_weights, properties),
available_slots[1:], # pass remaining slots
accumulated_items + [name]) # Add this item
if __name__ == "__main__":
total_weights = {
'prop_a': 3,
'prop_b': 3,
'prop_c': 8,
'prop_d': 0
}
choose_items(item_list, total_weights, ["top", "mid", "bot"])
This was tested, and seemed to work. No promises though :)
Keeping slot & prop_a as properties of the same object made it a little harder to work with. I'd suggest using classes instead of a dictionary to make the code easier to understand.
I have tried working out looping through every combination, but that seems to very quickly get our of hand and unmanageable. The actual script will need to do this for about 1,000 items using 20 different "slots", each with 8 properties.
It might help your thinking to load the structure in a nice object hierarchy first and then solve it piecewise.
Example:
class Items(dict):
def find(self, **clauses):
# TODO!
class Slots(dict):
# TODO!
items = Items()
for item, slots in item_list.items():
items[item] = Slots(slots)
# consider abstracting out slot based on location (top, mid, bot) too
print items.find(prop_a=3, prop_b=3, prop_c=8, prop_d=0)
精彩评论